CN109447402A - Choosing method, device and the electronic equipment of contribution gene - Google Patents

Info

Publication number
CN109447402A
Authority
CN
China
Prior art keywords
contribution
genome
gene
value
successful match
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811095816.1A
Other languages
Chinese (zh)
Other versions
CN109447402B (en)
Inventor
张芃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Language Network (wuhan) Information Technology Co Ltd
Original Assignee
Language Network (wuhan) Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Language Network (wuhan) Information Technology Co Ltd
Priority to CN201811095816.1A
Publication of CN109447402A
Application granted
Publication of CN109447402B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0631: Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311: Scheduling, planning or task assignment for a person or group
    • G06Q10/063112: Skill-based matching of a person or a group to a task
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/42: Data-driven translation
    • G06F40/47: Machine-assisted translation, e.g. using translation memory

Abstract

Embodiments of the present invention provide a method, a device and electronic equipment for selecting contribution genes. The method comprises: selecting multiple different groups of genes from an alternative contribution gene list to form multiple contribution genomes; for each contribution genome, performing matching result sampling multiple times to obtain multiple successful match rate samples, and calculating from them the mean and standard deviation of the successful match rate corresponding to that contribution genome; calculating the Z value corresponding to each contribution genome based on the means corresponding to all contribution genomes and the standard deviation corresponding to that contribution genome; and, based on the Z value corresponding to each contribution genome, selecting the contribution genomes that meet a set condition and merging the genes in those contribution genomes to obtain the finally selected contribution genes. Embodiments of the present invention can select more effective contribution genes, so that contributions are matched with more suitable interpreters, thereby effectively improving translation efficiency and translation accuracy.

Description

Choosing method, device and the electronic equipment of contribution gene
Technical field
Embodiments of the present invention relate to the technical field of data processing, and more particularly to a method, a device and electronic equipment for selecting contribution genes.
Background
The massive, fast-moving data on the Internet includes documents of many kinds and considerable complexity. Different documents contain different key information, so each document can be processed in a way suited to it according to that key information. For example, in the translation industry, the most suitable translator can be matched to each contribution to be translated according to the key information the contribution contains, thereby effectively improving translation efficiency and translation accuracy.
Gene matching between contributions and interpreters refers to the process of finding the best interpreter for a contribution by passing interpreter genes and contribution genes through a matching model under a set strategy. Compared with other contribution genes, the contribution genes selected for gene matching should better reflect the differences of the contribution to be matched; only then can the contribution to be translated be matched with a more suitable interpreter.
A contribution gene mainly refers to a relatively unique characterization that portrays the essence of a contribution, formed by extracting several features from the contribution and combining them effectively. It can also be regarded as the unique combination of key information that is present in a contribution and distinguishes it from other contributions, obtained by analyzing, calculating and quantifying the characteristic attributes of the contribution.
Contribution genes come from various sources. Contribution genes are present in all contributions, and different contributions have different genes. Because applications differ, existing document gene matching algorithms often select the gene combination by experience when choosing the contribution genes to be matched.
However, contributions in massive, fast-moving Internet data are of many kinds and intricate, so this way of selecting contribution genes has limitations, and the selected contribution genes may fail to reflect the differences between contributions well. Therefore, when selecting contribution genes, it is more important to extract the genes that capture differences, so that contributions can be treated in a differentiated manner.
Summary of the invention
To overcome the above problem or at least partially solve it, embodiments of the present invention provide a method, a device and electronic equipment for selecting contribution genes, so that the selected contribution genes better reflect the differences between contributions.
In a first aspect, an embodiment of the present invention provides a method for selecting contribution genes, comprising: selecting multiple different groups of genes from an alternative contribution gene list to form multiple contribution genomes; for each contribution genome, performing matching result sampling multiple times to obtain multiple successful match rate samples, and calculating, based on the multiple successful match rate samples, the mean and standard deviation of the successful match rate corresponding to the contribution genome; calculating the Z value corresponding to each contribution genome based on the means corresponding to all the contribution genomes and the standard deviation corresponding to that contribution genome; and, based on the Z value corresponding to each contribution genome, selecting from all the contribution genomes those that meet a set condition, and merging the genes in the contribution genomes that meet the set condition to obtain the finally selected contribution genes; wherein the Z value denotes the Z value of the large-sample difference test (Z-test).
In a second aspect, an embodiment of the present invention provides a device for selecting contribution genes, comprising: an initial gene selection module, configured to select multiple different groups of genes from an alternative contribution gene list to form multiple contribution genomes; a first calculation module, configured to perform, for each contribution genome, matching result sampling multiple times to obtain multiple successful match rate samples, and to calculate, based on the multiple successful match rate samples, the mean and standard deviation of the successful match rate corresponding to the contribution genome; a second calculation module, configured to calculate the Z value corresponding to each contribution genome based on the means corresponding to all the contribution genomes and the standard deviation corresponding to that contribution genome; and a final gene selection module, configured to select, based on the Z value corresponding to each contribution genome, the contribution genomes that meet a set condition from all the contribution genomes, and to merge the genes in the contribution genomes that meet the set condition to obtain the finally selected contribution genes; wherein the Z value denotes the Z value of the large-sample difference test (Z-test).
The third aspect, the embodiment of the present invention provide a kind of electronic equipment, comprising: at least one processor, at least one Manage device, communication interface and bus;The memory, the processor and the communication interface are completed mutual by the bus Communication, the communication interface between the electronic equipment and manuscript information equipment information transmission;In the memory It is stored with the computer program that can be run on the processor, when the processor executes the computer program, is realized such as The choosing method of contribution gene described in upper first aspect.
Fourth aspect, the embodiment of the present invention provide a kind of non-transient computer readable storage medium, the non-transient calculating Machine readable storage medium storing program for executing stores computer instruction, and the computer instruction executes the computer described in first aspect as above The choosing method of contribution gene.
The method, device and electronic equipment for selecting contribution genes provided by embodiments of the present invention first select multiple contribution genomes from the contribution gene pool of all contributions, then calculate the Z value corresponding to each of these contribution genomes and select the contribution genomes whose Z values meet a set condition as the final selection result, so that the selected contribution genes better reflect the differences between contributions.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the method for selecting contribution genes provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of extracting contribution genes in the method for selecting contribution genes provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of calculating the Z value in the method for selecting contribution genes provided by an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of the device for selecting contribution genes provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the physical structure of the electronic equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the embodiments of the present invention.
The massive, fast-moving data on the Internet includes documents of many kinds and considerable complexity, and different documents contain different key information. Because applications differ, existing document gene matching algorithms often select the gene combination by experience when choosing the contribution genes to be matched. This conventional approach has limitations, and the selected contribution genes may fail to reflect the differences between contributions well.
To address this problem, embodiments of the present invention first select multiple contribution genomes from the contribution gene pool of all contributions, then calculate the Z value corresponding to each of these contribution genomes and select the contribution genomes whose Z values meet a set condition as the final selection result, so that the selected contribution genes better reflect the differences between contributions. Here, the Z value denotes the Z value of the large-sample difference test (Z-test).
As one aspect of the embodiments of the present invention, this embodiment provides a method for selecting contribution genes. Referring to Fig. 1, which is a flow diagram of the method for selecting contribution genes provided by an embodiment of the present invention, the method comprises:
S101: select multiple different groups of genes from an alternative contribution gene list to form multiple contribution genomes.
Before the contribution genes of this embodiment are selected, an alternative contribution gene list can be established in advance from all attribute information of the contributions; this list may include all genes related to particular attributes of contributions. Specifically, the alternative contribution gene list can be regarded as a gene pool that stores, gene by gene, the genes related to manuscript information extracted from all contributions, i.e. the contribution genes. A contribution gene mainly refers to the unique combination of key information that is present in a contribution and distinguishes it from other contributions, obtained by analyzing, calculating and quantifying the characteristic attributes of the contribution.
In this step, multiple groups of contribution genes are selected from the alternative contribution gene list, and each group of contribution genes forms one genome, which serves as a selected contribution genome. When each group of contribution genes is selected, multiple contribution genes can be picked at random from the alternative contribution gene list, and the randomly picked contribution genes then form one contribution genome.
Alternatively, an extraction rule can be defined in advance, for example: extracting simultaneously or successively, extracting every other row or from specified rows, extracting according to the different manuscript information the genes characterize, the number of genes to extract, and so on. In the actual extraction process, each group of contribution genes is then extracted from the alternative contribution gene list according to the predefined extraction rule.
For example, 3 to 5 different genes are randomly selected from the alternative contribution gene list as one group of genes, forming one contribution genome. Multiple groups of genes can then be selected in the same way, either simultaneously or successively, to form multiple contribution genomes; the embodiments of the present invention place no restriction on this.
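The random grouping in S101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_genomes`, the seeded `random.Random` argument, and the rule that a duplicate group is redrawn are all hypothetical; only the group size of 3 to 5 genes comes from the example above.

```python
import random
from typing import List, Sequence, Tuple


def build_genomes(
    candidates: Sequence[str],
    group_count: int,
    rng: random.Random,
    min_size: int = 3,
    max_size: int = 5,
) -> List[Tuple[str, ...]]:
    """Draw `group_count` distinct groups of 3 to 5 genes from the candidate list."""
    pool = list(dict.fromkeys(candidates))  # drop repeated genes, keep order
    if len(pool) < min_size:
        raise ValueError("not enough candidate genes to form a group")
    upper = min(max_size, len(pool))

    genomes: List[Tuple[str, ...]] = []
    seen = set()
    attempts = 0
    max_attempts = 1000 * group_count  # assumption: give up after many redraws
    while len(genomes) < group_count:
        attempts += 1
        if attempts > max_attempts:
            raise ValueError("could not draw enough distinct groups")
        size = rng.randint(min_size, upper)
        genome = tuple(sorted(rng.sample(pool, size)))  # genes without repetition
        if genome not in seen:  # assumption: groups must differ from one another
            seen.add(genome)
            genomes.append(genome)
    return genomes
```

Sorting each group makes two draws of the same genes compare equal, so identical groups are detected regardless of draw order.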
S102: for each contribution genome, perform matching result sampling multiple times to obtain multiple successful match rate samples, and calculate, based on the multiple successful match rate samples, the mean and standard deviation of the successful match rate corresponding to the contribution genome.
For each selected contribution genome, its matching effect with interpreters needs to be determined so that the contribution genes better suited to gene matching can be selected. Without loss of generality, each contribution genome can be input into a given matching model, and matching result sampling can be performed multiple times with that model; each sampling yields one successful match rate sample.
When successful match rate samples are acquired with the matching model for a contribution genome, the genes in that contribution genome are input into the matching model. The matching model automatically calculates and outputs the successful match rate score between the genes in the contribution genome and the interpreter genes it holds, and the score output by the matching model serves as one successful match rate sample. Repeating this matching result sampling process multiple times for the same contribution genome yields multiple successful match rate samples.
Afterwards, for each selected contribution genome, the overall successful match rate of the contribution genome is calculated from the multiple successful match rate samples obtained above, that is, the mean and standard deviation of the successful match rate corresponding to the contribution genome are calculated. Each successful match rate sample is in fact the successful match rate score obtained from one matching result sampling.
For example, suppose that matching result sampling for a contribution genome yields n successful match rate samples $p_1, p_2, \ldots, p_n$. The mean of the successful match rate corresponding to the contribution genome is then calculated as:

$$E(p) = \frac{1}{n}\sum_{i=1}^{n} p_i$$

where $E(p)$ denotes the mean of the successful match rate corresponding to the contribution genome, $p_i$ denotes the i-th successful match rate sample of the contribution genome, and n denotes the total number of successful match rate samples acquired for the contribution genome.
On this basis, the standard deviation of the successful match rate corresponding to the contribution genome is calculated from the deviations of the samples from the mean. In that calculation, S denotes the standard deviation of the successful match rate corresponding to the contribution genome, E(p) denotes the mean of the successful match rate corresponding to the contribution genome, p_i denotes the i-th successful match rate sample of the contribution genome, and n denotes the total number of successful match rate samples acquired for the contribution genome.
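A minimal sketch of the mean and standard deviation calculation in S102 is given below. The formula for S is not reproduced in this text, so the normalization is an assumption: the sketch defaults to the sample form (dividing by n - 1) and exposes a hypothetical `ddof` parameter so that the population form (dividing by n) can be used instead. The function name `mean_and_std` is likewise illustrative.

```python
import math
from typing import Sequence, Tuple


def mean_and_std(samples: Sequence[float], ddof: int = 1) -> Tuple[float, float]:
    """Return (E(p), S) for the successful match rate samples of one genome.

    ddof=1 divides by n - 1 (sample form, assumed here); ddof=0 divides by n.
    """
    n = len(samples)
    if n - ddof <= 0:
        raise ValueError("too few samples for the chosen normalization")
    mean = sum(samples) / n  # E(p) = (1/n) * sum of p_i
    variance = sum((p - mean) ** 2 for p in samples) / (n - ddof)
    return mean, math.sqrt(variance)
```

With the 30 or more samples that the Z-test assumes, the two normalizations differ only slightly.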
S103: calculate the Z value corresponding to each contribution genome based on the means corresponding to all contribution genomes and the standard deviation corresponding to that contribution genome; wherein the Z value denotes the Z value of the large-sample difference test (Z-test).
Once the mean and standard deviation of the successful match rate corresponding to each selected contribution genome have been calculated in the preceding step, the Z value of each selected contribution genome is calculated. Specifically, for each contribution genome, its Z value is calculated from the standard deviation of its own successful match rate and the means of the successful match rates of all contribution genomes.
The Z value here is the Z value of the large-sample difference test, i.e. the Z-test. The Z-test is generally used to test differences between means for large samples (sample size greater than 30). It uses the theory of the standard normal distribution to infer the probability that a difference occurs, and thereby judges whether the difference between two means is significant. When the standard deviation is known, it tests whether the mean of a set of numbers equals a given expected value. The embodiments of the present invention use the Z-test to measure the matching differences of the selected contribution genomes, and therefore calculate a Z value for each selected contribution genome.
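The Z value calculation in S103 can be sketched as follows. The text does not spell out the formula, so the form used here is an assumption: the mean of one genome is compared with the average of the means of all genomes, and the difference is divided by that genome's standard error, S divided by the square root of n. The function name `z_values` and the input layout are hypothetical.

```python
import math
from typing import Dict, Tuple


def z_values(stats: Dict[str, Tuple[float, float]], n: int) -> Dict[str, float]:
    """Compute an assumed Z value per genome.

    stats maps a genome label to (mean, standard deviation) of its
    successful match rate; n is the number of samples per genome.
    """
    if not stats or n <= 0:
        raise ValueError("need at least one genome and a positive sample count")
    # Assumption: the reference value is the average of all genome means.
    grand_mean = sum(mean for mean, _ in stats.values()) / len(stats)

    result: Dict[str, float] = {}
    for label, (mean, std) in stats.items():
        if std <= 0:
            raise ValueError(f"standard deviation must be positive for {label}")
        result[label] = (mean - grand_mean) / (std / math.sqrt(n))
    return result
```

If the intended test uses a different reference value or a two-sample form, only the two marked lines would change.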
S104: based on the Z value corresponding to each contribution genome, select from all contribution genomes those that meet a set condition, and merge the genes in the contribution genomes that meet the set condition to obtain the finally selected contribution genes.
The Z value of each contribution genome, calculated in the preceding step, indicates how that contribution genome performs in terms of difference when gene matching is carried out. Therefore, a preset condition can be applied to the Z value of each contribution genome to judge whether the contribution genome meets the set difference requirement. If it does not, it is removed from the selected contribution genomes; the contribution genomes that remain are the ones that meet the requirement. The genes in all remaining contribution genomes are taken out, duplicate genes are removed, and the resulting new group of genes forms the finally selected contribution genes.
For example, suppose that n successful match rate samples in total are acquired for a contribution genome and that these samples follow a normal distribution. Suppose also that the preset condition is that the confidence level of the selected genes is not lower than 95%, for which the corresponding Z value is 1.96. Then, for each selected contribution genome, its Z value is compared with 1.96: if the Z value is greater than 1.96, the corresponding contribution genome is rejected; otherwise, it is retained.
Suppose that this process removes p contribution genomes that do not meet the set condition from all n selected contribution genomes, so that the remaining n-p contribution genomes meet the set condition. Two or more of these n-p contribution genomes may contain the same contribution gene. Therefore, all contribution genes in the n-p contribution genomes are taken out and put into a new gene pool; for each contribution gene that appears more than once in this pool, the extra copies are removed and only one is kept. The new gene pool then contains multiple non-repeating contribution genes, which are taken as the finally selected contribution genes.
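The filtering and merging in S104 can be sketched as below. The threshold 1.96 and the rule "reject when Z is greater than 1.96, otherwise retain" follow the example above; the function name `select_and_merge` and the input layout (pairs of genome and Z value) are illustrative assumptions.

```python
from typing import Iterable, List, Sequence, Tuple


def select_and_merge(
    scored_genomes: Iterable[Tuple[Sequence[str], float]],
    z_limit: float = 1.96,
) -> List[str]:
    """Keep genomes whose Z value does not exceed z_limit, then merge their genes."""
    merged: List[str] = []
    seen = set()
    for genome, z in scored_genomes:
        if z > z_limit:  # rejected: does not meet the set condition
            continue
        for gene in genome:
            if gene not in seen:  # keep a single copy of each gene
                seen.add(gene)
                merged.append(gene)
    return merged
```

The merged list preserves the order in which genes are first encountered, which keeps the result deterministic for a given input order.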
The method for selecting contribution genes provided by embodiments of the present invention first selects multiple contribution genomes from the contribution gene pool of all contributions, then calculates the Z value corresponding to each of these contribution genomes and selects the contribution genomes whose Z values meet a set condition as the final selection result, so that the selected contribution genes better reflect the differences between contributions. In addition, in gene matching applications, contributions can then be matched more reasonably with the available interpreters, thereby effectively improving translation efficiency and translation accuracy.
In one embodiment, before the step of selecting multiple different groups of genes from the alternative contribution gene list, the method of the embodiment of the present invention further comprises:
extracting the corresponding genes from the project-related information, contribution-related information and process-related information of all contributions, respectively, to form the project-related genes, contribution-related genes and process-related genes of the contributions;
constructing the alternative contribution gene list based on the project-related genes, contribution-related genes and process-related genes.
The massive, fast-moving data on the Internet includes documents of many kinds and considerable complexity, and different documents contain different key information. According to the sources of contribution genes, this embodiment extracts contribution genes from the following aspects to construct the alternative contribution gene list:
Project-related information, i.e. the client's requirements for the project, including information such as the related tools provided, terminology and expert support; this is an important source of genes;
Contribution-related information, i.e. the document information of the contribution itself, determined by the document content, including document size, language information, classification information, type information, lexical information, term information, syntactic information, semantic information, etc.;
Process-related information, i.e. the state of the contribution during the process from creation to completion of translation, together with the new gene information that appears for fragment contributions, for example after a large contribution in a project is split, such as changes in word count, quality requirements, changes in industry and time requirements.
Based on the above information, the corresponding genes of the contributions are extracted for each aspect, forming the corresponding project-related genes, contribution-related genes and process-related genes. The alternative contribution gene list is then constructed from the genes of these aspects. For example, for the contribution-related information, the corresponding alternative contribution gene list can be constructed as shown in Table 1.
Table 1. An alternative contribution gene list for contribution-related information according to an embodiment of the present invention
Then, when multiple contribution genomes are selected according to Table 1, the contribution genes corresponding to several data items can be picked at random. For example, if the gene "Simplified Chinese" corresponding to "source language" and the gene "engine" corresponding to "field" are picked, the two form one contribution genome. Other, different contribution genomes can be selected in the same way.
Likewise, if an extraction rule has been set in advance to select the genes related to the contribution document itself, the genes corresponding to data items such as "contribution word count", "contribution type", "contribution format" and "reference corpus" in Table 1 can be selected to form a contribution genome.
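The rule-based selection from Table 1 can be sketched as a lookup over named data items. Table 1 itself is not reproduced in this text, so the dictionary below is a hypothetical stand-in: only the pairs "source language" / "Simplified Chinese" and "field" / "engine" come from the example above, and every other value, as well as the function name `genome_from_items`, is an assumption for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical stand-in for Table 1 (data item -> gene value).
candidate_gene_list: Dict[str, str] = {
    "source language": "Simplified Chinese",  # from the example in the text
    "field": "engine",                        # from the example in the text
    "contribution word count": "5000",        # assumed value
    "contribution type": "manual",            # assumed value
    "contribution format": "docx",            # assumed value
    "reference corpus": "none",               # assumed value
}


def genome_from_items(
    gene_list: Dict[str, str], items: Sequence[str]
) -> Tuple[Tuple[str, str], ...]:
    """Form one contribution genome from the genes of the named data items."""
    missing = [item for item in items if item not in gene_list]
    if missing:
        raise KeyError(f"data items not in the gene list: {missing}")
    return tuple((item, gene_list[item]) for item in items)


# Rule from the text: take the genes that describe the document itself.
document_rule = [
    "contribution word count",
    "contribution type",
    "contribution format",
    "reference corpus",
]
document_genome = genome_from_items(candidate_gene_list, document_rule)
```

Keeping the data item name alongside each gene value avoids ambiguity when two items happen to share the same value.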
Contribution genes are present in contributions, and different contributions have different genes. Contributions share common genes, but it is more important to extract the genes that capture differences, so that contributions can be treated in a differentiated manner and matched with the best interpreter.
However, a gene is not a feature; it cannot simply be recognized explicitly, so it has to be extracted in steps. Genes and features differ in essence: a feature is a concept abstracted from characteristics common to objects. A feature contains some attributes, and the attributes contain the most basic information of the object, namely its genes.
Therefore, when extracting contribution genes, this embodiment first extracts the corresponding feature information, i.e. the contribution features, from the three aspects of contribution information described in the above embodiment. Then the attribute information of the contribution, i.e. the contribution attributes, is extracted according to the different contribution features, and finally the most basic information of the contribution is extracted from the attributes to form the contribution genes. This is shown in Fig. 2, which is a flow diagram of extracting contribution genes in the method for selecting contribution genes provided by an embodiment of the present invention.
The method for selecting contribution genes provided by embodiments of the present invention extracts the genes of contributions from three aspects, namely project-related information, contribution-related information and process-related information, and constructs the alternative contribution gene list from them for selecting and matching better contribution genes. This takes the specific information of different aspects of a contribution into account more fully and provides a reliable basis for carrying out gene matching more reasonably.
Optionally, according to the above embodiments, the step of performing matching result sampling multiple times to obtain multiple successful match rate samples further comprises:
for any round of multiple matching result sampling, executing the following process flow:
setting an initial value for the successful match rate of all contribution genomes;
randomly selecting one contribution genome from all contribution genomes, performing a match test on the selected contribution genome, and updating the current successful match rate value of that contribution genome based on the successful match rate result of this match test and the historical successful match rate results;
repeating the steps from random selection to updating until the number of match tests on any contribution genome reaches a first set threshold, then stopping match tests on that contribution genome and recording its current successful match rate value;
for the contribution genomes other than those for which match testing has stopped, repeating the steps from random selection to recording until the total number of match tests on all contribution genomes reaches a second set threshold, then recording the current successful match rate value of each contribution genome and ending the current round of multiple matching result sampling; proceeding to the next round of multiple matching result sampling until the total number of rounds executed reaches a third set threshold, so that the number of successful match rate samples obtained for each contribution genome equals the third set threshold.
Specifically, multiple rounds of multiple matching-result sampling can be carried out using a given matching model. Suppose m contribution genomes have been selected according to the above embodiments. The matching success rate of each contribution genome can then be sampled by running multiple rounds of repeated match tests (generally no fewer than 30 tests) on these m genomes. Each round of match tests proceeds as follows:
Step 1: initialize the matching success rate value of each contribution genome, for example to 0.
Step 2: randomly select one contribution genome and evaluate it with the given matching model, obtaining the result of this match test. Then combine it with the results of the earlier match tests recorded in the current round of sampling, i.e., the historical matching results, to calculate the current matching success rate value of the selected contribution genome.
Step 3: execute the above selection and update repeatedly. Because each contribution genome is selected at random from all contribution genomes, different genomes may be tested different numbers of times. When the number of match tests on a contribution genome reaches the first set threshold, stop testing that genome in the current round and record its current matching success rate value at the time testing stopped.
Step 4: for the remaining contribution genomes, excluding those that have reached the first set threshold, continue the above process until the total number of match tests in the current round reaches the second set threshold, then stop the current round. At this point each contribution genome has one matching success rate value, which is the sample obtained for it in this round of sampling; m contribution genomes thus yield m matching success rate samples.
In this way, carrying out multiple rounds of the above sampling (for example, until the third set threshold is reached) for all contribution genomes yields multiple matching success rate samples for each contribution genome. For example, if the number of rounds is set to n, each genome has n matching success rate samples (n is generally no less than 50).
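The four steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names, the `match_test` callable standing in for the given matching model, and the 70% success probability of the demo matcher are all hypothetical and do not come from the specification. The sketch also ends a round early if every genome reaches the first threshold before the second threshold is met, a case the text does not address.

```python
import random

def sample_round(genomes, match_test, per_genome_cap, total_cap, rng):
    """Run one round of sampling; return {genome: success rate when testing stopped}."""
    successes = {g: 0 for g in genomes}   # step 1: everything starts at 0
    trials = {g: 0 for g in genomes}
    active = list(genomes)                # genomes still eligible for testing
    total = 0
    # Stop at the second threshold, or earlier if every genome hit the first one.
    while total < total_cap and active:
        g = rng.choice(active)            # step 2: random selection
        successes[g] += 1 if match_test(g) else 0
        trials[g] += 1
        total += 1
        if trials[g] >= per_genome_cap:   # step 3: first threshold reached
            active.remove(g)
    # A genome that was never tested keeps the initial value 0.
    return {g: (successes[g] / trials[g] if trials[g] else 0.0) for g in genomes}

def sample_rounds(genomes, match_test, per_genome_cap, total_cap, rounds, seed=0):
    """Repeat sample_round `rounds` times (third threshold); collect samples per genome."""
    rng = random.Random(seed)
    samples = {g: [] for g in genomes}
    for _ in range(rounds):
        result = sample_round(genomes, match_test, per_genome_cap, total_cap, rng)
        for g, rate in result.items():
            samples[g].append(rate)
    return samples

# Hypothetical stand-in for the given matching model: succeeds 70% of the time.
_demo_rng = random.Random(42)

def demo_match_test(genome):
    return _demo_rng.random() < 0.7

samples = sample_rounds(["a1", "a2", "a3"], demo_match_test, 3, 8, rounds=5)
```

With thresholds 3, 8, and 5, `samples` holds five matching success rate values for each of the three genomes.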
For example, suppose three contribution genomes a1, a2, and a3 have been selected, and the first, second, and third set thresholds are preset to 3, 8, and 5 respectively. Then, in each round of multiple matching-result sampling:
The first selection randomly picks one of a1, a2, and a3, say a1. A match test is run on a1 and the match succeeds, so the matching success rate value of a1 is 100%.
In the second selection, suppose a2 is picked. A match test is run on it and the match fails, so the matching success rate value of a2 is 0%.
In the third selection, suppose a1 is picked again and the match fails. Based on the two match tests run on a1 so far, its current matching success rate value is 50%.
In the fourth selection, suppose a3 is picked and the match succeeds, so the matching success rate value of a3 is 100%.
In the fifth selection, suppose a1 is picked again and the match succeeds. Based on the three match tests run on a1 so far, its current matching success rate value is about 66.7%. The number of match tests on a1 has now reached the first set threshold of 3, so testing of a1 stops and its current value of about 66.7% is output as the matching success rate sample of a1 for this round.
In the sixth selection, since a1 has already had 3 match tests, the random selection and match test are restricted to a2 and a3; the selection and test proceed as in the steps above. This continues until the total number of match tests on a1, a2, and a3 reaches the second set threshold of 8, at which point the current round of sampling ends. Each contribution genome now has one matching success rate sample from these match tests.
The above sampling is then repeated for a1, a2, and a3 over multiple rounds, each round yielding one set of matching success rate samples for a1, a2, and a3. When the number of rounds reaches the third set threshold of 5, a1, a2, and a3 each have 5 matching success rate samples.
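The first five selections of this example can be replayed with a small running-average update. This is a sketch for checking the arithmetic only; the helper name `update_rate` and the fixed list of draws are illustrative assumptions, and the specification does not say how the historical results are stored.

```python
def update_rate(rate, count, success):
    """Fold one test outcome into a running success rate over `count` earlier tests."""
    return (rate * count + (1.0 if success else 0.0)) / (count + 1)

# (genome, match succeeded?) for the five selections described in the example.
draws = [("a1", True), ("a2", False), ("a1", False), ("a3", True), ("a1", True)]

rate = {g: 0.0 for g in ("a1", "a2", "a3")}   # initial value 0 for every genome
count = {g: 0 for g in rate}
history = []
for genome, success in draws:
    rate[genome] = update_rate(rate[genome], count[genome], success)
    count[genome] += 1
    history.append((genome, round(rate[genome], 3)))

# history == [("a1", 1.0), ("a2", 0.0), ("a1", 0.5), ("a3", 1.0), ("a1", 0.667)]
```

After the fifth draw, `count["a1"]` equals the first set threshold of 3, which is the point at which a1 stops being tested in this round.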
The selection method provided by this embodiment uses a given matching model to compute the matching success rate of each contribution genome multiple times and selects the contribution genomes with higher matching success rates accordingly, which makes the calculated results more reliable.
Optionally, according to the above embodiments, the step of calculating the Z value of a contribution genome based on the respective means of all contribution genomes and the standard deviation of each contribution genome is further detailed in Fig. 3, a schematic flow diagram of calculating the Z value in the selection method provided by an embodiment of the present invention. It comprises:
S301: based on the respective means of all contribution genomes, calculate the unified mean of the matching success rates of all contribution genomes.
That is, according to the above embodiments, the mean matching success rate of each selected contribution genome can be calculated. In this embodiment, the mean of the matching success rate over all contribution genomes as a whole, i.e., the unified mean of the matching success rate, is first calculated from the per-genome means. Specifically, it can be calculated according to the following formula:
In the formula, μ denotes the unified mean of the matching success rate over all contribution genomes as a whole, m denotes the number of selected contribution genomes, and E_i(p) denotes the mean matching success rate of the i-th contribution genome.
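The formula itself does not appear in this text. Since μ is described as the mean of the m per-genome means E_i(p), it is presumably the plain average below; this is a reconstruction from the symbol definitions, not a copy of the original formula.

```latex
\mu = \frac{1}{m} \sum_{i=1}^{m} E_i(p)
```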
S302: based on the standard deviation and mean of each contribution genome and the unified mean of all contribution genomes, calculate the Z value of that contribution genome.
That is, given the unified mean of all contribution genomes obtained in the above step, together with the standard deviation and mean of the matching success rate of each contribution genome calculated in the above embodiments, the Z value of each selected contribution genome can be calculated with a given Z value formula.
In one embodiment, the step of calculating the Z value of a contribution genome further comprises calculating the Z value of each contribution genome with the following formula:
In the formula, Z_i denotes the Z value of the i-th contribution genome, n denotes the number of matching success rate samples of each contribution genome, E_i(p) denotes the mean of the i-th contribution genome, μ denotes the unified mean of all contribution genomes, and S_i denotes the standard deviation of the i-th contribution genome.
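The Z value formula is likewise missing from this text. One form that uses exactly the listed symbols is the usual large-sample Z statistic shown below. It is offered only as an assumption; the original formula may differ, for example in sign or in taking an absolute value.

```latex
Z_i = \frac{E_i(p) - \mu}{S_i / \sqrt{n}}
```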
The selection method provided by this embodiment uses the mean of each selected contribution genome to calculate, in turn, the unified mean of all contribution genomes and the Z value of each contribution genome. This characterizes the matching success rate of each contribution genome more accurately, so that contribution genes can be selected more accurately for matching against translator genes, improving the matching effect.
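A minimal sketch of steps S301 and S302 is given below. It assumes the standard large-sample statistic Z_i = (E_i(p) - μ) / (S_i / √n), which is not confirmed by this text, and it assumes the sample standard deviation (`statistics.stdev`); the specification does not say whether the sample or population form is meant. The function name `z_values` is illustrative.

```python
import math
import statistics

def z_values(samples):
    """samples: {genome: [matching success rate samples]} -> {genome: Z value}."""
    # Per-genome mean E_i(p) of the success-rate samples.
    means = {g: statistics.mean(v) for g, v in samples.items()}
    # S301: unified mean over the per-genome means.
    mu = statistics.mean(list(means.values()))
    z = {}
    for g, v in samples.items():
        s = statistics.stdev(v)  # sample standard deviation (assumption)
        if s == 0:
            raise ValueError(f"zero standard deviation for genome {g!r}")
        # S302: assumed form of the Z value, see the lead-in above.
        z[g] = (means[g] - mu) / (s / math.sqrt(len(v)))
    return z

z = z_values({"a": [0.4, 0.6, 0.5, 0.5], "b": [0.7, 0.9, 0.8, 0.8]})
```

A genome whose samples are all identical has zero standard deviation, and the assumed formula is undefined there; the sketch raises an error rather than guessing how the method treats that case.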
Optionally, according to the above embodiments, the step of selecting, based on the Z value of each contribution genome, the contribution genomes that meet a set condition from all contribution genomes further comprises: if the multiple matching success rate samples follow a normal distribution, determining a preset Z value according to a preset confidence level, rejecting the contribution genomes whose Z value is greater than the preset Z value, and taking the remaining contribution genomes as the contribution genomes that meet the set condition.
That is, once the Z value of each contribution genome has been calculated, it indicates how differently the corresponding contribution genome performs in gene matching. Each contribution genome can therefore be checked against the set difference requirement according to its Z value; a genome that does not meet the requirement is removed from the selected contribution genomes, and all genomes that remain are the qualifying contribution genomes. The genes in all remaining contribution genomes are then taken out and, after duplicate genes are removed, form a new set of genes, which is the finally selected contribution genes.
For the case where the sampled matching success rates follow a normal distribution, if a 95% confidence level is required, i.e., the preset selection criterion is a confidence level of 95%, the Z value calculated for a contribution genome should be no greater than 1.96. Contribution genomes with a Z value greater than 1.96 are therefore removed, and the remaining initially selected contribution genomes are kept as the finally selected contribution genomes.
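The filtering and merging described above might look like the following sketch. The function names and the gene names in the usage lines are hypothetical. The text compares the Z value itself with the preset value, so the sketch does the same and does not take an absolute value. Deriving 1.96 from a two-sided 95% interval of the standard normal distribution is an assumption about where the preset Z value comes from.

```python
from statistics import NormalDist

def preset_z(confidence):
    """Two-sided critical value of the standard normal (assumed source of 1.96)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def select_genomes(z, z_max):
    """Keep the genomes whose Z value is no greater than the preset Z value."""
    return [g for g, value in z.items() if value <= z_max]

def merge_genes(genomes, kept):
    """Union of the genes in the kept genomes, first occurrence order, no duplicates."""
    merged, seen = [], set()
    for name in kept:
        for gene in genomes[name]:
            if gene not in seen:
                seen.add(gene)
                merged.append(gene)
    return merged

# Hypothetical genomes and Z values, for illustration only.
genomes = {"g1": ["field", "word_count"],
           "g2": ["language"],
           "g3": ["word_count", "deadline"]}
z = {"g1": 0.5, "g2": 2.3, "g3": -1.0}

kept = select_genomes(z, preset_z(0.95))   # g2 is rejected
final_genes = merge_genes(genomes, kept)
```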
The choosing method of contribution gene provided in an embodiment of the present invention, by the judgement to calculated result and to selecting step Circulating repetition execute, can guarantee that the high quality gene met the requirements can be selected, for more accurately matching interpreter tool It is significant.
Further, on the basis of the above embodiments, after the step of selecting the contribution genomes that meet the set condition from all contribution genomes, the method of this embodiment further comprises: if, among all contribution genomes, the number of contribution genomes whose Z value is no greater than the preset Z value is less than a preset threshold, selecting multiple groups of genes from the candidate contribution gene list again and re-executing the selection steps from performing multiple matching-result samplings through obtaining the finally selected contribution genes.
That is, after obtaining the Z value of each contribution genome and selecting the contribution genomes that meet the set condition, this embodiment may further include the following processing: count the contribution genomes whose Z value is no greater than the preset Z value, which gives the number of finally selected contribution genomes meeting the set condition, and compare this number with the preset threshold. If the number is less than the preset threshold, select multiple groups of genes from the candidate contribution gene list again and carry out the selection steps of the above embodiments, from multiple matching-result sampling through obtaining the finally selected contribution genes.
For example, each contribution genome is judged against the preset selection criterion according to its Z value. If none of the contribution genomes has a Z value that meets the criterion, the method returns to step S101, selects multiple different contribution genomes from the candidate contribution gene list again, and repeats the calculation and selection process of the above embodiments.
For example, where the sampled matching success rates follow a normal distribution and a 95% confidence level is required, i.e., the preset condition is that the confidence level of a contribution genome reaches 95%, the Z value calculated for a contribution genome should be no greater than 1.96. In practice, because the contribution genomes are chosen from the candidate contribution gene list at random, among other reasons, the Z values calculated for the selected genomes may fail to meet this criterion. Other contribution genomes then need to be selected from the candidate contribution gene list, and the calculation and selection repeated.
The selection method provided by this embodiment, by judging the calculated results and repeating the selection steps in a loop, ensures that high-quality genes meeting the requirements are selected, which is significant for matching translators more accurately.
Further, on the basis of the above embodiments, before the step of performing multiple matching-result samplings to obtain multiple matching success rate samples, the method of this embodiment further comprises: setting, according to the required gene matching precision, a threshold on the total number of matching-result samplings; correspondingly, for each contribution genome, the number of matching success rate samples drawn is no less than this total count threshold.
That is, before performing multiple matching-result samplings to obtain multiple matching success rate samples, this embodiment sets the total count threshold for matching-result sampling according to the precision required of the gene matching calculation with the translators to be matched. In actual sampling, the number of matching success rate samples collected is then no less than this threshold. For example, if at least 50 matching success rate samples are required for each contribution genome, then 50 is the preset total count threshold.
By setting a suitable total count threshold, the selection method provided by this embodiment guarantees a sufficient sample size, so that the results are more general and more precise.
As another aspect of the embodiments of the present invention, an embodiment provides, according to the above embodiments, a device for selecting contribution genes, which is used to carry out the selection of the final contribution genes described in the above embodiments. The descriptions and definitions in the selection methods of the above embodiments therefore apply to each execution module in this embodiment; refer to the above embodiments for details, which are not repeated here.
According to one embodiment of this aspect, the structure of the device for selecting contribution genes is shown in Fig. 4, a schematic structural diagram of the device provided by an embodiment of the present invention. The device can be used to select contribution genes as in the above method embodiments and comprises: an initial gene selection module 401, a first calculation module 402, a second calculation module 403, and a final gene selection module 404.
The initial gene selection module 401 is configured to select multiple different groups of genes from the candidate contribution gene list to form multiple contribution genomes. The first calculation module 402 is configured to, for each contribution genome, perform multiple matching-result samplings to obtain multiple matching success rate samples and, based on these samples, calculate the mean and standard deviation of the matching success rate of that contribution genome. The second calculation module 403 is configured to calculate the Z value of each contribution genome based on the respective means of all contribution genomes and the standard deviation of that contribution genome. The final gene selection module 404 is configured to select, based on the Z value of each contribution genome, the contribution genomes that meet the set condition from all contribution genomes, and to merge the genes in those genomes to obtain the finally selected contribution genes. Here, the Z value is the Z value used in a large-sample difference test.
Specifically, the initial gene selection module 401 can select multiple groups of contribution genes from the pre-established candidate contribution gene list and form one genome from each group; these are the selected contribution genomes. For example, to select each group, the initial gene selection module 401 can randomly pick several contribution genes from the candidate contribution gene list and form one contribution genome from the randomly picked genes.
Then, for each selected contribution genome, its matching effect with the contribution needs to be determined so that contribution genes better suited to gene matching can be chosen. For the sake of generality, for each contribution genome the first calculation module 402 can input the genome into a given matching model and perform multiple matching-result samplings, each of which yields one matching success rate sample. Each matching success rate sample is in effect the matching success rate value obtained from one matching-result sampling.
In addition, for each selected contribution genome, the first calculation module 402 calculates the overall matching success rate of the genome from the multiple matching success rate samples obtained by the above sampling, that is, it calculates the mean and the standard deviation of the matching success rate of that genome.
Then the second calculation module 403 calculates the Z value of each selected contribution genome. Specifically, for each contribution genome, it calculates the Z value from the standard deviation of that genome's matching success rate and the mean matching success rates of all contribution genomes.
Finally, from the Z values calculated above, the final gene selection module 404 can determine how differently each corresponding contribution genome performs in gene matching. Using the preset condition, the final gene selection module 404 judges from the Z value of each contribution genome whether that genome meets the set difference requirement. A genome that does not is removed from the selected contribution genomes, and all genomes that remain are the qualifying contribution genomes. The final gene selection module 404 then takes out the genes in all remaining contribution genomes and, after removing duplicate genes, forms a new set of genes, which is the finally selected contribution genes.
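How the four modules hand data to one another can be sketched as a small composition of callables. The class name, field names, and type aliases are illustrative assumptions; the specification describes the modules only functionally, and the sketch takes no position on how each module is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Genome = Tuple[str, ...]
Stats = Tuple[float, float, int]   # (mean, standard deviation, sample count)

@dataclass
class SelectionDevice:
    choose_initial: Callable[[Sequence[str]], List[Genome]]        # module 401
    sample_stats: Callable[[Genome], Stats]                        # module 402
    compute_z: Callable[[Dict[Genome, Stats]], Dict[Genome, float]]  # module 403
    choose_final: Callable[[Dict[Genome, float]], List[str]]       # module 404

    def run(self, candidate_list: Sequence[str]) -> List[str]:
        genomes = self.choose_initial(candidate_list)       # form contribution genomes
        stats = {g: self.sample_stats(g) for g in genomes}  # sample, then mean and std
        z = self.compute_z(stats)                           # Z value per genome
        return self.choose_final(z)                         # filter and merge genes
```

Keeping the modules as injected callables mirrors the text's separation of responsibilities and lets each stage be replaced or tested on its own.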
Further, on the basis of the above embodiments, the device of this embodiment further includes a candidate contribution gene list construction module configured to: extract the corresponding genes from all project-related information, contribution-related information, and process-related information of contributions, forming the project-related genes, contribution-related genes, and process-related genes of the contributions; and build the candidate contribution gene list from the project-related genes, contribution-related genes, and process-related genes.
Optionally, the second calculation module is specifically configured to: calculate the unified mean of the matching success rates of all contribution genomes based on the respective means of all contribution genomes; and calculate the Z value of each contribution genome based on the standard deviation and mean of that genome and the unified mean of all contribution genomes.
Optionally, the second calculation module is specifically configured to calculate the Z value of each contribution genome with the following formula:
In the formula, Z_i denotes the Z value of the i-th contribution genome, n denotes the number of matching success rate samples of each contribution genome, E_i(p) denotes the mean of the i-th contribution genome, μ denotes the unified mean of all contribution genomes, and S_i denotes the standard deviation of the i-th contribution genome.
Optionally, the final gene selection module is specifically configured to: if the multiple matching success rate samples follow a normal distribution, determine a preset Z value according to a preset confidence level, reject the contribution genomes whose Z value is greater than the preset Z value, and take the remaining contribution genomes as the contribution genomes that meet the set condition.
Further, on the basis of the above embodiments, the device of this embodiment further includes a judgment module configured to: if, among all contribution genomes, the number of contribution genomes whose Z value is no greater than the preset Z value is less than the preset threshold, select multiple groups of genes from the candidate contribution gene list again and re-execute the selection steps from performing multiple matching-result samplings through obtaining the finally selected contribution genes.
Further, on the basis of the above embodiments, the first calculation module is also configured to: set, according to the required gene matching precision, a threshold on the total number of matching-result samplings; correspondingly, for each contribution genome, the number of matching success rate samples drawn is no less than this total count threshold.
It can be understood that, in the embodiments of the present invention, the program modules in the devices of the above embodiments can be implemented by a hardware processor. The beneficial effects of the devices for selecting contribution genes are the same as those of the corresponding method embodiments above; refer to those embodiments, which are not repeated here.
As yet another aspect of the embodiments of the present invention, this embodiment provides an electronic device according to the above embodiments. Referring to Fig. 5, a schematic diagram of the physical structure of the electronic device provided by an embodiment of the present invention, the device comprises: at least one memory 501, at least one processor 502, a communication interface 503, and a bus 504.
The memory 501, processor 502, and communication interface 503 communicate with one another over the bus 504, and the communication interface 503 is used for information transmission between the electronic device and a manuscript information device. The memory 501 stores a computer program that can run on the processor 502; when the processor 502 executes the computer program, the method for selecting contribution genes described in the above embodiments is implemented.
That is, the electronic device includes at least the memory 501, processor 502, communication interface 503, and bus 504. The memory 501, processor 502, and communication interface 503 are communicatively connected through the bus 504 and can communicate with one another; for example, the processor 502 reads the program instructions of the method for selecting contribution genes from the memory 501. In addition, the communication interface 503 can provide a communication connection between the electronic device and the manuscript information device and carry information between them, for example so that the selection of contribution genes is carried out through the communication interface 503.
When the electronic device runs, the processor 502 calls the program instructions in the memory 501 to execute the methods provided by the above method embodiments, including, for example: selecting multiple different groups of genes from the candidate contribution gene list to form multiple contribution genomes; for each contribution genome, performing multiple matching-result samplings to obtain multiple matching success rate samples and, based on these samples, calculating the mean and standard deviation of the matching success rate of that genome; calculating the Z value of each contribution genome based on the respective means of all contribution genomes and the standard deviation of that genome; and selecting, based on the Z value of each contribution genome, the contribution genomes that meet the set condition from all contribution genomes and merging the genes in those genomes to obtain the finally selected contribution genes; where the Z value is the Z value used in a large-sample difference test.
The program instructions in the memory 501 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Alternatively, all or part of the steps of the above method embodiments may be carried out by hardware driven by program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disc.
An embodiment of the present invention further provides, according to the above embodiments, a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the method for selecting contribution genes of the above embodiments, including, for example: selecting multiple different groups of genes from the candidate contribution gene list to form multiple contribution genomes; for each contribution genome, performing multiple matching-result samplings to obtain multiple matching success rate samples and, based on these samples, calculating the mean and standard deviation of the matching success rate of that genome; calculating the Z value of each contribution genome based on the respective means of all contribution genomes and the standard deviation of that genome; and selecting, based on the Z value of each contribution genome, the contribution genomes that meet the set condition from all contribution genomes and merging the genes in those genomes to obtain the finally selected contribution genes; where the Z value is the Z value used in a large-sample difference test.
The electronic device and non-transitory computer-readable storage medium provided by the embodiments of the present invention, by executing the method for selecting contribution genes described in the above embodiments, select multiple contribution genomes in advance from the contribution gene pool of all contributions, calculate the Z values of these genomes, and select the genomes whose Z values meet the set condition as the final selection result. The selected contribution genes thus better reflect the differences between contributions, and the contributions selected on this basis can be matched more reasonably with existing contributions, effectively improving translation efficiency and translation accuracy.
It is understood that the embodiment of device described above, electronic equipment and storage medium is only schematic , wherein unit may or may not be physically separated as illustrated by the separation member, it can both be located at one Place, or may be distributed on heterogeneous networks unit.Some or all of modules can be selected according to actual needs To achieve the purpose of the solution of this embodiment.Those of ordinary skill in the art are without paying creative labor To understand and implement.
By the description of embodiment of above, those skilled in the art is it will be clearly understood that each embodiment can borrow Help software that the mode of required general hardware platform is added to realize, naturally it is also possible to pass through hardware.Based on this understanding, above-mentioned Substantially the part that contributes to existing technology can be embodied in the form of software products technical solution in other words, the meter Calculation machine software product may be stored in a computer readable storage medium, such as USB flash disk, mobile hard disk, ROM, RAM, magnetic disk or light Disk etc., including some instructions, with so that a computer equipment (such as personal computer, server or network equipment etc.) Execute method described in certain parts of above-mentioned each method embodiment or embodiment of the method.
In addition, those skilled in the art should understand that, in the application documents of the embodiments of the present invention, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
Numerous specific details are set forth in the specification of the embodiments of the present invention. It should be understood, however, that the embodiments of the present invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification. Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the embodiments of the present invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments.
However, this method of disclosure should not be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the claims reflect, the inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the embodiments of the present invention, not to limit them. Although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for selecting contribution genes, characterized by comprising:
selecting multiple different groups of genes from a candidate contribution gene list to form multiple contribution genomes;
for each contribution genome, sampling the matching results multiple times to obtain multiple matching success rate samples, and calculating, based on the multiple matching success rate samples, the mean and standard deviation of the matching success rate corresponding to that contribution genome;
calculating the Z value corresponding to each contribution genome based on the means corresponding to all the contribution genomes and the standard deviation corresponding to each contribution genome;
based on the Z value corresponding to each contribution genome, selecting from all the contribution genomes the contribution genomes that satisfy a set condition, and merging the genes in the contribution genomes that satisfy the set condition to obtain the finally selected contribution genes;
wherein the Z value denotes the Z value in a large-sample difference test.
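The first two steps of claim 1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the fixed group size, and the `sample_rate` callback (standing in for one matching result sampling run) are all assumptions, as is the use of the sample standard deviation.

```python
import math
import random
import statistics
from typing import Callable, FrozenSet, List, Tuple

Genome = FrozenSet[str]  # one contribution genome = one group of gene names


def build_genomes(candidates: List[str], group_count: int, group_size: int,
                  rng: random.Random) -> List[Genome]:
    """Draw `group_count` mutually different gene groups from the candidate list.

    Assumes the candidate gene names are unique.
    """
    if group_count > math.comb(len(candidates), group_size):
        raise ValueError("not enough distinct groups of that size")
    genomes = set()
    while len(genomes) < group_count:
        genomes.add(frozenset(rng.sample(candidates, group_size)))
    return list(genomes)


def rate_statistics(genome: Genome,
                    sample_rate: Callable[[Genome], float],
                    n: int) -> Tuple[float, float]:
    """Sample the matching success rate n times (n >= 2).

    Returns the mean and the sample standard deviation of the samples.
    """
    samples = [sample_rate(genome) for _ in range(n)]
    return statistics.mean(samples), statistics.stdev(samples)
```

In use, `sample_rate` would run one round of matching with the given genome and return the observed success rate; the resulting `(mean, std)` pairs feed the Z value calculation.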
2. The method according to claim 1, characterized in that, before the step of selecting multiple different groups of genes from the candidate contribution gene list, the method further comprises:
extracting corresponding genes from the project-related information, contribution-related information, and process-related information of all contributions, respectively, to form the project-related genes, contribution-related genes, and process-related genes of the contributions;
constituting the candidate contribution gene list based on the project-related genes, contribution-related genes, and process-related genes.
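One way to picture the construction of the candidate list in claim 2 is the sketch below. It assumes, purely for illustration, that each kind of information arrives as a list of dictionaries and that every field name becomes one gene; the source does not specify how genes are extracted, and the field names in the usage example are hypothetical.

```python
from typing import Dict, Iterable, List


def extract_genes(records: Iterable[Dict[str, object]], prefix: str) -> List[str]:
    """Collect each field name once, tagged with its category (assumed rule)."""
    seen: Dict[str, None] = {}  # a dict keeps insertion order and removes duplicates
    for record in records:
        for field in record:
            seen.setdefault(f"{prefix}.{field}", None)
    return list(seen)


def build_candidate_list(project_records: Iterable[Dict[str, object]],
                         contribution_records: Iterable[Dict[str, object]],
                         process_records: Iterable[Dict[str, object]]) -> List[str]:
    """Concatenate project, contribution, and process genes into one list."""
    return (extract_genes(project_records, "project")
            + extract_genes(contribution_records, "contribution")
            + extract_genes(process_records, "process"))


# Hypothetical field names, for illustration only.
candidates = build_candidate_list(
    [{"industry": "legal", "deadline": "3d"}, {"industry": "medical"}],
    [{"language_pair": "zh-en", "word_count": 1200}],
    [{"review_rounds": 2}],
)
```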
3. The method according to claim 1, characterized in that the step of calculating the Z value corresponding to each contribution genome based on the means corresponding to all the contribution genomes and the standard deviation corresponding to each contribution genome further comprises:
calculating a unified mean of the matching success rates of all the contribution genomes based on the means corresponding to all the contribution genomes;
calculating the Z value corresponding to each contribution genome based on the standard deviation and the mean corresponding to that contribution genome and the unified mean corresponding to all the contribution genomes.
4. The method according to claim 3, characterized in that the step of calculating the Z value corresponding to the contribution genome further comprises:
calculating the Z value corresponding to each contribution genome using the following formula:
In the formula, Z_i denotes the Z value corresponding to the i-th contribution genome, n denotes the number of matching success rate samples corresponding to each contribution genome, E_i(p) denotes the mean corresponding to the i-th contribution genome, μ denotes the unified mean corresponding to all the contribution genomes, and S_i denotes the standard deviation corresponding to the i-th contribution genome.
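The formula referred to in claim 4 is not reproduced in this text. As a hedged reconstruction, assuming the standard large-sample Z statistic and assuming that the unified mean is the plain average over the m contribution genomes (m is a symbol introduced here and does not appear in the source), the symbol definitions above are consistent with:

```latex
\mu = \frac{1}{m}\sum_{i=1}^{m} E_i(p),
\qquad
Z_i = \frac{E_i(p) - \mu}{S_i / \sqrt{n}}
```

The claimed formula may differ from this sketch, for example by taking an absolute value or by weighting the unified mean.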
5. The method according to claim 1, characterized in that the step of selecting, based on the Z value corresponding to each contribution genome, the contribution genomes that satisfy the set condition from all the contribution genomes further comprises:
on the basis that the multiple matching success rate samples follow a normal distribution, determining a preset Z value according to a preset confidence level, rejecting the contribution genomes whose Z value is greater than the preset Z value, and taking the remaining contribution genomes among all the contribution genomes as the contribution genomes that satisfy the set condition.
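The rejection rule of claim 5 might be sketched as below. Several points are assumptions rather than statements of the source: the Z statistic is taken to be (E_i(p) − μ) / (S_i / √n), which the symbol definitions suggest but this text does not show; the unified mean is a plain average; and the preset Z value is the two-sided critical value of the standard normal distribution. The signed comparison follows the literal wording "Z value greater than the preset Z value".

```python
import math
from statistics import NormalDist
from typing import Dict, Tuple


def preset_z(confidence: float) -> float:
    """Two-sided standard normal critical value (assumed convention)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def z_value(mean_i: float, std_i: float, unified_mean: float, n: int) -> float:
    """Assumed large-sample Z statistic for one contribution genome."""
    if std_i <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_i - unified_mean) / (std_i / math.sqrt(n))


def retain_genomes(stats: Dict[str, Tuple[float, float]], n: int,
                   confidence: float = 0.95) -> Dict[str, float]:
    """Keep genomes whose Z value does not exceed the preset Z value.

    `stats` maps a genome name to (mean, std) of its success rate samples.
    """
    unified_mean = sum(mean for mean, _ in stats.values()) / len(stats)
    limit = preset_z(confidence)
    z_values = {name: z_value(mean, std, unified_mean, n)
                for name, (mean, std) in stats.items()}
    return {name: z for name, z in z_values.items() if z <= limit}
```

If the intended test is two-sided on the magnitude, the final comparison would use `abs(z) <= limit` instead; the source text does not settle this.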
6. The method according to claim 5, characterized in that, after the step of selecting the contribution genomes that satisfy the set condition from all the contribution genomes, the method further comprises:
if, among all the contribution genomes, the number of contribution genomes whose Z value is not greater than the preset Z value is less than a preset threshold, selecting multiple groups of genes from the candidate contribution gene list again, and repeating the selection steps from performing the multiple matching result sampling through obtaining the finally selected contribution genes.
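The re-selection loop of claim 6, together with the merging step of claim 1, can be sketched generically. The callbacks `draw` (select fresh contribution genomes) and `retain` (apply the Z value filter) are hypothetical stand-ins for the earlier steps, and the `max_rounds` cap is an assumption added so that the loop terminates; the source does not bound the number of repetitions.

```python
from typing import Callable, FrozenSet, List, Sequence

Genome = FrozenSet[str]


def merge_genes(genomes: Sequence[Genome]) -> List[str]:
    """Merge the genes of all retained genomes into one sorted list."""
    merged = set()
    for genome in genomes:
        merged |= genome
    return sorted(merged)


def select_with_retry(draw: Callable[[], Sequence[Genome]],
                      retain: Callable[[Sequence[Genome]], List[Genome]],
                      min_kept: int, max_rounds: int = 10) -> List[str]:
    """Redraw genomes until at least `min_kept` survive the filter."""
    for _ in range(max_rounds):
        kept = retain(draw())
        if len(kept) >= min_kept:
            return merge_genes(kept)
    raise RuntimeError("too few contribution genomes retained in every round")
```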
7. The method according to claim 1, characterized in that, before the step of performing multiple matching result sampling to obtain multiple matching success rate samples, the method further comprises:
setting a total count threshold for matching result sampling according to the gene matching precision requirement;
correspondingly, for each contribution genome, the number of matching success rate samples drawn is not less than the total count threshold.
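The source does not say how the total count threshold of claim 7 is derived from the precision requirement. One conventional choice, shown here only as an assumption, is the sample-size rule n ≥ (z·σ / e)², where e is the tolerated error of the estimated success rate, z is the two-sided normal critical value, and σ = 0.5 is used as a conservative upper bound on the spread of a rate in [0, 1].

```python
import math
from statistics import NormalDist


def sampling_threshold(margin: float, confidence: float = 0.95,
                       sigma: float = 0.5) -> int:
    """Smallest n with z * sigma / sqrt(n) <= margin (assumed rule)."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


def enough_samples(sample_count: int, threshold: int) -> bool:
    """Claim 7 condition: the sample count is not less than the threshold."""
    return sample_count >= threshold
```

A tighter precision requirement (smaller `margin`) raises the threshold quadratically, which is the trade-off such a threshold is meant to expose.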
8. A device for selecting contribution genes, characterized by comprising:
an initial gene selection module, configured to select multiple different groups of genes from a candidate contribution gene list to form multiple contribution genomes;
a first calculation module, configured to, for each contribution genome, sample the matching results multiple times to obtain multiple matching success rate samples, and calculate, based on the multiple matching success rate samples, the mean and standard deviation of the matching success rate corresponding to that contribution genome;
a second calculation module, configured to calculate the Z value corresponding to each contribution genome based on the means corresponding to all the contribution genomes and the standard deviation corresponding to each contribution genome;
a final gene selection module, configured to select, based on the Z value corresponding to each contribution genome, the contribution genomes that satisfy a set condition from all the contribution genomes, and merge the genes in the contribution genomes that satisfy the set condition to obtain the finally selected contribution genes;
wherein the Z value denotes the Z value in a large-sample difference test.
9. An electronic device, characterized by comprising: at least one memory, at least one processor, a communication interface, and a bus;
the memory, the processor, and the communication interface communicate with one another through the bus, and the communication interface is used for information transmission between the electronic device and a contribution information device;
the memory stores a computer program that can run on the processor, and when the processor executes the computer program, the method according to any one of claims 1 to 7 is implemented.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the method according to any one of claims 1 to 7.
CN201811095816.1A 2018-09-19 2018-09-19 Manuscript gene selection method and device and electronic equipment Active CN109447402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811095816.1A CN109447402B (en) 2018-09-19 2018-09-19 Manuscript gene selection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811095816.1A CN109447402B (en) 2018-09-19 2018-09-19 Manuscript gene selection method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN109447402A (en) 2019-03-08
CN109447402B (en) 2022-02-22

Family

ID=65533071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811095816.1A Active CN109447402B (en) 2018-09-19 2018-09-19 Manuscript gene selection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN109447402B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0685801A1 (en) * 1985-11-19 1995-12-06 International Business Machines Corporation Process in an information processing system for compaction and replacement of phrases
CN103678541A (en) * 2013-11-30 2014-03-26 武汉传神信息技术有限公司 Translation competence data association rule mining method
CN105555968A (en) * 2013-05-24 2016-05-04 塞昆纳姆股份有限公司 Methods and processes for non-invasive assessment of genetic variations
CN107103419A (en) * 2017-04-20 2017-08-29 北京航空航天大学 One kind of groups software development process analogue system and method
CN107704469A (en) * 2016-08-08 2018-02-16 中国科学院文献情报中心 The mapping method and device of patent data and industry data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
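Guan Ruixia: "Research on a Chinese text keyword extraction algorithm based on gene expression programming", China Master's Theses Full-text Database, Information Science and Technology *
Han Xu: "Translation + Internet = ? Translation platform 语翼 uses 'DNA' to match translators", 36KR: HTTPS://36KR.COM/P/1721672007681 *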

Also Published As

Publication number Publication date
CN109447402B (en) 2022-02-22

Similar Documents

Publication Publication Date Title
CN107329967B (en) Question answering system and method based on deep learning
CN108829682B (en) Computer readable storage medium, intelligent question answering method and intelligent question answering device
CN106095834A (en) Intelligent dialogue method and system based on topic
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
CN104503998B (en) For the kind identification method and device of user query sentence
CN103577989B (en) A kind of information classification approach and information classifying system based on product identification
CN105912645B (en) A kind of intelligent answer method and device
CN106649742A (en) Database maintenance method and device
Udagawa et al. A natural language corpus of common grounding under continuous and partially-observable context
CN108804526A (en) Interest determines that system, interest determine method and storage medium
CN109299245A (en) The method and apparatus that knowledge point is recalled
CN107273406A (en) Dialog process method and device in task dialogue system
CN108108347B (en) Dialogue mode analysis system and method
CN111309887B (en) Method and system for training text key content extraction model
CN109857846A (en) The matching process and device of user's question sentence and knowledge point
CN109409504A (en) A kind of data processing method, device, computer and storage medium
CN111144112A (en) Text similarity analysis method and device and storage medium
CN107766560A (en) The evaluation method and system of customer service flow
CN109447402A (en) Choosing method, device and the electronic equipment of contribution gene
CN107122378A (en) Object processing method and device
CN109299738A (en) Choosing method, device and the electronic equipment of contribution gene
CN109299737A (en) Choosing method, device and the electronic equipment of interpreter's gene
CN115048505A (en) Corpus screening method and device, electronic equipment and computer readable medium
CN109448792A (en) Choosing method, device and the electronic equipment of interpreter's gene
CN115080732A (en) Complaint work order processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant