WO2020057003A1 - Method, device and electronic device for selecting translator genes - Google Patents

Method, device and electronic device for selecting translator genes

Info

Publication number
WO2020057003A1
Authority
WO
WIPO (PCT)
Prior art keywords
translator
genome
genomes
genes
matching
Prior art date
Application number
PCT/CN2018/124951
Other languages
English (en)
French (fr)
Inventor
张芃
Original Assignee
语联网(武汉)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 语联网(武汉)信息技术有限公司
Publication of WO2020057003A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group

Definitions

  • Embodiments of the present disclosure relate to the technical field of data processing, and more particularly, to a method, device, and electronic device for selecting translator genes.
  • The information age and the Internet have greatly changed the way translation work is done.
  • A translation process management platform stores talent data organized by object, so that the most suitable translator can be matched to each manuscript to be translated.
  • Different translators carry different key information. Based on this key information, the most suitable manuscripts can be matched to each translator, effectively improving translation efficiency and accuracy.
  • Translator-manuscript gene matching is the process of finding the best translator for a manuscript by matching the manuscript genes against the translator genes, under a given strategy, through a matching model. The translator genes selected for gene matching should reflect the differences between translators better than other translator genes do, so that more suitable translators can be matched to the manuscripts to be translated.
  • A translator gene is a unique combination of key information, obtained by analyzing, calculating and quantifying a translator's characteristic attributes, that exists in a particular translator and distinguishes that translator from others. Translator genes can come from many sources: in the social-network age, genes can be extracted from data on virtually every action a translator takes.
  • To this end, the embodiments of the present disclosure provide a method, device and electronic device for selecting translator genes, so that the selected translator genes better reflect the differences among translators.
  • In a first aspect, an embodiment of the present disclosure provides a method for selecting translator genes, including: selecting a plurality of different genes from a candidate translator gene list to form a plurality of translator genomes; for each translator genome, sampling matching results multiple times to obtain multiple matching success rate samples, and calculating the mean and standard deviation of that genome's matching success rate from these samples; calculating the Z value of each translator genome based on the means of all the translator genomes and the standard deviation of that genome; and, based on the Z value of each translator genome, selecting from all the translator genomes those that meet a set condition and merging their genes to obtain the finally selected translator genes.
  • An embodiment of the present disclosure also provides a device for selecting translator genes, including: an initial gene selection module, configured to select a plurality of different genes from a candidate translator gene list to form multiple translator genomes; a first calculation module, configured to sample matching results multiple times for each translator genome to obtain multiple matching success rate samples and to calculate, from these samples, the mean and standard deviation of that genome's matching success rate;
  • a second calculation module, configured to calculate the Z value of each translator genome based on the means of all the translator genomes and the standard deviation of that genome; and a final gene selection module, configured to select, based on the Z value of each translator genome, the translator genomes that meet a set condition from all the translator genomes, and to merge the genes in those genomes to obtain the finally selected translator genes; wherein the Z value is the Z value of a large-sample difference test (Z-test).
  • An embodiment of the present disclosure provides an electronic device including at least one memory, at least one processor, a communication interface and a bus. The memory, the processor and the communication interface communicate with one another through the bus, and the communication interface is used for information transmission between the electronic device and a translator information device. The memory stores a computer program executable on the processor, and the processor, when executing the computer program, implements the method for selecting translator genes described in the first aspect.
  • An embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the method for selecting translator genes described in the first aspect.
  • With the method, device and electronic device for selecting translator genes provided by the embodiments of the present disclosure, multiple translator genomes are selected in advance from the translator gene pool of all translators, the Z value of each genome is calculated, and the genomes whose Z values meet a set condition are taken as the final selection result, so that the selected translator genes better reflect the differences between translators.
  • FIG. 1 is a schematic flowchart of a method for selecting translator genes according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram showing the relationship between a translator's characteristics and translator genes in a method for selecting translator genes according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic flowchart of calculating a Z value in a method for selecting translator genes according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a translator gene selection device according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • In the embodiments of the present disclosure, multiple translator genomes are selected in advance from the translators' gene pool, the Z value of each is calculated, and the genomes whose Z values meet the set condition are taken as the final selection result, so that the selected translator genes better reflect the differences between translators.
  • Here, the Z value is the Z value of a large-sample difference test (Z-test).
  • This embodiment provides a method for selecting translator genes.
  • As shown in FIG. 1, the method for selecting translator genes according to an embodiment of the present disclosure includes:
  • A candidate translator gene list is established in advance from all attribute information of the translators; it may include all genes related to specific translator attributes.
  • The candidate translator gene list can be regarded as a gene pool that stores, gene by gene, the genes related to translator information extracted from all translators, i.e., the translator genes.
  • Multiple sets of translator genes are selected from the list, and each set forms a genome, referred to as a translator genome; these are the initially selected translator genomes. For example, each set may be formed by randomly selecting multiple translator genes from the candidate translator gene list, so that the randomly extracted genes form one genome, i.e., one translator genome.
  • Extraction rules may also be set in advance, such as simultaneous or sequential extraction, interlaced extraction or extraction of specified line numbers, extraction according to the translator information that the genes characterize, the number of extractions, and so on; the corresponding plurality of genes is then extracted from the candidate translator gene list according to these rules.
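The random formation of initially selected translator genomes can be sketched in Python as follows. This is a minimal illustration, assuming genes are plain string identifiers and every genome has the same number of genes; the function name, its parameters and the gene names in the usage line are hypothetical, and the rule-based extraction variants are not modeled.

```python
import random
from typing import List, Optional, Sequence


def form_translator_genomes(
    candidate_genes: Sequence[str],
    num_genomes: int,
    genes_per_genome: int,
    seed: Optional[int] = None,
) -> List[List[str]]:
    """Randomly form initially selected translator genomes (hypothetical helper).

    Each genome is a list of distinct genes drawn from the candidate list;
    different genomes may share genes.
    """
    if genes_per_genome > len(candidate_genes):
        raise ValueError("genes_per_genome exceeds the size of the candidate list")
    rng = random.Random(seed)  # seeded generator for reproducible draws
    pool = list(candidate_genes)
    # sample() draws without replacement, so genes within one genome are distinct
    return [rng.sample(pool, genes_per_genome) for _ in range(num_genomes)]


# Usage with hypothetical gene names based on the attribute examples in this text
candidates = ["language_direction", "industry_area", "translation_speed",
              "on_time_submission_rate", "rejection_rate", "location"]
genomes = form_translator_genomes(candidates, num_genomes=3, genes_per_genome=2, seed=0)
print(genomes)
```

Drawing within a genome without replacement keeps each genome free of repeated genes, while overlap between genomes is allowed, which matches the later step that removes duplicates when genomes are merged.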
  • Next, each translator genome can be input to a given matching model, which is used to sample matching results multiple times; each sampling yields one matching success rate sample.
  • Specifically, the genes of a translator genome are input into the matching model, which, based on the manuscript genes it holds, automatically calculates and outputs the matching success rate between the genes of the translator genome and the manuscript genes.
  • The matching success rate value output by the matching model serves as one matching success rate sample.
  • Repeating the sampling yields multiple matching success rate samples.
  • In other words, each matching success rate sample is the matching success rate value obtained from one sampling of the matching result.
  • Suppose n matching success rate samples $p_1, p_2, \ldots, p_n$ are obtained for a translator genome.
  • The mean matching success rate of the translator genome is calculated as $E(p) = \frac{1}{n}\sum_{i=1}^{n} p_i$, where:
  • $E(p)$ represents the mean matching success rate of the translator genome;
  • $p_i$ represents the $i$-th matching success rate sample of the translator genome;
  • $n$ represents the total number of matching success rate samples collected for the translator genome.
  • The standard deviation $S$ of the translator genome's matching success rate is computed from the same samples, where:
  • $S$ represents the standard deviation of the matching success rate of the translator genome;
  • $E(p)$ represents the mean matching success rate of the translator genome;
  • $p_i$ represents the $i$-th matching success rate sample of the translator genome;
  • $n$ represents the total number of matching success rate samples collected for the translator genome.
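The two statistics for one translator genome can be sketched as below. The text's standard-deviation formula is not reproduced in this copy, so the sketch assumes the population form (dividing by n); the sample form (dividing by n - 1, as Python's `statistics.stdev` does, versus `statistics.pstdev` for the population form) differs little at the large sample counts this method uses. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def success_rate_stats(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (E(p), S) for one translator genome's matching success rate samples."""
    n = len(samples)
    if n == 0:
        raise ValueError("at least one matching success rate sample is required")
    mean = sum(samples) / n  # E(p) = (1/n) * sum of p_i
    # Assumption: population standard deviation, dividing by n
    std = math.sqrt(sum((p - mean) ** 2 for p in samples) / n)
    return mean, std


print(success_rate_stats([0.5, 1.0]))  # (0.75, 0.25)
```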
  • Further, before the step of sampling matching results multiple times to obtain multiple matching success rate samples, the method also includes: setting a threshold on the total number of matching result samplings according to the required calculation accuracy of gene matching with manuscripts to be translated. Accordingly, in actual sampling, the number of matching success rate samples is not less than this total-count threshold. For example, at least 50 matching success rate samples are extracted for each translator genome, 50 being the preset total-count threshold.
  • A Z value is then calculated for each initially selected translator genome. Specifically, the Z value of each translator genome is calculated from the standard deviation of its matching success rate and the mean matching success rates of all the translator genomes.
  • The Z value here is the statistic of a large-sample difference test, i.e., the Z value of a Z-test.
  • The Z-test is generally used to test differences between means for large samples (i.e., sample sizes greater than 30). It uses the theory of the standard normal distribution to infer the probability that a difference occurs, and thereby judges whether the difference between two means is significant; when the standard deviation is known, it can test whether the mean of a set of numbers equals a given expected value.
  • Here, the Z-test is used to measure the matching differences among the initially selected translator genomes, so a Z value is calculated for each of them.
  • Once the Z value of each translator genome has been calculated, it indicates how differentially the corresponding genome performs in gene matching. A preset condition is therefore applied to each Z value to determine whether the corresponding translator genome satisfies the set difference requirement. Genomes that do not satisfy it are removed from the initially selected translator genomes, and all genomes that are not removed are the translator genomes that meet the requirement. The genes in the remaining translator genomes are taken out and, after duplicate genes are removed, form a new gene set, which constitutes the finally selected translator genes.
  • For example, the condition for selecting translator genes may be set in advance so that the confidence level of the selected genes is not less than 95%, which corresponds to a Z value of 1.96. The Z value of each initially selected translator genome is then compared with 1.96: if it is greater than 1.96, the genome is eliminated; otherwise, it is retained.
  • Suppose p of the m initially selected translator genomes do not meet the set condition and are removed, so that the remaining m-p translator genomes meet it. Two or more of these m-p genomes may contain the same translator gene. Therefore, all translator genes in the m-p genomes are taken out and put into a new gene pool, in which each translator gene that appears multiple times is kept only once. The new gene pool thus contains multiple non-repeated translator genes, which are the finally selected translator genes.
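The elimination and merging steps can be sketched as follows. This assumes each genome is a list of gene identifiers paired by position with its Z value; the function name is hypothetical, and the comparison follows the text's rule (eliminate when Z is greater than 1.96, retain otherwise).

```python
from typing import List, Sequence

Z_THRESHOLD = 1.96  # corresponds to the 95% confidence level in the text


def select_final_genes(
    genomes: Sequence[Sequence[str]],
    z_values: Sequence[float],
    threshold: float = Z_THRESHOLD,
) -> List[str]:
    """Keep genomes with Z <= threshold and merge their genes without duplicates."""
    if len(genomes) != len(z_values):
        raise ValueError("each genome needs exactly one Z value")
    # A genome whose Z value is greater than the threshold is eliminated
    kept = [genome for genome, z in zip(genomes, z_values) if z <= threshold]
    # dict.fromkeys keeps one copy of each gene, in first-seen order
    return list(dict.fromkeys(gene for genome in kept for gene in genome))


print(select_final_genes([["a", "b"], ["b", "c"], ["d"]], [0.5, 1.2, 2.4]))  # ['a', 'b', 'c']
```

Using `dict.fromkeys` rather than `set` keeps the merged genes in a stable, first-seen order, which makes the result reproducible.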
  • In this way, multiple translator genomes are selected in advance from the gene pool of all translators, the Z value of each is calculated, and the genomes whose Z values meet the set condition are taken as the final selection result, so that the selected translator genes better reflect the differences between translators.
  • Translators selected on this basis can be matched more reasonably with manuscripts to be translated, effectively improving translation efficiency and accuracy.
  • Further, before the step of selecting a plurality of different genes from the candidate translator gene list, the method of the embodiments of the present disclosure also includes:
  • extracting corresponding genes from the translators' basic information, competence information, credit information and experience information to form the candidate translator gene list, where:
  • Basic information: personal information about the translator, such as name, age, location and contact information;
  • Competence information: the translator's translation ability, such as language direction, industry areas and translation speed;
  • Credit information: credit accumulated by the translator during translation, such as the on-time submission rate and the rejection rate;
  • Based on this information, the candidate translator gene list is constructed.
  • For example, the candidate translator gene list corresponding to basic information may be constructed as shown in Table 1, a basic-information candidate translator gene list according to an embodiment of the present disclosure.
  • By extracting translator genes from four aspects (basic, competence, credit and experience information) and forming the candidate translator gene list from them, the method takes different aspects of each translator's specific information into account more comprehensively when selecting and matching good translator genes, providing a reliable basis for more reasonable gene matching.
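One way to hold the candidate translator gene list is a mapping from information category to attributes, flattened into gene identifiers. This is a hypothetical sketch: the attribute names are drawn from the examples above, the experience category is left empty because no examples for it appear here, and the `category.attribute` naming scheme is an assumption, not the format of Table 1.

```python
from typing import Dict, List

# Hypothetical attribute names based on the examples given in the text
TRANSLATOR_INFO: Dict[str, List[str]] = {
    "basic": ["name", "age", "location", "contact"],
    "competence": ["language_direction", "industry_area", "translation_speed"],
    "credit": ["on_time_submission_rate", "rejection_rate"],
    "experience": [],  # no example attributes are given in this excerpt
}


def build_candidate_gene_list(info: Dict[str, List[str]]) -> List[str]:
    """Flatten category -> attributes into gene identifiers like 'credit.rejection_rate'."""
    return [f"{category}.{attribute}"
            for category, attributes in info.items()
            for attribute in attributes]


candidate_genes = build_candidate_gene_list(TRANSLATOR_INFO)
print(len(candidate_genes))  # 9
```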
  • Further, the step of extracting corresponding genes from all the basic, competence, credit and experience information of the translators includes:
  • extracting the translators' direct genes.
  • Translator genes are inherent in translators, and different translators have different genes. Translators share some genes, but it is more important to extract the genes that differ, so that translators can be treated differently and the best translator can be matched.
  • Genes are not features: they cannot be easily and clearly identified, so dedicated steps are needed to extract them. A gene and a feature are essentially different.
  • A feature is an abstraction of a concept shared by objects. Features contain finer-grained attributes, and these attributes contain the most fundamental information about the object, namely its genes.
  • First, the corresponding feature information is extracted as the translator's features.
  • Then, the most basic information is extracted to form the translator's direct genes.
  • FIG. 2 is a schematic diagram showing the relationship between a translator's characteristics and translator genes in the method for selecting translator genes according to an embodiment of the present disclosure.
  • By extracting the translator's features and then further extracting translator genes from them, the method for selecting translator genes provided in the embodiments of the present disclosure obtains unique key information that exists in a particular translator and distinguishes that translator from others.
  • Further, the step of sampling matching results multiple times to obtain multiple matching success rate samples includes:
  • randomly selecting a translator genome from all the translator genomes, performing a matching test on it, and updating its current matching success rate value based on the result of the current matching test and its historical matching success rate results.
  • For each initially selected translator genome, the matching effect with manuscripts needs to be determined, so that translator genes more suitable for gene matching can be selected.
  • To this end, matching results are sampled multiple times, using the matching model described above.
  • The given matching model can be used to perform multiple rounds of matching result sampling, each round consisting of multiple matching tests.
  • Suppose m translator genomes have been selected as in the above embodiments. The matching success rate of each genome is then sampled by running multiple rounds of matching experiments on these m genomes, each round containing multiple (typically fewer than 30) matching tests. Each round proceeds as follows:
  • Step 1: Initialize the matching success rate value of each translator genome, for example to 0.
  • Step 2: Randomly select a translator genome and run a matching test on it with the given matching model to obtain the result of this matching test.
  • The current matching success rate value of the selected genome is then calculated by combining this result with the results of its previous matching tests in the current round of sampling, i.e., its historical matching success rate results.
  • Step 3: Repeat the above steps for multiple cycles. Because each genome is selected randomly from all the translator genomes, different genomes may undergo different numbers of matching tests. When the number of matching tests on a genome reaches the first set threshold, matching tests on that genome stop for the current round, and its current matching success rate value at that point is recorded.
  • Step 4: For the remaining translator genomes (those that have not reached the first set threshold), continue the processing of steps 1 to 3 until the total number of matching tests in the current round reaches the second set threshold, then stop the current round. At this point each translator genome has a corresponding matching success rate value, which is its matching success rate sample from this round; for m translator genomes, m matching success rate samples are obtained. Multiple such rounds are then performed for all translator genomes (for example, until the number of rounds reaches the third set threshold), yielding multiple matching success rate samples for each translator genome. For example, if the number of rounds is set to n, each genome has n matching success rate samples (n is generally not less than 50).
  • For example, suppose a matching test on $a_1$ yields a successful match; this gives $a_1$ a matching success rate value of 100%.
  • On the third selection, suppose $a_1$ is selected again and the match is unsuccessful; based on its two matching test results, $a_1$'s current matching success rate value is 50%.
  • On the fourth selection, suppose $a_3$ is selected and its matching test succeeds; $a_3$'s matching success rate value is 100%.
  • On the fifth selection, suppose $a_1$ is selected again and the match succeeds; based on its three matching test results, $a_1$'s current matching success rate value is 2/3, about 66.7%.
  • Since the number of matching tests on $a_1$ has reached the first set threshold of 3, matching tests on $a_1$ stop, and its current matching success rate value of about 66.7% is output as its result for this round of multiple matching result sampling.
  • On the sixth and later selections, because $a_1$ has already undergone 3 matching tests, genomes are randomly selected only from $a_2$ and $a_3$ for matching tests; the selection and testing process is similar to the above.
  • When the total number of matching tests, i.e., the total across $a_1$, $a_2$ and $a_3$, reaches the second set threshold of 8,
  • the current round of multiple matching result sampling ends.
  • One matching success rate sample per genome is thus obtained from the above matching tests.
  • Each round therefore yields one set of matching success rate samples for $a_1$, $a_2$ and $a_3$. Rounds are repeated until their number reaches the third set threshold of 5, giving five matching success rate samples each for $a_1$, $a_2$ and $a_3$.
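The round-based sampling procedure (Steps 1 to 4 and the worked example) can be sketched as follows. The matching model is represented by a hypothetical callable `match_test(genome_id) -> bool` that reports whether one matching test succeeds; the default thresholds 3, 8 and 5 mirror the example; a genome never tested in a round is assumed to keep its initial value of 0; and the round also stops early if every genome has reached the first threshold, an added assumption so that the loop always terminates.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence


def sample_one_round(
    genome_ids: Sequence[str],
    match_test: Callable[[str], bool],
    per_genome_limit: int,
    round_limit: int,
    rng: random.Random,
) -> Dict[str, float]:
    """Run one round of randomized matching tests; return each genome's current success rate."""
    successes = {g: 0 for g in genome_ids}  # Step 1: initialize
    tests = {g: 0 for g in genome_ids}
    total = 0
    while total < round_limit:  # second set threshold: total tests in this round
        # First set threshold: a genome stops being tested once it reaches the limit
        available = [g for g in genome_ids if tests[g] < per_genome_limit]
        if not available:
            break
        g = rng.choice(available)  # Step 2: random selection
        if match_test(g):
            successes[g] += 1
        tests[g] += 1
        total += 1
    # Current value = successes / tests so far (e.g. 2 of 3 -> about 66.7%)
    return {g: (successes[g] / tests[g] if tests[g] else 0.0) for g in genome_ids}


def sample_success_rates(
    genome_ids: Sequence[str],
    match_test: Callable[[str], bool],
    per_genome_limit: int = 3,
    round_limit: int = 8,
    num_rounds: int = 5,  # third set threshold: number of rounds
    seed: Optional[int] = None,
) -> Dict[str, List[float]]:
    """Collect one matching success rate sample per genome per round."""
    rng = random.Random(seed)
    samples: Dict[str, List[float]] = {g: [] for g in genome_ids}
    for _ in range(num_rounds):
        round_rates = sample_one_round(genome_ids, match_test, per_genome_limit, round_limit, rng)
        for g, rate in round_rates.items():
            samples[g].append(rate)
    return samples


# Hypothetical matching model: each test succeeds with probability 0.7
model_rng = random.Random(42)
rates = sample_success_rates(["a1", "a2", "a3"], lambda g: model_rng.random() < 0.7, seed=1)
print({g: len(v) for g, v in rates.items()})  # {'a1': 5, 'a2': 5, 'a3': 5}
```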
  • By using the given matching model to compute multiple matching success rate samples for each translator genome, and selecting genomes with higher matching success rates on that basis, the method for selecting translator genes makes the calculation results more reliable.
  • As shown in FIG. 3, the process of calculating the Z value in the method for selecting translator genes includes:
  • As described above, the mean matching success rate of each translator genome can be calculated. In this embodiment, the mean matching success rates of the individual translator genomes are first averaged to obtain the mean over all translator genomes, i.e., the uniform mean of the matching success rate, calculated as $\bar{E}(p) = \frac{1}{m}\sum_{i=1}^{m} E_i(p)$, where:
  • $\bar{E}(p)$ represents the uniform mean of the matching success rate over all translator genomes;
  • $m$ represents the number of initially selected translator genomes;
  • $E_i(p)$ represents the mean matching success rate of the $i$-th translator genome.
  • Once the uniform mean over all the translator genomes has been calculated, it is combined with the mean and standard deviation of each translator genome's matching success rate obtained in the above embodiment,
  • and the given Z-value formula is applied
  • to calculate the Z value of each initially selected translator genome.
  • Specifically, the step of calculating the Z value of a translator genome includes calculating the Z value of each translator genome with the formula $Z_i = \frac{E_i(p) - \bar{E}(p)}{S_i/\sqrt{n}}$, where:
  • $Z_i$ represents the Z value of the $i$-th translator genome;
  • $n$ represents the number of matching success rate samples of each translator genome;
  • $E_i(p)$ represents the mean of the $i$-th translator genome;
  • $\bar{E}(p)$ represents the uniform mean over all translator genomes;
  • $S_i$ represents the standard deviation of the $i$-th translator genome.
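The uniform mean and the Z values can be computed as below. This sketch assumes the standard one-sample form $Z_i = (E_i(p) - \bar{E}(p)) / (S_i/\sqrt{n})$, built from the symbols the text defines; the function name is hypothetical, the sign is kept because the text compares Z directly with 1.96, and a genome with zero standard deviation is rejected because its Z value would be undefined.

```python
import math
from typing import List, Sequence


def z_values(means: Sequence[float], stds: Sequence[float], n: int) -> List[float]:
    """Z_i = (E_i(p) - uniform mean) / (S_i / sqrt(n)) for each translator genome."""
    if not means or len(means) != len(stds):
        raise ValueError("need one (mean, std) pair per genome")
    uniform_mean = sum(means) / len(means)  # mean of the m genome means
    result = []
    for mean, std in zip(means, stds):
        if std == 0:
            raise ValueError("zero standard deviation: Z value undefined")
        result.append((mean - uniform_mean) / (std / math.sqrt(n)))
    return result


print(z_values([0.6, 0.8], [0.2, 0.2], n=4))  # approximately [-1.0, 1.0]
```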
  • By using the mean of each initially selected translator genome to calculate, in turn, the uniform mean over all translator genomes and the Z value of each genome, the method characterizes the matching success rate of each translator genome more accurately, so that translator genes can be selected more accurately for matching with manuscript genes and the matching effect improves.
  • The method of the embodiments of the present disclosure may further include the following step: if no translator genome has a Z value that meets the set condition of the foregoing embodiment, return to step S101, reselect a plurality of different translator genomes from the candidate translator gene list, and perform the calculation and selection process of the above embodiments again.
  • For example, under the condition above, the Z value calculated for a translator genome should be no greater than 1.96. In practice, when multiple translator genomes are selected from the candidate translator gene list, random selection and other factors may cause none of the calculated Z values to meet this criterion; in that case, translator genomes must be selected again from the candidate translator gene list, and the calculation and selection repeated.
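The reselection rule can be sketched as a retry loop. `select_genomes` and `compute_z` are hypothetical callables standing in for step S101 and the Z-value calculation, and `max_attempts` is an added assumption, since the text does not bound the number of retries.

```python
from typing import Callable, List

Genomes = List[List[str]]


def select_with_retry(
    select_genomes: Callable[[], Genomes],
    compute_z: Callable[[Genomes], List[float]],
    threshold: float = 1.96,
    max_attempts: int = 10,
) -> List[str]:
    """Reselect genomes until at least one has Z <= threshold, then merge their genes."""
    for _ in range(max_attempts):
        genomes = select_genomes()  # step S101: select genomes from the candidate list
        z = compute_z(genomes)
        kept = [g for g, zi in zip(genomes, z) if zi <= threshold]
        if kept:  # at least one genome meets the set condition
            return list(dict.fromkeys(gene for genome in kept for gene in genome))
    raise RuntimeError("no translator genome met the Z-value condition")


# Usage: the first attempt fails the condition, the second succeeds
attempts = iter([[["a"], ["b"]], [["b", "c"], ["c", "d"]]])
z_results = iter([[3.0, 2.5], [0.4, 1.0]])
print(select_with_retry(lambda: next(attempts), lambda gs: next(z_results)))  # ['b', 'c', 'd']
```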
  • By checking the calculation results and repeating the selection steps when necessary, the method for selecting translator genes provided by the embodiments of the present disclosure ensures that high-quality genes meeting the requirements are selected, which is important for matching manuscripts to be translated more accurately.
  • Based on the foregoing embodiments, the embodiments of the present disclosure provide a device for selecting translator genes, configured to select the final translator genes as in the foregoing embodiments. The descriptions and definitions in the method embodiments above therefore apply to each execution module of this embodiment; reference may be made to those embodiments, and the details are not repeated here.
  • FIG. 4 is a schematic structural diagram of a translator gene selection device according to an embodiment of the present disclosure.
  • The device can be used to perform the foregoing method.
  • The translator gene selection device includes an initial gene selection module 401, a first calculation module 402, a second calculation module 403 and a final gene selection module 404.
  • The initial gene selection module 401 is configured to select a plurality of different genes from the candidate translator gene list to form multiple translator genomes; the first calculation module 402 is configured to sample matching results multiple times for each translator genome to obtain multiple matching success rate samples and, based on these samples, calculate the mean and standard deviation of that genome's matching success rate; the second calculation module 403 is configured to calculate the Z value of each translator genome based on the means of all the translator genomes and the standard deviation of that genome.
  • The final gene selection module 404 is configured to select, based on each translator genome's Z value, the translator genomes that meet the set condition from all the translator genomes, and to merge the genes in those genomes to obtain the finally selected translator genes; here, the Z value is the Z value of a large-sample difference test (Z-test).
  • Specifically, the initial gene selection module 401 may select multiple sets of translator genes from a pre-established candidate translator gene list and form a genome from each set as a translator genome; these are the initially selected translator genomes.
  • For example, when selecting each set of translator genes, the initial gene selection module 401 may randomly select multiple translator genes from the candidate translator gene list and use these randomly extracted genes to form a genome, i.e., a translator genome.
  • the first calculation module 302 can input the translator genome into a given matching model, use the given matching model to perform multiple matching result sampling, and each sampling can be obtained A matching success rate sample. It can be understood that each matching success rate sample is actually a matching success rate value obtained by sampling a matching result.
  • the first calculation module 302 calculates a comprehensive matching success rate of the translator's genome based on multiple matching success rate samples obtained by sampling the multiple matching results described above, that is, calculating the translator respectively.
  • the mean and standard deviation of the matching success rate for the genome is, based on the mean and standard deviation of the matching success rate for the genome.
  • the second calculation module 403 calculates a Z value for each initially selected translator genome. Specifically, for each translator genome, the corresponding Z value is calculated according to the standard deviation of the corresponding matching success rate and the average of the corresponding matching success rates of all the translator genomes.
  • the final gene selection module 404 can judge the differential performance of each corresponding translator's genome when performing gene matching based on the Z value obtained by the above calculation. Therefore, according to the Z value corresponding to each translator's genome, the final gene selection module 404 can use a preset setting condition to determine whether the translator genome corresponding to the Z value satisfies the set difference requirement. If it is not satisfied, it will be removed from the genomes of the translators selected initially, and all the translator genomes that have not been removed will be the translator genomes that meet the requirements. The final gene selection module 404 then takes out all the remaining genes in the translator's genome, and after removing the duplicate genes in these genes, forms a new set of genes, which are the final selected translator genes.
  • the apparatus of the embodiment of the present disclosure further includes a candidate translator gene list construction module, which is used to: extract corresponding corresponding information from all the basic information, ability information, credit information, and experience information of the translators. Genes, corresponding to the basic information genes, ability information genes, credit information genes, and experience information genes that form the translator; based on the basic information genes, ability information genes, credit information genes, and experience information genes, forming the candidate translator gene list .
  • the candidate translator gene list building module is specifically used to: obtain all basic information, ability information, credit information, and experience information of the translator, and obtain the characteristics of the interpreter from the basic information, ability information, credit information, and experience information, respectively Based on the characteristics of the translator, extract the translator's direct genes.
  • the second calculation module is specifically configured to calculate a uniform mean of the matching success rate of all translator genomes based on the respective averages of all the translator genomes; based on the standard deviation and the average value of each translator genomes, and all the translator genomes Corresponding uniform mean, calculate the Z value corresponding to the translator's genome.
  • Specifically, the second calculation module calculates the Z value of each translator genome with a formula in which:
  • Z_i is the Z value of the i-th translator genome;
  • n is the number of matching success rate samples per translator genome;
  • E_i(p) is the mean of the i-th translator genome;
  • μ is the uniform mean over all translator genomes;
  • S_i is the standard deviation of the i-th translator genome.
  • Specifically, the first calculation module performs the following flow in each round of multiple matching result sampling:
    - Initialize the matching success rates of all translator genomes.
    - Randomly select a translator genome and run a matching trial on it. Update its current matching success rate from this trial's result and its earlier results.
    - Repeat the selection-to-update steps until the number of matching trials on some genome reaches a first set threshold. Then stop trials on that genome and record its current matching success rate.
    - For the other translator genomes, repeat the selection-to-recording steps until the total number of matching trials on all genomes reaches a second set threshold. Then record the current matching success rate of every genome and end the round.
    - Start the next round, and continue until the total number of rounds reaches a third set threshold.
  • Specifically, the first calculation module is configured so that, for each translator genome, the number of matching success rate samples drawn is not less than a set threshold.
  • The related program modules in the apparatuses of the foregoing embodiments may be implemented by a hardware processor.
  • The translator gene selection device of the embodiments selects translator genes as in the foregoing method embodiments, so its beneficial effects are the same as theirs. Refer to those embodiments; the details are not repeated here.
  • This embodiment provides an electronic device based on the foregoing embodiments.
  • FIG. 5 is a schematic diagram of the physical structure of the electronic device according to an embodiment of the present disclosure. The device includes at least one memory 501, at least one processor 502, a communication interface 503, and a bus 504.
  • The memory 501, the processor 502, and the communication interface 503 communicate with one another through the bus 504.
  • The communication interface 503 is used for information transmission between the electronic device and a translator information device.
  • The memory 501 stores a computer program that can run on the processor 502.
  • When the processor 502 executes the computer program, it implements the translator gene selection method described in the above embodiments.
  • The electronic device includes at least the memory 501, the processor 502, the communication interface 503, and the bus 504. The memory 501, processor 502, and communication interface 503 are connected through the bus 504 and can communicate with one another. For example, the processor 502 reads the program instructions of the translator gene selection method from the memory 501.
  • The communication interface 503 also connects the electronic device to the translator information device so that the two can exchange information. For example, translator gene selection can be carried out through the communication interface 503.
  • When the electronic device runs, the processor 502 calls the program instructions in the memory 501 to execute the methods of the foregoing method embodiments. These include, for example:
    - selecting multiple different groups of genes from a candidate translator gene list to form multiple translator genomes;
    - for each translator genome, sampling matching results multiple times to obtain multiple matching success rate samples, and calculating from them the mean and standard deviation of that genome's matching success rate;
    - calculating the Z value of each translator genome from the means of all translator genomes and the standard deviation of that genome.
  • Based on the Z value of each translator genome, the genomes that meet the set condition are then selected from all translator genomes. Their genes are merged to obtain the finally selected translator genes. Here the Z value is the Z value of a large-sample difference test.
  • When implemented as software functional units and sold or used as independent products, the program instructions in the memory 501 may be stored in a computer-readable storage medium. Alternatively, all or part of the steps of the foregoing method embodiments may be completed by a program that instructs the relevant hardware.
  • The program may be stored in a computer-readable storage medium. When executed, it performs the steps of the foregoing method embodiments.
  • The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
  • The embodiments of the present disclosure also provide a non-transitory computer-readable storage medium based on the foregoing embodiments.
  • The non-transitory computer-readable storage medium stores computer instructions that cause a computer to execute the translator gene selection method described in the foregoing embodiments. The method includes, for example:
    - selecting multiple groups of genes from the candidate translator gene list to form multiple translator genomes;
    - for each translator genome, sampling matching results multiple times to obtain multiple matching success rate samples, and calculating from them the mean and standard deviation of that genome's matching success rate;
    - calculating the Z value of each translator genome from the means of all translator genomes and the standard deviation of that genome;
    - based on the Z value of each genome, selecting the genomes that meet the set condition and merging their genes to obtain the finally selected translator genes.
    Here the Z value is the Z value of a large-sample difference test.
  • The method described in the foregoing embodiments works as follows. Multiple translator genomes are selected in advance from the translator gene pool of all translators, and their Z values are calculated. The genomes whose Z values meet the set condition become the final selection result, so the selected translator genes better reflect the differences between translators.
  • In gene matching applications, translators selected in this way can be matched more reasonably with manuscripts to be translated. This effectively improves translation efficiency and accuracy.
  • From the above description, each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or by hardware.
  • The computer software product can be stored in a computer-readable storage medium, such as a USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc. It includes instructions that cause a computer device (such as a personal computer, server, or network device) to execute the methods described in the method embodiments or in parts of them.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

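Embodiments of the present disclosure provide a translator gene selection method, apparatus, and electronic device. The method includes:
(1) selecting multiple different groups of genes from a candidate translator gene list to form multiple translator genomes;
(2) for each translator genome, sampling matching results multiple times to obtain multiple matching success rate samples, and calculating from them the mean and standard deviation of that genome's matching success rate;
(3) calculating the Z value of each translator genome from the means of all translator genomes and the standard deviation of that genome;
(4) based on the Z value of each translator genome, selecting the genomes that meet a set condition and merging their genes to obtain the finally selected translator genes.
The embodiments can select more effective combinations of translator genes for matching with manuscripts to be translated, thereby effectively improving translation efficiency and accuracy.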

Description

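Translator Gene Selection Method, Apparatus, and Electronic Device
Cross-Reference
This application claims priority to Chinese Patent Application No. 2018110957991, entitled "Translator Gene Selection Method, Apparatus, and Electronic Device" and filed on September 19, 2018. That application is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present disclosure relate to the field of data processing. More specifically, they relate to a translator gene selection method, apparatus, and electronic device.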
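Background
The information age and networking have greatly changed how translation work is done. A translation workflow management platform stores talent profiles for different subjects, so that the most suitable translator can be matched to each manuscript to be translated. Different translators carry different key information. Based on this key information, each translator can be matched with the most suitable manuscripts, which effectively improves translation efficiency and accuracy.
Gene matching between translators and manuscripts is the process of finding the best translator for a manuscript. Manuscript genes and translator genes are passed through a matching model under a given strategy. The translator genes chosen for gene matching should reflect the differences between translators better than other translator genes do; only then can a more suitable translator be matched to the manuscript.
Translator genes are unique combinations of key information obtained by analyzing, computing, and quantifying a translator's characteristic attributes. Each combination is specific to one translator and distinguishes that translator from others. Translator genes come from many sources. In the social media era, genes can be extracted from data on every action a translator takes.
Translator genes exist in all translators on the management platform, and different translators have different genes. Applications differ, so existing translator/manuscript gene matching algorithms often rely on experience to choose which translator genes to use in the matching calculation.
However, a translator's genes change during their working life as their abilities improve, time passes, and knowledge accumulates. They are continuously updated through task processing, review and QC evaluations, accumulated historical corpora, participation in community activities, and translator ability tests. Experience-based gene selection therefore has limitations, and the selected genes may not reflect the differences between translators well.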
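Summary
To overcome the above problems, or at least partially solve them, embodiments of the present disclosure provide a translator gene selection method, apparatus, and electronic device. The selected translator genes better reflect the differences between translators.
In a first aspect, an embodiment of the present disclosure provides a translator gene selection method. It includes:
(1) selecting multiple different groups of genes from a candidate translator gene list to form multiple translator genomes;
(2) for each translator genome, sampling matching results multiple times to obtain multiple matching success rate samples, and calculating from them the mean and standard deviation of that genome's matching success rate;
(3) calculating the Z value of each translator genome from the means of all translator genomes and the standard deviation of that genome;
(4) based on the Z value of each translator genome, selecting the genomes that meet a set condition from all translator genomes, and merging their genes to obtain the finally selected translator genes.
Here the Z value is the Z value of a large-sample difference test.
In a second aspect, an embodiment of the present disclosure provides a translator gene selection apparatus. It includes:
(1) an initial gene selection module, configured to select multiple different groups of genes from a candidate translator gene list to form multiple translator genomes;
(2) a first calculation module, configured to sample matching results multiple times for each translator genome to obtain multiple matching success rate samples, and to calculate from them the mean and standard deviation of that genome's matching success rate;
(3) a second calculation module, configured to calculate the Z value of each translator genome from the means of all translator genomes and the standard deviation of that genome;
(4) a final gene selection module, configured to select, based on the Z value of each translator genome, the genomes that meet a set condition from all translator genomes, and to merge their genes to obtain the finally selected translator genes.
Here the Z value is the Z value of a large-sample difference test.
In a third aspect, an embodiment of the present disclosure provides an electronic device. It includes at least one memory, at least one processor, a communication interface, and a bus. The memory, the processor, and the communication interface communicate with one another through the bus. The communication interface is used for information transmission between the electronic device and a translator information device. The memory stores a computer program executable on the processor. When the processor executes the program, it implements the translator gene selection method of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a non-transitory computer-readable storage medium. It stores computer instructions that cause a computer to execute the translator gene selection method of the first aspect.
The method, apparatus, and electronic device work as follows. Multiple translator genomes are selected in advance from the translator gene pool of all translators, and the Z value of each genome is calculated. The genomes whose Z values meet the set condition become the final selection result. As a result, the selected translator genes better reflect the differences between translators.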
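Brief Description of the Drawings
The drawings used in describing the embodiments or the prior art are briefly introduced below, to explain the technical solutions more clearly. These drawings show only some embodiments of the present disclosure. A person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the translator gene selection method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the relationship between translator features and translator genes in the translator gene selection method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of calculating the Z value in the translator gene selection method according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of the translator gene selection apparatus provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the physical structure of the electronic device provided by an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments are described clearly and completely below with reference to the drawings, to make their objectives, solutions, and advantages clearer. The described embodiments are some, not all, of the embodiments of the present disclosure. All other embodiments that a person of ordinary skill in the art obtains from them without creative effort fall within the protection scope of the embodiments of the present disclosure.
Translator genes come from many sources; in the social media era, genes can be extracted from data on every action a translator takes. Applications differ, so existing translator/manuscript gene matching algorithms often rely on experience to choose which translator genes to use for matching. This traditional approach has limitations, and the selected genes may not reflect the differences between translators well.
To address this, the embodiments select multiple translator genomes in advance from the translators' gene pool and calculate the Z value of each genome. The genomes whose Z values meet a set condition become the final selection result, so the selected translator genes better reflect the differences between translators. Here the Z value is the Z value of a large-sample difference test.
As one aspect of the embodiments, a translator gene selection method is provided. FIG. 1 is a schematic flowchart of the method. The method includes the following steps.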
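S101: Select multiple different groups of genes from the candidate translator gene list to form multiple translator genomes.
Before genes are selected, a candidate translator gene list is built in advance from all attribute information of the translators. It may contain all genes related to specific translator attributes. The list can be regarded as a gene pool that stores, gene by gene, the information-related genes extracted from all translators (the translator genes). As defined above, a translator gene is a unique combination of key information obtained by analyzing, computing, and quantifying a translator's characteristic attributes. Each combination is specific to one translator and distinguishes that translator from others.
In this step, multiple groups of translator genes are selected from the candidate translator gene list, and each group forms one genome, called a translator genome. These are the initially selected translator genomes. For example, each group can be formed by drawing several genes at random from the candidate list; the randomly drawn genes then make up one translator genome.
Alternatively, extraction rules may be defined in advance. Examples include:
- drawing the genes simultaneously or sequentially;
- drawing from alternate rows or from specified row numbers;
- drawing according to the kind of translator information each gene represents;
- fixing how many genes to draw.
During actual extraction, each group's genes are then drawn from the candidate translator gene list according to the predefined rules.
For example, 3 to 5 different genes may be drawn at random from the candidate translator gene list as one group, forming one translator genome. Multiple groups are selected the same way, either simultaneously or one after another, to form multiple translator genomes. The embodiments do not limit this.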
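As a minimal sketch of the random grouping in S101, the Python below draws 3 to 5 distinct genes per genome, following the example range above. Several details are illustrative assumptions, not part of the disclosure: the flat list of gene labels, the name `build_genomes`, and the fixed seed. Duplicate genomes across groups are not checked.

```python
import random

def build_genomes(candidate_genes, num_genomes, min_size=3, max_size=5, seed=None):
    """Form translator genomes by drawing min_size..max_size distinct genes each."""
    rng = random.Random(seed)
    genomes = []
    for _ in range(num_genomes):
        size = rng.randint(min_size, max_size)  # both bounds inclusive
        # sample() draws without replacement, so genes within one genome are distinct
        genomes.append(tuple(rng.sample(candidate_genes, size)))
    return genomes

# Hypothetical gene labels standing in for the candidate translator gene list.
candidates = ["major: petroleum extraction", "overseas experience: yes",
              "language pair: en-zh", "domain: legal",
              "on-time delivery: high", "total words: >1M"]
for genome in build_genomes(candidates, num_genomes=4, seed=7):
    print(genome)
```

A rule-based variant would replace `rng.sample` with a filter over the list, such as "only genes from these rows".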
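S102: For each translator genome, sample matching results multiple times to obtain multiple matching success rate samples. Then calculate from these samples the mean and standard deviation of the genome's matching success rate.
For each initially selected translator genome, its matching performance against manuscripts must be determined, so that translator genes better suited to gene matching can be chosen. Without loss of generality, each translator genome can be input into a given matching model, which samples matching results multiple times. Each sampling yields one matching success rate sample.
Specifically, the genes of the translator genome are input into the matching model. The model automatically calculates and outputs the matching success rate between these genes and the manuscript genes it provides, and this output value serves as one matching success rate sample. Repeating the sampling for the same genome yields multiple matching success rate samples.
Then, for each initially selected translator genome, an overall matching success rate is calculated from its samples, that is, the mean and standard deviation of its matching success rate. Each sample is the success rate value obtained from one sampling of matching results.
For example, suppose matching results are sampled for a translator genome and n matching success rate samples p_1, p_2, ..., p_n are obtained. The mean matching success rate of the genome is then: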
$$E(p)=\frac{1}{n}\sum_{i=1}^{n}p_i$$
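where E(p) is the mean matching success rate of the translator genome, p_i is its i-th matching success rate sample, and n is the total number of samples collected for the genome.
On this basis, the standard deviation of the genome's matching success rate is calculated as follows: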
[Formula image not reproduced: standard deviation S in terms of p_i, E(p), and n]
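where S is the standard deviation of the genome's matching success rate, E(p) is its mean, p_i is its i-th sample, and n is the total number of samples collected for the genome.
In one embodiment, a set threshold is fixed before the sampling begins. It is the minimum total number of matching result samplings, chosen according to the precision required when matching genes with manuscripts to be translated. During actual sampling, the number of matching success rate samples collected is then not less than this threshold. For example, if at least 50 samples are required for each translator genome, 50 is the preset threshold.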
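The per-genome statistics in S102 can be sketched as follows, assuming the samples are plain floats in [0, 1]. The standard deviation formula survives only as an image in the source, so the population form (dividing by n) is an assumption here; `statistics.stdev` would give the n - 1 form instead. The `min_samples` check mirrors the set threshold above (for example 50). The function name is hypothetical.

```python
from statistics import fmean, pstdev

def success_rate_stats(samples, min_samples=50):
    """Return (E(p), S) for one translator genome's matching success rate samples.

    Assumption: S is the population standard deviation (divide by n); use
    statistics.stdev instead if the n - 1 (sample) form is intended.
    """
    if len(samples) < min_samples:
        raise ValueError(f"need at least {min_samples} samples, got {len(samples)}")
    return fmean(samples), pstdev(samples)

# Toy data with a lowered threshold so the example stays short.
samples = [0.6, 0.7, 0.8, 0.5, 0.9]
mean, std = success_rate_stats(samples, min_samples=5)
print(round(mean, 3), round(std, 3))  # prints 0.7 0.141
```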
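S103: Calculate the Z value of each translator genome from the means of all translator genomes and the standard deviation of that genome. Here the Z value is the Z value of a large-sample difference test.
Once the mean and standard deviation of every initially selected translator genome are known, a Z value is calculated for each genome. It uses the standard deviation of that genome's matching success rate together with the mean matching success rates of all translator genomes.
The Z value here is that of a large-sample difference test, that is, the Z-test. The Z-test is generally used to test differences between means for large samples (sample size greater than 30). It uses the standard normal distribution to infer the probability that a difference occurs, and thus whether the difference between two means is significant. When the standard deviation is known, it tests whether the mean of a set of numbers equals an expected value. The embodiments use the Z-test to measure the matching differences of the initially selected translator genomes, so a Z value is calculated for each of them.
S104: Based on the Z value of each translator genome, select the genomes that meet the set condition from all translator genomes. Then merge the genes in those genomes to obtain the finally selected translator genes.
The Z value of each translator genome indicates how well that genome differentiates translators during gene matching. A preset condition is applied to each Z value to decide whether the corresponding genome meets the set difference requirement. A genome that fails it is removed from the initially selected genomes, and all remaining genomes are the qualifying translator genomes. The genes in the remaining genomes are taken out and duplicates are removed. The resulting new set of genes is the finally selected translator genes.
For example, suppose n matching success rate samples are collected for a translator genome and these samples follow a normal distribution. Suppose also that the preset condition requires a confidence level of at least 95%, which corresponds to a Z value of 1.96. Each initially selected genome's Z value is then compared with 1.96. If it is greater than 1.96, the genome is removed; otherwise it is retained.
Suppose this process removes k genomes that fail the set condition from the m initially selected genomes, leaving m - k genomes that meet it. Two or more of these m - k genomes may contain the same translator gene. All their genes are therefore placed in a new gene pool, and for any gene that appears more than once, the extra copies are removed so that only one remains. The new pool then holds only non-repeating translator genes, and these are the finally selected translator genes.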
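The selection and merge rule of S104 might look like the sketch below. `NormalDist().inv_cdf(0.975)` from the Python standard library gives the two-sided 95% critical value (about 1.96). The code compares the raw Z, following the example literally; whether an absolute value is intended is not stated, so treat that as an assumption. The function name and toy data are hypothetical.

```python
from statistics import NormalDist

def select_final_genes(genomes, z_values, confidence=0.95):
    """Keep genomes whose Z does not exceed the critical value, then merge genes.

    Raw Z is compared, following the rule "remove if Z > 1.96"; whether an
    absolute value is intended is not stated (assumption).
    """
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 at 95%
    kept = [g for g, z in zip(genomes, z_values) if z <= critical]
    merged = []
    for genome in kept:
        for gene in genome:
            if gene not in merged:  # remove duplicate genes, keep first-seen order
                merged.append(gene)
    return kept, merged

genomes = [("A", "B", "C"), ("B", "D"), ("E", "F", "A")]
kept, final_genes = select_final_genes(genomes, [0.8, 2.4, -1.1])
print(kept)         # prints [('A', 'B', 'C'), ('E', 'F', 'A')]
print(final_genes)  # prints ['A', 'B', 'C', 'E', 'F']
```

A list with a membership check keeps first-seen order. For large gene pools, a `dict.fromkeys` pass would do the same deduplication in linear time.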
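The method first selects multiple translator genomes in advance from the translator gene pool of all translators. It then calculates their Z values and takes the genomes whose Z values meet the set condition as the final selection result. As a result, the selected translator genes better reflect the differences between translators. In gene matching applications, translators selected this way can also be matched more reasonably with manuscripts to be translated, effectively improving translation efficiency and accuracy.
In one embodiment, the method further includes the following steps before multiple different groups of genes are selected from the candidate translator gene list:
extracting genes from the basic information, ability information, credit information, and experience information of all translators, forming the translators' basic information genes, ability information genes, credit information genes, and experience information genes;
building the candidate translator gene list from the basic information genes, ability information genes, credit information genes, and experience information genes.
Translator genes come from many sources; in the social media era, genes can be extracted from data on every action a translator takes. Drawing on these sources, this embodiment extracts translator genes from the following aspects to build the candidate translator gene list:
Basic information: the translator's personal information, such as name, age, location, and contact details;
Ability information: the translator's translation abilities, such as preferred language pairs, industry domains, and translation speed;
Credit information: credit accumulated during translation work, such as the on-time delivery rate and the rate of abandoning manuscripts midway;
Experience information: experience accumulated over long-term translation work, such as the total number of words translated and the total amount earned.
From this information, the corresponding genes are extracted for each translator, forming basic information genes, ability information genes, credit information genes, and experience information genes. The candidate translator gene list is then built from these genes. For example, for basic information, a candidate translator gene list can be built as shown in Table 1.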
Table 1: A basic-information candidate translator gene list according to an embodiment of the present disclosure
[Table 1 is reproduced only as images in the original publication.]
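When multiple translator genomes are selected from Table 1, several genes for the various data items may be chosen at random. For example, suppose the gene "petroleum extraction" for "Major" and the gene "Yes" for "Overseas work and study experience" are drawn. Together, these two genes form one translator genome. The same process selects further, different translator genomes.
Similarly, the extraction rule may be set in advance to select genes related to the translator's qualifications. In that case, the genes for "IM", "Major", "Date of birth", and "Overseas work experience" in Table 1 can be selected to form a translator genome.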
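One possible in-memory shape for the candidate translator gene list is sketched below. It is a dictionary grouped by the four information categories, with a rule-based extraction like the qualification example above. Every item name and value is a hypothetical illustration; the actual contents of Table 1 are not reproduced in this text.

```python
# Hypothetical candidate translator gene list grouped by the four information
# categories; item names and values are illustrative only.
CANDIDATE_GENES = {
    "basic": {"major": "petroleum extraction",
              "overseas work and study experience": "yes",
              "location": "city A"},
    "ability": {"language pair": "en-zh", "domain": "energy", "speed": "fast"},
    "credit": {"on-time delivery rate": "high", "mid-task withdrawal rate": "low"},
    "experience": {"total words": ">1M", "total amount": "high"},
}

def flatten(candidate_genes):
    """Flatten the grouped list into (category, item, value) gene tuples."""
    return [(category, item, value)
            for category, items in candidate_genes.items()
            for item, value in items.items()]

def extract_by_rule(candidate_genes, wanted_items):
    """Rule-based extraction: keep genes whose item name is in `wanted_items`."""
    return [gene for gene in flatten(candidate_genes) if gene[1] in wanted_items]

genes = flatten(CANDIDATE_GENES)
genome = extract_by_rule(CANDIDATE_GENES, {"major", "overseas work and study experience"})
print(len(genes))  # prints 10
print(genome)
```

Keeping the category in each tuple makes it easy to express rules such as "draw from credit information only".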
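The method extracts translator genes from four aspects: basic information, ability information, credit information, and experience information. It builds the candidate translator gene list from them and uses it to select and match better translator genes. This accounts more comprehensively for the distinctive information of translators and gives a reliable basis for more reasonable gene matching.
Optionally, in the above embodiment, extracting genes from all basic, ability, credit, and experience information of the translators further includes:
obtaining all basic, ability, credit, and experience information of a translator, and obtaining translator features from each kind of information;
extracting the translator's direct genes from the translator features.
Translator genes exist in translators, and different translators have different genes. Genes have commonalities, but it matters more to extract the genes that differ, so that translators can be treated differently and the best one matched.
Genes, however, are not features and cannot be identified simply and explicitly, so they must be extracted in steps. Genes differ fundamentally from features. A feature is a concept abstracted from characteristics that objects share. A feature contains finer-grained attributes, and an attribute contains the most fundamental information about the object, namely its genes.
Therefore, this embodiment first extracts feature information from the four aspects of translator information described above; these are the translator features. Then, for each translator feature, it extracts the translator's most fundamental information, which forms the translator's direct genes. FIG. 2 shows the relationship between translator features and translator genes in the translator gene selection method according to an embodiment of the present disclosure.
By extracting translator features first and then extracting genes from them, the method obtains unique key information that is specific to one translator and distinguishes that translator from others.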
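One way to picture the feature → attribute → gene hierarchy is the sketch below. It treats each (feature, attribute, value) leaf as a direct gene. As an illustrative assumption not spelled out in the disclosure, it keeps only genes that not every translator shares, echoing the point that distinguishing genes matter most. All names and data are hypothetical.

```python
def direct_genes(features):
    """Flatten feature -> attribute -> value into (feature, attribute, value) genes."""
    return {(feature, attr, value)
            for feature, attrs in features.items()
            for attr, value in attrs.items()}

def distinguishing_genes(translators):
    """Drop genes shared by every translator; keep those that tell them apart."""
    per_translator = {name: direct_genes(f) for name, f in translators.items()}
    common = set.intersection(*per_translator.values())
    return {name: genes - common for name, genes in per_translator.items()}

translators = {  # hypothetical features for two translators
    "T1": {"ability": {"language pair": "en-zh", "domain": "legal"}},
    "T2": {"ability": {"language pair": "en-zh", "domain": "medical"}},
}
print(distinguishing_genes(translators))
```

The shared "en-zh" language pair is dropped, and only each translator's domain survives as a distinguishing gene.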
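Optionally, in the above embodiments, sampling matching results multiple times to obtain multiple matching success rate samples further includes:
performing the following flow in each round of multiple matching result sampling:
initializing the matching success rates of all translator genomes;
randomly selecting a translator genome, running a matching trial on it, and updating its current matching success rate from this trial's result and its earlier (historical) results;
repeating the selection-to-update steps until the number of matching trials on some translator genome reaches a first set threshold, then stopping trials on that genome and recording its current matching success rate;
repeating the selection-to-recording steps for the genomes that have not stopped, until the total number of matching trials on all genomes reaches a second set threshold. The current matching success rate of each genome is then recorded, the round ends, and the next round begins. Rounds continue until their total number reaches a third set threshold, so each genome obtains as many matching success rate samples as the third set threshold.
As noted above, the matching performance of each initially selected translator genome against manuscripts must be determined, so that translator genes better suited to gene matching can be chosen. Without loss of generality, matching results are sampled multiple times for each genome using the matching model described above.
Specifically, the given matching model performs multiple rounds of multiple matching result sampling. Suppose m translator genomes have been selected as described above. Their matching success rates are sampled by running multiple rounds (generally no fewer than 30) of matching trials on the m genomes. Each round proceeds as follows:
Step 1: Initialize the matching success rate of each translator genome, for example to 0.
Step 2: Randomly select a translator genome and compute its matching success rate in the given matching model, which gives the result of this matching trial. Combine it with the genome's earlier results recorded in the current round (the historical matching results) to compute the genome's current matching success rate.
Step 3: Repeat step 2 many times. Because each genome is chosen at random, different genomes may undergo different numbers of trials. When the number of trials on a genome reaches the first set threshold, stop this round's trials on that genome and record its current matching success rate at that point.
Step 4: For the genomes that have not yet reached the first set threshold, continue with steps 2 and 3 until the total number of matching trials in this round reaches the second set threshold, then stop the round. Each genome now has one matching success rate value, its sample for this round, so m genomes yield m samples. Repeating this procedure over multiple rounds (for example until the third set threshold is reached) gives each genome multiple samples. If the number of rounds is n, each genome has n samples (n is generally not less than 50).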
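The round-based sampling in steps 1 to 4 is sketched below. The disclosure does not specify the matching model, so `run_trial` is a hypothetical callable that returns True when a trial succeeds, and the toy model draws successes with fixed probabilities. Two details are assumptions. A genome never drawn in a round keeps its initial value of 0. A round also ends early if every genome reaches the first threshold before the second threshold is hit.

```python
import random

def sample_round(num_genomes, run_trial, first_threshold, second_threshold, rng):
    """One round of multiple matching result sampling (steps 1 to 4)."""
    successes = [0] * num_genomes
    trials = [0] * num_genomes
    rates = [0.0] * num_genomes           # step 1: initial success rate of every genome
    active = list(range(num_genomes))     # genomes still below the first threshold
    total = 0
    # Stop at the second threshold, or early if no genome is still active (assumption).
    while active and total < second_threshold:
        i = rng.choice(active)            # step 2: pick a genome at random
        successes[i] += run_trial(i)      # one matching trial (True counts as 1)
        trials[i] += 1
        total += 1
        rates[i] = successes[i] / trials[i]  # combine with this round's earlier trials
        if trials[i] >= first_threshold:  # step 3: stop testing this genome
            active.remove(i)
    return rates                          # step 4: one sample per genome

def sample_success_rates(num_genomes, run_trial, first_threshold,
                         second_threshold, third_threshold, seed=None):
    """Run `third_threshold` rounds and collect each genome's samples."""
    rng = random.Random(seed)
    samples = [[] for _ in range(num_genomes)]
    for _ in range(third_threshold):
        round_rates = sample_round(num_genomes, run_trial,
                                   first_threshold, second_threshold, rng)
        for i, rate in enumerate(round_rates):
            samples[i].append(rate)
    return samples

# Toy matching model (hypothetical): genome i succeeds with probability true_p[i].
true_p = [0.7, 0.5, 0.6]
model_rng = random.Random(0)
samples = sample_success_rates(3, lambda i: model_rng.random() < true_p[i],
                               first_threshold=3, second_threshold=8,
                               third_threshold=5, seed=1)
for i, s in enumerate(samples, start=1):
    print(f"a_{i}:", [round(r, 3) for r in s])
```

The thresholds 3, 8, and 5 match the worked example that follows. In practice the number of rounds would be much larger, since the text suggests at least 30 to 50.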
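For example, suppose three translator genomes a_1, a_2, and a_3 have been initially selected. The first, second, and third set thresholds are 3, 8, and 5 respectively. Each round of multiple matching result sampling proceeds as follows.
First selection: one of a_1, a_2, and a_3 is chosen at random, say a_1. Its matching trial succeeds, so a_1's matching success rate is 100%.
Second selection: suppose a_2 is chosen and its trial fails, so a_2's matching success rate is 0%.
Third selection: suppose a_1 is chosen again and the trial fails. Over its two trials, a_1's current matching success rate is 50%.
Fourth selection: suppose a_3 is chosen and the trial succeeds, so a_3's matching success rate is 100%.
Fifth selection: suppose a_1 is chosen again and the trial succeeds. Over its three trials, a_1's current matching success rate is 2/3, about 66.7%. Since a_1 has now reached the first set threshold of 3 trials, its trials stop. Its current rate of about 66.7% is output as a_1's matching success rate sample for this round.
Sixth selection: a_1 has already had 3 trials, so random selection and trials now involve only a_2 and a_3, following the same procedure. This continues until the combined number of trials on a_1, a_2, and a_3 reaches the second set threshold of 8, and the round ends. Each genome then has one matching success rate sample from these trials.
Repeating this procedure over multiple rounds gives a_1, a_2, and a_3 one matching success rate sample each per round. When the number of rounds reaches the third set threshold of 5, each of a_1, a_2, and a_3 has 5 samples.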
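The running success rate in the worked example is just successes divided by trials so far. The short check below replays a_1's three outcomes (success, failure, success). It uses `fractions.Fraction` so that the third value is exactly 2/3, which prints as 66.7% when rounded to one decimal. The function name is hypothetical.

```python
from fractions import Fraction

def running_rates(outcomes):
    """Success rate after each trial of one genome, given True/False outcomes."""
    rates, wins = [], 0
    for trials, ok in enumerate(outcomes, start=1):
        wins += ok
        rates.append(Fraction(wins, trials))
    return rates

# a_1 in the worked example: success, failure, success (first threshold 3 reached).
rates = running_rates([True, False, True])
print([f"{float(r):.1%}" for r in rates])  # prints ['100.0%', '50.0%', '66.7%']
```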
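The method uses the given matching model to calculate each translator genome's matching success rate many times, and selects genomes with higher success rates accordingly. This makes the results more reliable.
Optionally, in the above embodiments, calculating the Z value of each translator genome from the means of all translator genomes and the standard deviation of that genome proceeds as shown in FIG. 3. FIG. 3 is a schematic flowchart of calculating the Z value in the translator gene selection method according to an embodiment of the present disclosure. The flow includes:
S301: Calculate a uniform mean of the matching success rate over all translator genomes from their individual means.
The mean matching success rate of each initially selected translator genome is calculated as in the above embodiments. This embodiment first uses these means to calculate the mean matching success rate over all translator genomes as a whole, called the uniform mean. Specifically, it can be calculated as: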
$$\mu=\frac{1}{m}\sum_{i=1}^{m}E_i(p)$$
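where μ is the uniform mean matching success rate over all translator genomes, m is the number of initially selected translator genomes, and E_i(p) is the mean matching success rate of the i-th genome.
S302: Calculate the Z value of each translator genome from its standard deviation and mean and the uniform mean over all translator genomes.
The uniform mean comes from the previous step, and each genome's standard deviation and mean come from the above embodiments. With these, the given Z value formula yields the Z value of each initially selected translator genome.
In one embodiment, calculating the Z value of each translator genome further includes using the following formula: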
[Formula image not reproduced: Z_i in terms of n, E_i(p), μ, and S_i]
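where Z_i is the Z value of the i-th translator genome, n is the number of matching success rate samples per genome, E_i(p) is the mean of the i-th genome, μ is the uniform mean over all genomes, and S_i is the standard deviation of the i-th genome.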
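The Z formula survives only as an image in the source. The sketch below therefore assumes the standard large-sample Z statistic built from the symbols just defined: Z_i = (E_i(p) - μ) / (S_i / sqrt(n)), with μ taken as the plain mean of the genome means as in S301. The sign convention and the exact form are assumptions, and the input values are made up for illustration.

```python
from math import sqrt
from statistics import fmean

def z_values(means, stds, n):
    """Z_i = (E_i(p) - mu) / (S_i / sqrt(n)), where mu is the mean of all E_i(p).

    Assumption: the patent's Z formula is only an image; this is the standard
    large-sample Z statistic written with the symbols defined above.
    """
    if any(s <= 0 for s in stds):
        raise ValueError("standard deviations must be positive")
    mu = fmean(means)  # S301: uniform mean over all m translator genomes
    return [(e - mu) / (s / sqrt(n)) for e, s in zip(means, stds)]  # S302

means = [0.62, 0.55, 0.58]  # E_i(p) for three hypothetical genomes
stds = [0.10, 0.12, 0.08]   # S_i
print([round(z, 2) for z in z_values(means, stds, n=50)])
```

With these made-up numbers, the first genome's Z is about 2.59, so it would be removed under the "Z > 1.96" rule, while the other two would be kept.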
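The method uses each initially selected genome's mean to calculate, in turn, the uniform mean over all genomes and the Z value of each genome. This characterizes each genome's matching success rate more precisely, so translator genes can be chosen more precisely for matching with manuscript genes, which improves the matching result.
In addition, after the genomes that meet the set condition are selected, the method may include one more step. If no translator genome has a Z value that meets the set condition, the method returns to step S101. It reselects multiple different translator genomes from the candidate translator gene list and repeats the calculation and selection process.
For example, suppose the sampled matching success rates follow a normal distribution and a 95% confidence level is required, that is, the preset condition is a 95% confidence level for the translator genome. The Z value calculated for a genome should then not exceed 1.96. In practice, genomes are drawn from the candidate list at random, among other reasons, so none of the selected genomes may meet this criterion. Other genomes must then be selected from the candidate list, and the calculation and selection repeated.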
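The return-to-S101 loop can be sketched as below. Here `evaluate_z` is a hypothetical callable standing in for the sampling and Z calculation (S102 to S103), returning one Z value per genome. The `max_attempts` cap is an added safeguard, not part of the described method, so the sketch cannot loop forever. Genome sizes, the seed, and the toy evaluator are illustrative assumptions.

```python
import random
from statistics import NormalDist

def select_with_retry(candidates, evaluate_z, num_genomes=5, genome_size=3,
                      confidence=0.95, max_attempts=10, seed=None):
    """Redraw genomes (back to S101) until at least one meets the Z condition.

    `evaluate_z(genomes)` is a hypothetical callable standing in for S102-S103:
    it returns one Z value per genome. `max_attempts` is an added safeguard,
    not part of the described method, so the loop always terminates.
    """
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    for attempt in range(1, max_attempts + 1):
        genomes = [tuple(rng.sample(candidates, genome_size))
                   for _ in range(num_genomes)]
        kept = [g for g, z in zip(genomes, evaluate_z(genomes)) if z <= critical]
        if kept:
            return attempt, kept
    raise RuntimeError("no translator genome met the set condition")

# Toy evaluator: genomes containing "G0" are treated as failing the condition.
def fake_z(genomes):
    return [3.0 if "G0" in g else 1.0 for g in genomes]

attempt, kept = select_with_retry([f"G{i}" for i in range(8)], fake_z, seed=3)
print(attempt, kept)
```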
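By checking the results and repeating the selection steps in a loop, the method ensures that high-quality genes meeting the requirements are selected. This is important for matching manuscripts to be translated more precisely.
As another aspect of the embodiments, a translator gene selection apparatus is provided based on the above embodiments. It is used to select the final translator genes in the above embodiments. The descriptions and definitions given for the translator gene selection methods therefore apply to the execution modules of this embodiment; refer to the above embodiments, which are not repeated here.
In one embodiment of this aspect, the structure of the apparatus is shown in FIG. 4, a schematic structural diagram of the translator gene selection apparatus. The apparatus can be used to select translator genes in the above method embodiments. It includes an initial gene selection module 401, a first calculation module 402, a second calculation module 403, and a final gene selection module 404.
The modules work as follows:
- The initial gene selection module 401 selects multiple different groups of genes from the candidate translator gene list to form multiple translator genomes.
- The first calculation module 402 samples matching results multiple times for each translator genome to obtain multiple matching success rate samples. It then calculates from them the mean and standard deviation of that genome's matching success rate.
- The second calculation module 403 calculates the Z value of each translator genome from the means of all translator genomes and the standard deviation of that genome.
- The final gene selection module 404 uses the Z value of each translator genome to select, from all translator genomes, the genomes that meet the set condition, and merges their genes to obtain the finally selected translator genes.
Here the Z value is the Z value of a large-sample difference test.
Specifically, the initial gene selection module 401 may select multiple groups of translator genes from the pre-established candidate translator gene list and form each group into a genome, the translator genome. These are the initially selected translator genomes. For example, the module may randomly draw several genes from the candidate list for each group; the randomly drawn genes then form one translator genome.
Next, the matching performance of each initially selected genome against manuscripts must be determined, so that translator genes better suited to gene matching can be chosen. Without loss of generality, the first calculation module 402 can input each genome into a given matching model and sample matching results multiple times. Each sampling yields one matching success rate sample, which is the success rate value from one sampling of matching results.
For each initially selected genome, the first calculation module 402 also calculates an overall matching success rate from these samples, that is, the mean and standard deviation of the genome's matching success rate.
The second calculation module 403 then calculates a Z value for each initially selected genome. It uses the standard deviation of that genome's matching success rate and the mean matching success rates of all translator genomes.
Finally, the final gene selection module 404 uses the calculated Z values to judge how well each genome differentiates translators during gene matching. For each Z value, it applies a preset condition to decide whether the corresponding genome meets the set difference requirement. A genome that fails it is removed from the initially selected genomes, and all remaining genomes are the qualifying translator genomes. The module then takes out the genes in all remaining genomes and removes duplicates. The resulting new set of genes is the finally selected translator genes.
Further, the apparatus includes a candidate translator gene list construction module. It extracts genes from all basic, ability, credit, and experience information of the translators, forming basic information genes, ability information genes, credit information genes, and experience information genes. It then builds the candidate translator gene list from these genes.
Optionally, the construction module obtains all basic, ability, credit, and experience information of a translator and derives translator features from each kind of information. It then extracts the translator's direct genes from those features.
Optionally, the second calculation module first calculates a uniform mean of the matching success rate over all translator genomes from their individual means. It then calculates each genome's Z value from its standard deviation and mean and the uniform mean.
Optionally, the second calculation module calculates the Z value of each translator genome using the following formula: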
[Formula image not reproduced: Z_i in terms of n, E_i(p), μ, and S_i]
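where Z_i is the Z value of the i-th translator genome, n is the number of matching success rate samples per genome, E_i(p) is the mean of the i-th genome, μ is the uniform mean over all genomes, and S_i is the standard deviation of the i-th genome.
Optionally, the first calculation module performs the following flow in each round of multiple matching result sampling:
- Initialize the matching success rates of all translator genomes.
- Randomly select a genome, run a matching trial on it, and update its current matching success rate from this trial's result and its historical results.
- Repeat the selection-to-update steps until the number of trials on some genome reaches a first set threshold. Then stop trials on that genome and record its current matching success rate.
- For the genomes that have not stopped, repeat the selection-to-recording steps until the total number of trials on all genomes reaches a second set threshold. Then record each genome's current matching success rate, end the round, and start the next one.
- Continue until the total number of rounds reaches a third set threshold, so each genome obtains as many samples as the third set threshold.
Optionally, the first calculation module is configured so that, for each translator genome, the number of matching success rate samples extracted is not less than a set threshold.
The related program modules in the apparatuses of the above embodiments may be implemented by a hardware processor. Each translator gene selection apparatus selects translator genes as in the above method embodiments, so it produces the same beneficial effects as those embodiments. Refer to them; the details are not repeated here.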
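As yet another aspect of the embodiments, an electronic device is provided based on the above embodiments. FIG. 5 is a schematic diagram of its physical structure. The device includes at least one memory 501, at least one processor 502, a communication interface 503, and a bus 504.
The memory 501, the processor 502, and the communication interface 503 communicate with one another through the bus 504. The communication interface 503 is used for information transmission between the electronic device and a translator information device. The memory 501 stores a computer program executable on the processor 502. When the processor 502 executes the program, it implements the translator gene selection method described in the above embodiments.
The electronic device thus includes at least the memory 501, the processor 502, the communication interface 503, and the bus 504. The memory 501, processor 502, and communication interface 503 are connected through the bus 504 and can communicate with one another; for example, the processor 502 reads the program instructions of the translator gene selection method from the memory 501. The communication interface 503 also connects the electronic device to the translator information device so that they can exchange information; for example, translator gene selection can be carried out through the communication interface 503.
When the electronic device runs, the processor 502 calls the program instructions in the memory 501 to execute the methods of the above method embodiments. These include, for example:
- selecting multiple different groups of genes from the candidate translator gene list to form multiple translator genomes;
- for each genome, sampling matching results multiple times to obtain multiple matching success rate samples, and calculating from them the mean and standard deviation of its matching success rate;
- calculating each genome's Z value from the means of all genomes and that genome's standard deviation;
- based on each genome's Z value, selecting the genomes that meet the set condition and merging their genes to obtain the finally selected translator genes.
Here the Z value is the Z value of a large-sample difference test.
When implemented as software functional units and sold or used as independent products, the program instructions in the memory 501 may be stored in a computer-readable storage medium. Alternatively, all or part of the steps of the above method embodiments may be completed by program instructions directing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Based on the above embodiments, the embodiments also provide a non-transitory computer-readable storage medium. It stores computer instructions that cause a computer to execute the translator gene selection method described in the above embodiments. The method includes, for example:
- selecting multiple different groups of genes from the candidate translator gene list to form multiple translator genomes;
- for each genome, sampling matching results multiple times to obtain multiple matching success rate samples, and calculating from them the mean and standard deviation of its matching success rate;
- calculating each genome's Z value from the means of all genomes and that genome's standard deviation;
- based on each genome's Z value, selecting the genomes that meet the set condition and merging their genes to obtain the finally selected translator genes.
Here the Z value is the Z value of a large-sample difference test.
The electronic device and non-transitory computer-readable storage medium execute the translator gene selection method described in the above embodiments. They select multiple translator genomes in advance from the translator gene pool of all translators and calculate their Z values. The genomes whose Z values meet the set condition become the final selection result, so the selected translator genes better reflect the differences between translators. In gene matching applications, translators selected this way can also be matched more reasonably with manuscripts to be translated, effectively improving translation efficiency and accuracy.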
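The apparatus, electronic device, and storage medium embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as needed to achieve the purpose of this embodiment's solution. A person of ordinary skill in the art can understand and implement them without creative effort.
From the above description of the implementations, a person skilled in the art can clearly understand that each implementation may be realized by software plus a necessary general-purpose hardware platform, or by hardware. On this understanding, the essence of the above technical solutions, or the part that contributes to the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium such as a USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc. It includes instructions that cause a computer device (such as a personal computer, server, or network device) to execute the methods described in the method embodiments or in parts of them.
In addition, a person skilled in the art should understand that, in the application documents of the embodiments, the terms "include", "comprise", and any variants are intended to cover non-exclusive inclusion. A process, method, article, or device that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to it. Without further limitation, an element defined by "including a ..." does not exclude additional identical elements in the process, method, article, or device that includes it.
The description of the embodiments sets out numerous specific details. It should be understood, however, that the embodiments may be practiced without these details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure this description. Similarly, to streamline the disclosure and aid understanding of one or more inventive aspects, features of the embodiments are sometimes grouped together in a single embodiment, figure, or description of one.
However, the disclosed method should not be read as reflecting an intention that the claimed embodiments require more features than each claim expressly recites. Rather, as the claims reflect, inventive aspects lie in fewer than all features of a single disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into it, with each claim standing on its own as a separate embodiment.
Finally, the above embodiments only illustrate the technical solutions of the embodiments and do not limit them. Although they have been described in detail with reference to the foregoing embodiments, a person skilled in the art should understand the following. The technical solutions described in those embodiments may still be modified, or some of their technical features replaced with equivalents. Such modifications or replacements do not take the essence of the corresponding technical solutions outside the spirit and scope of the technical solutions of the embodiments.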

Claims (9)

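  1. A translator gene selection method, characterized by comprising:
    selecting multiple different groups of genes from a candidate translator gene list to form multiple translator genomes;
    for each of the translator genomes, sampling matching results multiple times to obtain multiple matching success rate samples, and calculating, based on the multiple matching success rate samples, the mean and standard deviation of the matching success rate of that translator genome;
    calculating the Z value of each translator genome based on the means of all the translator genomes and the standard deviation of that translator genome;
    selecting, based on the Z value of each translator genome, the translator genomes that meet a set condition from all the translator genomes, and merging the genes in the translator genomes that meet the set condition to obtain the finally selected translator genes;
    wherein the Z value is the Z value of a large-sample difference test.
  2. The method according to claim 1, characterized by further comprising, before the step of selecting multiple different groups of genes from the candidate translator gene list:
    extracting corresponding genes from all basic information, ability information, credit information, and experience information of translators, correspondingly forming the translators' basic information genes, ability information genes, credit information genes, and experience information genes;
    forming the candidate translator gene list based on the basic information genes, ability information genes, credit information genes, and experience information genes.
  3. The method according to claim 1, characterized in that the step of calculating the Z value of each translator genome based on the means of all the translator genomes and the standard deviation of that translator genome further comprises:
    calculating a uniform mean of the matching success rate of all the translator genomes based on their respective means;
    calculating the Z value of each translator genome based on its standard deviation and mean and the uniform mean of all the translator genomes.
  4. The method according to claim 3, characterized in that the step of calculating the Z value of the translator genome further comprises:
    calculating the Z value of each translator genome using the following formula: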
    [Formula image not reproduced: Z_i in terms of n, E_i(p), μ, and S_i]
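    where Z_i is the Z value of the i-th translator genome, n is the number of matching success rate samples per translator genome, E_i(p) is the mean of the i-th translator genome, μ is the uniform mean of all the translator genomes, and S_i is the standard deviation of the i-th translator genome.
  5. The method according to claim 1, characterized in that the step of sampling matching results multiple times to obtain multiple matching success rate samples further comprises:
    performing the following flow for each round of the multiple matching result sampling:
    initializing the matching success rates of all the translator genomes;
    randomly selecting one translator genome from all the translator genomes, performing a matching trial on the selected translator genome, and updating its current matching success rate based on the result of this matching trial and its historical matching results;
    repeating the steps from the random selection to the updating until the number of matching trials on any translator genome reaches a first set threshold, stopping the matching trials on that translator genome, and recording its current matching success rate;
    repeating the steps from the random selection to the recording for the translator genomes other than those whose matching trials have stopped, until the total number of matching trials on all the translator genomes reaches a second set threshold; recording the current matching success rate of each translator genome, ending the current round of multiple matching result sampling, and entering the next round, until the total number of rounds of multiple matching result sampling reaches a third set threshold, thereby obtaining, for each translator genome, a number of matching success rate samples equal to the third set threshold.
  6. The method according to claim 1, characterized in that, for each translator genome, the number of matching success rate samples extracted is not less than a set threshold.
  7. A translator gene selection apparatus, characterized by comprising:
    an initial gene selection module, configured to select multiple different groups of genes from a candidate translator gene list to form multiple translator genomes;
    a first calculation module, configured to, for each of the translator genomes, sample matching results multiple times to obtain multiple matching success rate samples, and calculate, based on the multiple matching success rate samples, the mean and standard deviation of the matching success rate of that translator genome;
    a second calculation module, configured to calculate the Z value of each translator genome based on the means of all the translator genomes and the standard deviation of that translator genome;
    a final gene selection module, configured to select, based on the Z value of each translator genome, the translator genomes that meet a set condition from all the translator genomes, and to merge the genes in the translator genomes that meet the set condition to obtain the finally selected translator genes;
    wherein the Z value is the Z value of a large-sample difference test.
  8. An electronic device, characterized by comprising: at least one memory, at least one processor, a communication interface, and a bus;
    wherein the memory, the processor, and the communication interface communicate with one another through the bus, and the communication interface is used for information transmission between the electronic device and a translator information device;
    the memory stores a computer program executable on the processor, and the processor, when executing the computer program, implements the method according to any one of claims 1 to 6.
  9. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause a computer to execute the method according to any one of claims 1 to 6.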
PCT/CN2018/124951 2018-09-19 2018-12-28 Translator gene selection method, apparatus, and electronic device WO2020057003A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811095799.1 2018-09-19
CN201811095799.1A CN109299737B (zh) 2018-09-19 2018-09-19 Translator gene selection method, apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2020057003A1 true WO2020057003A1 (zh) 2020-03-26

Family

ID=65163510

Family Applications (2)

Application Number Title Priority Date Filing Date
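PCT/CN2018/124951 WO2020057003A1 (zh) 2018-09-19 2018-12-28 Translator gene selection method and apparatus, and electronic device
PCT/CN2018/124891 WO2020057001A1 (zh) 2018-09-19 2018-12-28 Machine translation engine recommendation method and apparatus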

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124891 WO2020057001A1 (zh) 2018-09-19 2018-12-28 Machine translation engine recommendation method and apparatus

Country Status (2)

Country Link
CN (1) CN109299737B (zh)
WO (2) WO2020057003A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140136181A1 (en) * 2012-11-14 2014-05-15 International Business Machines Corporation Translation Decomposition and Execution
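CN104537009A (zh) * 2014-12-17 2015-04-22 语联网(武汉)信息技术有限公司 Translator recommendation method and apparatus
CN105138521A (zh) * 2015-08-27 2015-12-09 武汉传神信息技术有限公司 General method for recommending translators for risk projects in the translation industry
CN105279147A (zh) * 2015-09-29 2016-01-27 武汉传神信息技术有限公司 Method for fast matching of translators and manuscripts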

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
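KR100793990B1 (ko) * 2006-09-18 2008-01-16 삼성전자주식회사 Method and system for early Z test in tile-based 3D rendering
CN102103612A (zh) * 2009-12-22 2011-06-22 北大方正集团有限公司 Information extraction method and apparatus
US8775155B2 (en) * 2010-10-25 2014-07-08 Xerox Corporation Machine translation using overlapping biphrase alignments and sampling
CN103064970B (zh) * 2012-12-31 2016-04-20 武汉传神信息技术有限公司 Optimized translator retrieval method
CN103092827B (zh) * 2012-12-31 2016-08-17 武汉传神信息技术有限公司 Multi-strategy method for automatically matching translators and manuscripts
CN103729349A (zh) * 2013-12-23 2014-04-16 武汉传神信息技术有限公司 Method for analyzing factors influencing translation quality
US10067936B2 (en) * 2014-12-30 2018-09-04 Facebook, Inc. Machine translation output reranking
CN106776583A (zh) * 2015-11-24 2017-05-31 株式会社Ntt都科摩 Machine translation evaluation method and device, and machine translation method and device
CN106021239B (zh) * 2016-04-29 2018-10-26 北京创鑫旅程网络技术有限公司 Real-time translation quality evaluation method
CN106844303A (zh) * 2016-12-23 2017-06-13 语联网(武汉)信息技术有限公司 Method for matching translators to manuscripts awaiting translation based on a similarity matching algorithm
CN106844304A (zh) * 2016-12-26 2017-06-13 语联网(武汉)信息技术有限公司 Method for matching translators to manuscripts awaiting translation based on translated-manuscript classification
CN108538284A (zh) * 2017-03-06 2018-09-14 北京搜狗科技发展有限公司 Method and apparatus for presenting simultaneous interpretation results, and simultaneous interpretation method and apparatus
CN107016131A (zh) * 2017-05-19 2017-08-04 北方工业大学 Machine learning algorithm based on enhanced clustering and application thereof
CN107357783B (zh) * 2017-07-04 2020-06-12 桂林电子科技大学 Method for analyzing the quality of English translations of Chinese text
CN107480147A (zh) * 2017-08-15 2017-12-15 中译语通科技(北京)有限公司 Method and system for comparatively evaluating machine translation systems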

Also Published As

Publication number Publication date
CN109299737A (zh) 2019-02-01
CN109299737B (zh) 2021-10-26
WO2020057001A1 (zh) 2020-03-26

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18934151

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18934151

Country of ref document: EP

Kind code of ref document: A1