CN102682065A - Semantic entity control using input and output sample - Google Patents

Semantic entity control using input and output sample

Info

Publication number
CN102682065A
CN102682065A CN2012100236886A CN201210023688A CN102682065A CN 102682065 A CN102682065 A CN 102682065A CN 2012100236886 A CN2012100236886 A CN 2012100236886A CN 201210023688 A CN201210023688 A CN 201210023688A CN 102682065 A CN102682065 A CN 102682065A
Authority
CN
China
Prior art keywords
input
output
item
parsing
conversion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100236886A
Other languages
Chinese (zh)
Other versions
CN102682065B (en)
Inventor
S. Gulwani
R. Singh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/020,153 (external priority, patent US8799234B2)
Application filed by Microsoft Corp
Publication of CN102682065A
Application granted
Publication of CN102682065B
Legal status: Active
Anticipated expiration

Landscapes

  • Stored Programmes (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to semantic entity manipulation using input-output examples. Semantic entity manipulation technique embodiments are presented that generate, based on input-output examples, a probabilistic program capable of manipulating character strings that represent semantic entities. The probabilistic program can then be used to generate a desired output, consistent with the examples, from input of a type included in the input-output examples. The probabilistic program is generated based on the output of parsing, transformation and formatting modules. The parsing module uses a probabilistic approach to parse the input-output examples. The transformation module generates a weighted set of transforms that can produce the output items from the input items of the input-output examples, to a degree of probability prescribed by the assigned weights. The formatting module generates formatting instructions that convert selected output parts into the form specified by the output items in the input-output examples.

Description

Semantic entity manipulation using input-output examples
Technical field
The present invention relates to semantic entity manipulation, and in particular to semantic entity manipulation using input-output examples.
Background
Millions of people worldwide use spreadsheets and similar tools to store and manipulate data. These data manipulation scenarios often involve converting large amounts of input information from one format to another, or performing computations on the input information to produce a desired output. Typically, these tasks are accomplished manually or with small, often one-off applications that are created either by the end user or by a programmer for the end user.
Summary of the invention
The semantic entity manipulation technique embodiments described herein generate, based on input-output examples, a probabilistic program capable of manipulating character strings that represent semantic entities. The program can then be used to generate a desired output, consistent with the examples, from input of a type included in the input-output examples. This allows input information to be converted from one format to another based on input-output examples provided by an end user, and allows computations to be performed on the input information to produce the desired output.
In general, in one implementation, the foregoing is accomplished by first receiving the aforementioned input-output examples. Each input-output example provides one or more input items and a corresponding desired output item. The received input and output items are parsed to produce a weighted set of parses. Each weighted parse represents a different potential parse of an input or output item, and is weighted according to a measure of the likelihood that it is a valid parse, based on a comparison of the parse with a prescribed parsing library. Next, for each input-output example, one or more transforms are identified from a library of transforms that can generate the desired output item from the input items of the example. In addition, formatting instructions are identified that can format the output item so that it matches the formatting of the desired output item of the input-output example. Once all the input-output examples have been processed, a probabilistic program is generated. Given one or more input items of the same type as the input items of the input-output examples, this probabilistic program employs the identified transforms and formatting instructions to produce an output item corresponding to those input items. One or more input items of the same type as the input items of the input-output examples are then received, and the generated probabilistic program is used to produce an output item.
It should be noted that this Summary is provided to introduce, in simplified form, a selection of concepts that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Description of drawings
The specific features, aspects, and advantages of the invention will be better understood with reference to the following description, appended claims, and accompanying drawings, in which:
Fig. 1 shows a program generation system for generating a program that performs a data manipulation task based on input-output examples, together with a program execution module that applies the program to new input items.
Fig. 2 shows a data manipulation system that includes the program generation system and program execution module of Fig. 1.
Fig. 3 is a flowchart that shows an overview of one manner of operation of the program generation system of Fig. 2.
Fig. 4 is a flowchart that shows how the program generation system (of Fig. 2) can generate a program using a three-part operation.
Figs. 5 and 6 together show an example that illustrates one manner of operation of the transformation module used in the program generation system of Fig. 2.
Fig. 7 is a flowchart that complements the example of Figs. 5 and 6 by showing an overview of one manner of operation of the transformation module.
Fig. 8 shows a rounding transformation process used to round instances of semantic entities.
Fig. 9 shows a table of currency data in which the end user wants to convert the currency in column 1 into the currency type shown in column 2, using the exchange rate on the date shown in column 3, to obtain the result shown in the output column. The first row shows the input-output example provided by the end user.
Fig. 10 shows a table of distance data in which the end user wants to make an informed choice among 8 apartments (whose addresses are shown in abbreviated form in the first column of the table) based on commute attributes to certain locations, such as the rush-hour driving time to the user's office, the walking distance to the nearest gym, and the driving distance to the nearest university.
Fig. 11 shows a table of format strings for formatting a double-precision value into various output formats.
Fig. 12 shows a table of format strings for formatting the date-time 24/9/1986 18:23:05 into various output formats.
Fig. 13 shows a spreadsheet with dates in three different formats (namely, US format: month/day/year; European format: day.month.year; and Chinese format: year-month-day). Note that some of these dates are missing the year, which is assumed to default to the current year.
Fig. 14 is a flowchart that generally outlines an exemplary process for implementing the semantic entity manipulation technique embodiments described herein.
Fig. 15 shows one implementation of a semantic entity manipulation system for carrying out the process of Fig. 14.
Fig. 16 is a diagram depicting a general-purpose computing device constituting an exemplary system for implementing the semantic entity manipulation technique embodiments described herein.
Detailed description
In the following description of the semantic entity manipulation technique embodiments, reference is made to the accompanying drawings, which form a part hereof and which show, by way of illustration, specific embodiments in which the technique may be practiced. It is understood that other embodiments may be used and structural changes may be made without departing from the scope of the technique.
In addition, some of the figures describe concepts in the context of one or more structural components (variously referred to as functionality, modules, features, elements, etc.). The components shown in the figures can be implemented in any manner. In one case, the illustrated separation of the components into distinct units may reflect the use of corresponding distinct components in an actual implementation. Alternatively, or in addition, any single component illustrated in the figures may be implemented by plural actual components. Alternatively, or in addition, the depiction of any two or more separate components in the figures may reflect different functions performed by a single actual component.
Other figures describe concepts in flowchart form. In this form, certain operations are described as constituting distinct blocks performed in a certain order. Such implementations are illustrative and non-limiting. Certain blocks described herein can be grouped together and performed in a single operation, certain blocks can be broken apart into plural component blocks, and certain blocks can be performed in an order that differs from the one illustrated here (including performing the blocks in parallel). The blocks shown in the flowcharts can be implemented in any manner.
As to terminology, the phrase "configured to" encompasses any way that any kind of functionality can be constructed to perform an identified operation. The terms "logic" or "logic component" encompass any functionality for performing a task. For example, each operation illustrated in the flowcharts corresponds to a logic component for performing that operation. When implemented by a computing system (e.g., "computing functionality"), a logic component represents a physical component that is a physical part of the computing system, however implemented.
The following explanation may identify one or more features as "optional." This type of statement is not to be interpreted as an exhaustive indication of the features that may be considered optional; that is, other features can be considered optional, although not expressly identified in the text. Similarly, the explanation may indicate that one or more features can be implemented in the plural (that is, by providing more than one of the features). This statement is not to be interpreted as an exhaustive indication of the features that can be duplicated. Finally, the terms "exemplary" or "illustrative" refer to one implementation among potentially many implementations.
1.0 Overview and relationship to the parent patent application
The parent application of this continuation-in-part application describes a program generation system that generates a program based on a plurality of input-output examples. An input-output example includes an input item and a corresponding output item. In one implementation, the program generation system includes three component modules. A parsing module processes the input items and output items to provide a plurality of input parts and output parts, respectively. A transformation module determines, for each output part, whether the output part can be produced from a corresponding input part using one or more converter modules selected from a collection of candidate converter modules. A formatting module generates formatting instructions that transform the selected output parts into the form specified by the original output items. These three modules yield a generated program that embodies the logic learned from the input-output examples; the generated program can subsequently be used to transform new input items into new corresponding output items.
To better understand the semantic entity manipulation technique embodiments described herein, a review of the system described in the parent application is presented first, followed by the modifications to that system that represent the semantic entity manipulation technique embodiments.
1.1 Parent patent application system
Fig. 1 shows an illustrative program generation system 102 that creates a program based on input-output examples. Each input-output example includes an input item and a corresponding output item. The input item may include one or more component parts, referred to herein as input parts. The output item may also include one or more component parts, referred to as output parts.
Each output item represents some type of transformation performed on the corresponding input item. For example, an output item may include one or more output parts that are direct copies of one or more corresponding input parts taken from the input item. In addition, or alternatively, an output item may include one or more output parts that represent transformations (rather than direct copies) of one or more corresponding input parts. In addition, or alternatively, an output item can include formatting applied to its content that differs from the formatting applied to the corresponding input item. In addition, or alternatively, an output item can include one or more output parts that have no counterpart parts in the corresponding input item. In addition, or alternatively, an output item need not include a counterpart for every input part in the corresponding input item.
For example, Fig. 1 presents an illustrative set of input-output examples 104 within a data file 106. The set of input-output examples 104 includes a plurality of input items 108 and a corresponding plurality of output items 110. In this example, the input items 108 comprise a single column of alphanumeric input information; likewise, the output items 110 comprise a single column of alphanumeric output information. In other cases, however, the data file 106 can include a single column of input information that maps to two or more columns of output information. In another case, the data file 106 can include two or more columns of input information that map to a single column of output information. In another case, the data file 106 can include two or more columns of input information that map to two or more columns of output information, and so on. Further, the data file 106 can organize sets of input information and output information in any manner (that is, as an alternative or in addition to a columnar organization). More generally, the example shown in Fig. 1 can be varied in many different ways.
In the particular scenario of Fig. 1, the input items represent invoices in an original format. The output items represent a transformed version of the invoices in an output format. For example, the first input item includes a date in a particular format ("2-2-15"), representing the date February 2, 2015. The output item transforms this date into another format by printing an abbreviation of the month name (i.e., "Feb."). In addition, the output item converts the first letter of the abbreviated month from uppercase to lowercase, yielding "feb.". The first input item also includes the name of a city, namely "Denver". The output item transforms this city information into corresponding abbreviated state information, namely "CO" (Colorado). The first input item also includes a cost value in dollars, namely "152.02". The output item repeats this cost value but rounds it to the nearest dollar amount, yielding "152". The first input item also includes the string "Paid". The output item repeats this string verbatim.
In addition, note that the output item (of the first input-output example) includes additional information that is not present in the corresponding input item. For example, the output item includes three commas, whereas the input item includes only a single comma. In addition, the output item adds a dollar sign "$" in front of the cost figure "152". In addition, the output item arranges the information in a different manner than the corresponding input item. For example, the input item places the location information ("Denver") before the cost information ("152.02"), whereas the output item places the cost information ("152") before the location information ("CO"). Finally, the output item presents the last string ("Paid") in boldface, whereas the input item presents it without boldface. As can be appreciated, this particular example is presented merely for illustrative purposes. Other input-output examples can differ from this scenario in any way.
The data file 106 also includes another set of input items 112 that have not yet been transformed and so have no corresponding output items. For a small data set, the user may study the set of input-output examples 104 to discover the logic used to transform input items into corresponding output items. The user may then manually generate new output items for the new set of input items 112 in conformance with this logic. However, this manual operation becomes impractical as the size of the data file 106 increases.
To address this problem, the program generation system 102 automatically generates a program 114 that helps the user transform the set of input items 112 into the desired output form. From a high-level perspective, the program generation system 102 generates the program 114 based on the set of input-output examples 104. A program execution module 116 then applies the program 114 to the set of new input items 112. This yields a set of new output items. For example, the program 114 automatically transforms the new input item "4-19-15 Yakima 292.88, Paid" into "apr 2015, $293, WA, Paid".
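As a rough illustration of what a generated program of this kind might amount to, the following Python sketch hand-codes the transformation for the Yakima invoice line. It is not the output of the program generation system: the function name `convert_invoice`, the lookup tables, the regular expression, and the assumed input layout are all hypothetical, and the boldface applied to "Paid" is ignored.

```python
import re

# Hypothetical lookup tables; a real converter library would be far larger.
CITY_TO_STATE = {"Denver": "CO", "Yakima": "WA"}
MONTH_ABBREV = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"]

# Assumed input layout: "M-D-YY City Cost, Status"
INPUT_PATTERN = re.compile(
    r"(\d{1,2})-(\d{1,2})-(\d{2})\s+(\w+)\s+(\d+(?:\.\d+)?),\s*(\w+)"
)


def convert_invoice(item: str) -> str:
    """Rewrite one invoice line into the output layout used in the example."""
    match = INPUT_PATTERN.fullmatch(item.strip())
    if match is None:
        raise ValueError(f"unrecognized invoice line: {item!r}")
    month, _day, year, city, cost, status = match.groups()

    month_name = MONTH_ABBREV[int(month) - 1]  # lowercase month abbreviation
    full_year = 2000 + int(year)               # assumes two-digit years fall in the 2000s
    dollars = round(float(cost))               # nearest whole dollar (ties round to even)
    state = CITY_TO_STATE[city]                # city -> state abbreviation

    # Reorder the parts (cost before location) and add the constant text.
    return f"{month_name} {full_year}, ${dollars}, {state}, {status}"
```

Calling `convert_invoice("4-19-15 Yakima 292.88, Paid")` yields `"apr 2015, $293, WA, Paid"`. The point of the system described here is that the user never writes such code by hand; it is inferred from the examples.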
Fig. 2 shows one illustrative data manipulation system 200 that can make use of the program generation system 102 and program execution module 116 of Fig. 1. In general, Fig. 2 demarcates different modules to clearly identify the functions performed by those respective modules. In one case, these modules may represent different physical components. In other cases, one or more of the modules may represent components within one or more other modules.
From a high-level perspective, the program generation system 102 operates in conjunction with any type of data manipulation functionality 202. The data manipulation functionality 202 represents any tool for performing processing on data items. In one case, the data manipulation functionality 202 may provide a user interface that allows a user to inspect and modify data items. For example, in one case, the data manipulation functionality 202 may represent a spreadsheet system that allows a user to manipulate data items in tabular form. One spreadsheet system that can be used is provided in Microsoft Office by Microsoft Corporation of Redmond, Washington. In another case, the data manipulation functionality 202 may represent table manipulation functionality within a document editing application, and so on.
The data manipulation functionality 202 may interact with other functionality 204. For example, the data manipulation functionality 202 may receive data items from the other functionality 204, or send data items to the other functionality 204. The other functionality 204 may represent an application module of any type (such as a document editing application, a spreadsheet application, and so on). Alternatively, or in addition, the other functionality 204 may represent a network-accessible entity of any type. For example, the other functionality 204 may represent a collection of data items maintained in a remote data store, accessible via the Internet.
In operation, the user may feed a collection of input-output examples to the data manipulation functionality 202. For example, in one case, the user may manually create the set of input-output examples. In another case, the user may instruct the data manipulation functionality 202 to read in a data file that contains the input-output examples. The data file can be obtained from any source, such as the other functionality 204, which may represent a local source and/or a remote source (with respect to the data manipulation functionality 202). Upon instruction, the data manipulation functionality 202 can use the program generation system 102 to provide the program 114. That program 114 expresses the logic embodied in the input-output examples. The program execution module 116 can then use the program 114 to automatically process new input items to generate new output items.
Fig. 2 shows the data manipulation functionality 202 and the program generation system 102 as two distinct respective modules. In another case, the data manipulation functionality 202 may incorporate the program generation system 102 as one of its components, or vice versa. Likewise, Fig. 2 shows the program execution module 116 as a component within the data manipulation functionality 202. In another case, the data manipulation functionality 202 and the program execution module 116 may represent two distinct modules.
The data manipulation functionality 202 may invoke the program generation system 102 in different modes. In one mode, the user may expressly invoke the functionality of the program generation system 102, e.g., by activating a command button, menu item, etc. within a user interface presentation provided by the data manipulation functionality 202. The user may then expressly identify a set of input-output examples for use in generating the program 114.
In another mode, the data manipulation functionality 202 can include detection functionality that detects that the user is repeatedly performing the same type of transformation on a collection of input items to provide corresponding output items. The data manipulation functionality 202 can then automatically invoke the program generation system 102 based on the input-output examples that the user has already supplied.
These usage modes are representative rather than exhaustive. The data manipulation functionality 202 may interact with the program generation system 102 in yet other modes of operation.
The user may directly or indirectly invoke the program generation system 102 to accomplish different data manipulation objectives. In a first scenario, the user can invoke the program generation system 102 when there is some environment-specific desire to convert information expressed in a first format into information expressed in a second format. For example, in one case, the user may receive information from another person (or persons) in a first format. The user may wish to transform this information into a second format that is more acceptable to the user, based on any environment-specific considerations. In another case, the user may have created the information in the first format. The user may now wish to transform the information into the second format. In another case, the user may receive information expressed in the first format from a source application, data store, or the like. The user may wish to convert this information into a second format that is more suitable for a target application, data store, or the like. For example, the user may wish to convert information from a format used by a document editing application to a format used by a spreadsheet application, or vice versa. In another case, the user may wish to convert information expressed in a markup language format (e.g., XML, HTML, etc.) into a non-markup language format, and so on. These examples are presented by way of illustration, not limitation.
In a second scenario, the user may directly or indirectly invoke the program generation system 102 for the primary purpose of extracting one or more data items from input items (obtained from any source). In this scenario, the second format represents a subset of the information expressed in the first format.
In a third scenario, the user may directly or indirectly invoke the program generation system 102 based on a combination of the reasons associated with the first scenario and the second scenario. For example, in addition to extracting information from the input items, the user may wish to perform any type of transformation on the extracted information. The user may also add information to the output items that has no counterpart in the input items.
The data manipulation scenarios described above are representative rather than exhaustive. The user may invoke the program generation system 102 to accomplish other data manipulation objectives.
In terms of physical implementation, the various modules and systems shown in Fig. 2 can be implemented by one or more computing devices. These computing devices can be located at a single location or can be distributed over plural locations. For example, local data manipulation functionality 202 can interact with a local program generation system 102 to perform the functions summarized above. In another case, local data manipulation functionality 202 can interact with a remote, network-implemented program generation system 102 to implement the functions described herein. Further, the various modules and systems shown in Fig. 2 can be administered by a single entity or by plural entities.
Any type of computing device can be used to implement the functions described in Fig. 2, such as those mentioned in the exemplary operating environment section of this specification.
The program generation system 102 and the data manipulation functionality 202 can also interact with one or more data stores 206. For example, the data stores 206 can store input-output examples and the like.
With the above introduction, the explanation now advances to the illustrative composition of the program generation system 102. The program generation system 102 includes (or can be conceptualized as including) a collection of modules. This section provides an overview of these modules. Later sections provide additional details regarding each module. By way of overview, the program generation system 102 can convert the input-output examples into the program 114 in a three-part process: a parsing module 208 performs the first part; a transformation module 210 performs the second part; and a formatting module 212 performs the third part.
More specifically, the parsing module 208 identifies the respective parts of the input items. The parsing module 208 can also identify the respective parts of the output items. The transformation module 210 determines whether each output part identified by the parsing module 208 can be computed using one or more converter modules. The transformation module 210 performs this task by searching within a data store 220. The data store 220 provides a collection of candidate converter modules. In general, each candidate converter module transforms input information into output information based on at least one predetermined rule. The formatting module 212 provides formatting instructions that transform the selected output parts into the form specified by the original output items. For example, the formatting module 212 can arrange the output parts in an order that matches the form specified by the output items. Further, the formatting module 212 can print constant information to match invariant information presented in the output items.
Program generating system (PGS) 102 can be exported the program 114 that is generated, and this program 114 has reflected the processing that parsing module 208, conversion module 210 and formatting module 212 are performed.The program 114 that is generated can be used for based on the logic that is embodied in the input-output example collection new input item being transformed into new output item.In a situation, program generating system (PGS) 102 is expressed as the program that is generated 114 set of the program module that will call with certain order.One or more program modules can be represented the instantiation of the conversion module that conversion module 210 is identified.One or more other program modules can be through extracting the content in the new input item and in the new output item of correspondence, printing the content of being extracted and operate.One or more other program modules can be carried out the format manipulation of the outward appearance that influences output (but needing not to be content) that formatting module 212 identified, or the like.
Fig. 3 shows a procedure 300 that presents a high-level description of the operation of the data manipulation system 200 of Fig. 2. At block 302, the data manipulation system 200 receives a set of input-output examples. Each input-output example includes a data item (comprising one or more input string items) and an output item. At block 304, the data manipulation system 200 creates the program 114 based on the input-output examples. At block 306, the data manipulation system 200 uses the program 114 to transform additional new input items (which have not yet been transformed) into new output items.
Fig. 4 shows a procedure 400 that presents a more detailed description of the manner in which the program generation system 102 produces the generated program 114. At block 402, the program generation system 102 receives a set of input-output examples. At block 404, the program generation system 102 parses each input item into component input parts, and parses each output item into component output parts. At block 406, the program generation system 102 identifies transformations (if any) that can convert input parts into corresponding output parts. These transformations are performed by respective converter modules selected from the collection of candidate converter modules. At block 408, the program generation system 102 generates formatting instructions that transform the selected output parts into the appropriate form specified by the output items. At block 410, the program generation system 102 outputs the generated program 114 on the basis of the analysis performed in blocks 404-408.
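The three-part operation of blocks 404 to 410 can be sketched in Python under strong simplifying assumptions: a single input-output example, a tokenizer that treats every word or number as a part, a tiny converter library, and a first-match search. The names `PART`, `CONVERTERS`, `generate_program` and `run_program` are hypothetical and are not taken from the patent.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple, Union

# Block 404 (assumed tokenizer): a "part" is a word or a number.
PART = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?")

CITY_TO_STATE = {"Denver": "CO", "Yakima": "WA"}  # hypothetical lookup table


def _round(text: str) -> Optional[str]:
    """Converter: round a numeric string; None if the text is not a number."""
    try:
        return str(round(float(text)))
    except ValueError:
        return None


# Stand-in for the collection of candidate converter modules.
CONVERTERS: Dict[str, Callable[[str], Optional[str]]] = {
    "copy": lambda s: s,
    "round": _round,
    "city_to_state": CITY_TO_STATE.get,
}

# A program step is constant text or (input part index, converter name).
Step = Union[str, Tuple[int, str]]


def generate_program(example_in: str, example_out: str) -> List[Step]:
    in_parts = PART.findall(example_in)
    # Splitting on a capturing group keeps the separators:
    # even indices are constant text, odd indices are output parts.
    pieces = re.split(f"({PART.pattern})", example_out)
    program: List[Step] = []
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            program.append(piece)  # block 408: constant formatting text
            continue
        # Block 406: first (input part, converter) pair that reproduces the part.
        step = next(((idx, name)
                     for idx, part in enumerate(in_parts)
                     for name, conv in CONVERTERS.items()
                     if conv(part) == piece), None)
        if step is None:
            raise ValueError(f"no converter produces {piece!r}")
        program.append(step)
    return program  # block 410


def run_program(program: List[Step], item: str) -> str:
    parts = PART.findall(item)
    out: List[str] = []
    for step in program:
        if isinstance(step, str):
            out.append(step)
            continue
        idx, name = step
        value = CONVERTERS[name](parts[idx])
        if value is None:
            raise ValueError(f"{name} cannot convert {parts[idx]!r}")
        out.append(value)
    return "".join(out)
```

For instance, `generate_program("Denver 152.02", "$152, CO")` learns to round the second input part, look up the state for the first, and print them in swapped order with the constant text; `run_program` then maps `"Yakima 292.88"` to `"$293, WA"`. A first-match search is a simplification chosen for brevity; it is not claimed to be how the transformation module resolves multiple candidates.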
2.0 semantic entity manipulation
The semantic entity manipulation technique embodiments described herein generally modify the way in which the parsing, conversion and formatting modules described above operate, and the way in which the program generating system (PGS) produces the generated program. More specifically, these embodiments introduce a probabilistic approach to parsing, conversion, formatting and program generation.
In general, a modular framework of semantic entities is introduced. This framework extends the notion of a type so that its functionality is easily accessible to end users through a string-based interface, and it allows designers to easily define new entities and operations on those entities. In addition, a probabilistic programming scheme is introduced for manipulating strings that represent such semantic entities. A program in this scheme comprises a weighted set of parse-compute-print expressions. A way of using the probabilistic programming scheme in a manner consistent with a user-provided set of input-output examples is also presented.
In general, a semantic entity acts as a wrapper around a library class (with side-effect-free methods) so that the functionality of the class is easily accessible to end users through a string-based interface. It facilitates conversion from unstructured and possibly noisy strings to the canonical representation of the underlying class, and conversion of the canonical representation of the underlying class to strings using pretty printers.
Providing a string-based input and output interface has the benefit of giving end users easy access to class functionality (including functions or dictionaries available online). Note, however, that the input and output examples provided by an end user can exhibit potential ambiguity (multiple interpretations), noise (data errors) and missing information (such as implicit defaults). To address this, a general probabilistic parser framework is employed. It requires the entity designer to specify the possible hard and soft constraints on field values, together with a similarity measure between any string and the valid values of a field.
To manipulate the string representations of semantic entities, the aforementioned probabilistic programming scheme is employed, in which a program consists of a weighted set of parse-compute-print expressions (also called pcp expressions). Parse expressions sit at the leaves of a pcp expression and allow conversion from a string representation to some canonical representation (that is, a normalized intermediate representation). The print expression sits at the top level of the pcp expression and allows conversion from the canonical representation to the required string representation. Because an input string can have multiple parses even under a given parse expression, each pcp expression returns a weighted set of output strings that reflects the possible semantic interpretations of the input string. Having a (weighted) set of such pcp expressions serves two purposes. First, it allows conditional control flow to be encoded. Second, because these expressions are learned from imperfect information, it allows a probability distribution over pcp expressions to be represented. Each program also comprises a weighted set of parse descriptors. This set captures statistical parsing knowledge that can be built up from a corpus of examples. It is used to further refine the possible parses of a given string before the pcp expressions are executed on that string.
The semantic entity manipulation technique embodiments described herein also enable end users to compute with data that is available on the web. This is important because research shows that most users solve most of the computational problems in their daily lives by using data or services available on the web. The embodiments described herein provide several ways of enabling seamless access to such data and services. For example, in one implementation, no assumption is made about the interface of a computation module; in fact, a computation module can be an API call over the Internet, for example to compute the conversion rate between two currencies on a given date, or to compute the driving distance between two addresses. In addition, in one implementation, the set of valid values of an entity, or of a given field of an entity, can be constrained to belong to a set that some data service provider makes available on the web. This is useful when the set is too large to be stored locally or cannot be kept continually up to date.
The aforementioned features of the semantic entity manipulation technique embodiments will now be described in more detail in the following sections.
2.1 semantic entity
A semantic entity (or simply, an entity) is a wrapper around a library class that makes the functionality of the class easily accessible to end users through a string-based interface. In one implementation, an entity is associated with an underlying class, a set of fields, constraints, and ToCanonical and FromCanonical methods. These semantic entities can be obtained from a database of such entities. In addition, a developer can create new entities and either provide them directly or store them in the semantic entity database.
The underlying semantic entity class has a standard interface with a set of side-effect-free methods. The fields of a semantic entity have typed representations, where each type can be another entity or a base type. Note that the term "field" as used in this specification can also refer to some representation of the actual field. Semantic entity constraints include hard and soft constraints. The soft constraints are a weighted set of constraints that specify the valid field values and the likely ordering among fields and delimiters. A hard constraint is defined as a soft constraint whose weight equals 1. The constraint set can also include any constraint that can be implemented as a Boolean check over an entity instance (including an assignment to some subset of the fields) and a parse format descriptor (a sequence of field names and delimiters). In one implementation, constraints are expressed using the following predicates:
1) InSet(f, S): field f takes its value from the finite set S. Range(f, i, j) is a special case of InSet(f, S) in which S = {i, ..., j}.
2) FieldOrder(f, f'): field f' follows field f in the entity's parse format descriptor.
3) DelAfter(f, d): delimiter d follows field f.
4) DelBefore(f, d): delimiter d appears before field f.
For example, suppose that the semantic entity of a class named "DateTime" has the following set of fields: "Month", "Day", "Year", "Hours", "Minutes", "Seconds" and "AM-PM", and that the "Month" field has the following representations with corresponding hard constraints:
$\mathrm{Month}_{\mathrm{Num}}$: this representation takes integer month values from the set {1, 2, ..., 12}, that is, Range($\mathrm{Month}_{\mathrm{Num}}$, 1, 12); and
$\mathrm{Month}_{\mathrm{Words}}$: this representation takes string month values from the set {January, February, ..., December}, that is, InSet($\mathrm{Month}_{\mathrm{Words}}$, {January, ..., December}).
Suppose also that "DateTime" is associated with the following soft constraints. The "Hours" field is more likely to be followed by the "Minutes" field, that is, FieldOrder(Hours, Minutes). In addition, the "AM-PM" field is more likely to be preceded by the "Seconds", "Minutes" or "Hours" field, that is, FieldOrder(Seconds, AM-PM), FieldOrder(Minutes, AM-PM) and FieldOrder(Hours, AM-PM).
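The constraint predicates above can be illustrated with a minimal Python sketch. The names `Candidate`, `Constraint` and `score`, the treatment of FieldOrder as "appears later in the descriptor", and the scoring rule (a violated hard constraint gives 0, otherwise the weighted fraction of satisfied soft constraints) are assumptions made for illustration; the specification does not prescribe them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    """A hypothetical parse candidate: field values plus field order."""
    values: Dict[str, str]
    order: List[str]


@dataclass
class Constraint:
    check: Callable[[Candidate], bool]
    weight: float  # weight 1.0 marks a hard constraint


def in_set(field, allowed):
    # InSet(f, S): a present field must take its value from S.
    return lambda c: field not in c.values or c.values[field] in allowed


def range_(field, lo, hi):
    # Range(f, i, j) is InSet(f, {i, ..., j}).
    return in_set(field, {str(n) for n in range(lo, hi + 1)})


def field_order(first, second):
    # FieldOrder(f, f'): f' comes after f when both fields are present.
    def check(c):
        if first in c.order and second in c.order:
            return c.order.index(first) < c.order.index(second)
        return True
    return check


def score(c, constraints):
    """Assumed scoring: 0 if a hard constraint fails, else the
    weighted fraction of soft constraints that hold."""
    soft_total = soft_ok = 0.0
    for k in constraints:
        ok = k.check(c)
        if k.weight >= 1.0:
            if not ok:
                return 0.0
        else:
            soft_total += k.weight
            soft_ok += k.weight if ok else 0.0
    return 1.0 if soft_total == 0 else soft_ok / soft_total


DATETIME_CONSTRAINTS = [
    Constraint(range_("Month_Num", 1, 12), 1.0),
    Constraint(field_order("Hours", "Minutes"), 0.8),
]
```

A candidate whose month value is "13" is rejected outright, while a candidate that places "Minutes" before "Hours" is only penalized.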
Other examples of soft constraints include online dictionaries used to match the first and last names of a Name entity and the addresses of an Address entity.
The aforementioned ToCanonical and FromCanonical methods are used, respectively, to convert an entity instance to the canonical representation and to convert from the canonical representation to an entity instance. More specifically, the ToCanonical method, $e \to e_C$, converts an entity instance into the canonical representation of the entity. The FromCanonical method, $e_C \times F \to e$, performs the inverse operation of converting an entity instance in canonical form into an instance over a given field set $F$.
For example, suppose that the semantic entity of a class named "Length" has the following set of fields: "kms", "m", "cm", "mm", "ft", "inches", and so on. Suppose also that the "cm" field is the canonical representation of this entity. The ToCanonical method of "Length" adds up all of the field values after normalizing each of them with its conversion factor to "cm". The FromCanonical method takes a canonical length instance $e_C$ and a set of fields $(f_1, \ldots, f_n)$, and splits the entity value $e_C$ among the fields so that the value is packed into the next higher field. If no higher field is available for packing, the highest field stores the excess amount. For example, suppose a school teacher wants to record the heights of the students in a class in ft-inches form, but the height data obtained from the students is very irregular, that is, expressed in various measurement systems (for example, m, cm, in, m-cm, and so on). The ToCanonical and FromCanonical methods can convert the irregular inputs into a consistent form, such as ft-inches or any other required form. This is done by first converting the data to canonical form and then converting it to the required form.
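The "Length" example can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the conversion table `CM_PER_UNIT`, and the choice to give the smallest requested field the fractional remainder are not taken from the specification.

```python
# Conversion factors to the canonical "cm" field (standard unit definitions).
CM_PER_UNIT = {
    "kms": 100000.0,
    "m": 100.0,
    "cm": 1.0,
    "mm": 0.1,
    "ft": 30.48,
    "inches": 2.54,
}


def to_canonical(fields):
    """ToCanonical: normalize every field value to cm and add them up."""
    return sum(CM_PER_UNIT[unit] * value for unit, value in fields.items())


def from_canonical(cm, units):
    """FromCanonical: split a canonical value across the requested fields.

    Larger fields are filled first with whole units; the largest field
    absorbs any excess, and (by assumption) the smallest field keeps
    the fractional remainder.
    """
    ordered = sorted(units, key=lambda u: CM_PER_UNIT[u], reverse=True)
    result = {}
    remaining = cm
    for unit in ordered[:-1]:
        whole = int(remaining // CM_PER_UNIT[unit])
        result[unit] = whole
        remaining -= whole * CM_PER_UNIT[unit]
    smallest = ordered[-1]
    result[smallest] = remaining / CM_PER_UNIT[smallest]
    return result


# Irregular inputs are first normalized to cm, then printed as ft-inches.
height_cm = to_canonical({"m": 1, "cm": 60})
height_ft_in = from_canonical(height_cm, ["ft", "inches"])
```

Routing every conversion through the canonical field means that each new unit needs only one conversion factor rather than one per pair of units.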
In one implementation, when an entity field is represented by a base type rather than an entity type, the base type is associated with a set of pretty printers pp, each of which takes a string as input and produces another string. Examples of such base types are integer, string and double. In one implementation, the following pretty-printing scheme is adopted to facilitate rich formatting of base types (and, in turn, rich formatting of entities whose fields have these base types).
1) Identity: the Identity printer prints the string unchanged;
2) UpperCase: the UpperCase printer prints the string in uppercase;
3) LowerCase: the LowerCase printer prints the string in lowercase;
4) ProperCase: the ProperCase printer prints the string in proper (mixed upper and lower) case;
5) Prefix(k): the Prefix(k) printer prints the prefix of length k of the string;
6) IntOrd: the IntOrd printer prints the integer represented by the string in ordinal form, for example 1 is printed as 1st;
7) IntegerPrecision(d): the IntegerPrecision(d) printer prints the integer using d digits;
8) DoublePrecision(d): the DoublePrecision(d) printer prints the floating-point number with exactly d decimal places;
9) DoubleAtMost(d): the DoubleAtMost(d) printer prints the floating-point number with at most d decimal places; and
10) DoubleAtLeast(d): the DoubleAtLeast(d) printer prints the floating-point number with at least d decimal places.
The pretty printer values in a format descriptor are drawn from these printers. A format descriptor also needs to learn the additional argument parameters of the Prefix(k), IntegerPrecision(d) and DoublePrecision(d) pretty printers. The developer can provide a set of such argument values that are likely to be useful for a given entity. For example, for the "Month" field in "DateTime", a prefix of size k=3 is often useful.
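Several of the pretty printers listed above can be sketched as plain string-to-string functions in Python. The function names are hypothetical, and two behaviors are assumptions rather than statements of the specification: ProperCase is taken to capitalize only the first letter, and IntegerPrecision(d) is taken to pad with leading zeros.

```python
def identity(s):
    return s


def upper_case(s):
    return s.upper()


def lower_case(s):
    return s.lower()


def proper_case(s):
    # Assumption: first letter uppercase, remaining letters lowercase.
    return s[:1].upper() + s[1:].lower()


def prefix(k):
    # Prefix(k): keep the first k characters.
    return lambda s: s[:k]


def int_ord(s):
    # IntOrd: print the integer in ordinal form (1 -> 1st, 22 -> 22nd).
    n = int(s)
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def integer_precision(d):
    # Assumption: "d digits" means zero-padding to width d.
    return lambda s: str(int(s)).zfill(d)


def double_precision(d):
    # DoublePrecision(d): exactly d decimal places.
    return lambda s: f"{float(s):.{d}f}"


# Parameterized printers are built once and then applied to field values.
month_abbrev = prefix(3)
```

Because each printer maps a string to a string, printers such as `prefix(3)` and `lower_case` can be applied one after another when a field needs more than one formatting step.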
2.2 probabilistic programming scheme
As mentioned above, in one implementation a probabilistic programming scheme is adopted for manipulating strings that represent semantic entities. This programming scheme is described next. Its language is then used in the descriptions of parsing, manipulating and printing semantic entities.
In general, a program generated using the probabilistic programming scheme comprises a weighted set of parse-compute-print (pcp) expressions. A distinctive feature of this scheme is that, for a given input, the generated program produces a weighted set of outputs rather than a single output.
2.2.1 syntax
In one implementation, the probabilistic programming scheme adopts the following syntax:
1) program $P := \{PD, PCP\}$, where $PD = \{(p_1, w_1), \ldots, (p_k, w_k)\}$ and $PCP = \{(O_1, w_1), \ldots, (O_n, w_n)\}$;
2) parse descriptor $p := [fd_1, \ldots, fd_m]$;
3) field-delimiter pair $fd := (f_{ij}, \mathrm{constStr})$;
4) pcp expression $O := \mathrm{printEntity}_E(C, q)$;
5) compute expression $C := \mathrm{parseEntity}_E(s_i, p) \mid T(C_1, \ldots, C_n)$; and
6) print format $q := \mathrm{Format}((fd_1, pp_1), \ldots, (fd_n, pp_n))$.
A program $P$ in this language is defined as a weighted set $PD$ of parse descriptors together with a weighted set $PCP$ of parse-compute-print expressions. The symbol $w$ denotes the weights.
A parse (format) descriptor $p$ (of an entity $E$) is a sequence of pairs $fd$, where each pair consists of some field of $E$ (with a representation identifier) and a delimiter. For notational convenience, a parse descriptor is usually written flattened as a string obtained by simply concatenating all of the elements of the pairs in the sequence (for example, $[(f_1, \mathrm{Str}_1), (f_2, \mathrm{Str}_2), (f_3, \mathrm{Str}_3)]$ is written as $f_1 \mathrm{Str}_1 f_2 \mathrm{Str}_2 f_3 \mathrm{Str}_3$).
Parsing-calculating-printing expression formula the O that also is called as the pcp expression formula is printEntity E(wherein C is that certain calculation expression and q are the print format descriptors of entity E for C, q) form.Calculation expression C has the sentence structure of recurrence.It or will go here and there s iWith the parsing format descriptor p of entity E (basic scenario) parseEntity as input E(s i, p) expression formula, or its independent variable also is the conversion expression formula T of calculation expression (recurrence situation).
A print (format) descriptor $q$ is like a parse descriptor, except that each field is associated with a pretty printer. The pretty printer of a field whose type is itself an entity is a print format descriptor, and the pretty printer of a field whose type is a base type is one of the pretty printers associated with that type (as described in section 2.1).
For example, consider a spreadsheet with two input columns. The first column contains dates, and the second column contains integers that represent a number of business days. The end user wants to add the number of business days in the second column to the date in the first column, and to format the resulting date in a particular way in the third column. For example, if a spreadsheet row contains the string "24/09/1986" in the first column and the string "3" in the second column, and the user expects "Monday, 24th September 1986" to be displayed in the third column, then one of the pcp expressions $O$ in the program that performs the required manipulation can be written as:
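$O = \mathrm{printEntity}_{\mathrm{DateTime}}(\mathrm{addBusinessDays}(C_1, C_2),\ q_3)$, where
$C_1 \equiv \mathrm{parseEntity}_{\mathrm{DateTime}}(s_1, p_1)$
$C_2 \equiv \mathrm{parseEntity}_{\mathrm{Duration}}(s_2, p_2)$
$p_1 \equiv \mathrm{Day}_1/\mathrm{Month}_2/\mathrm{Year}_1$
$p_2 \equiv \mathrm{Days}_1$
$q_3 \equiv \mathrm{Format}(((\mathrm{DayOfWeek}_1, \text{", "}), \mathrm{stringP}),\ ((\mathrm{Day}_1, \text{" "}), \mathrm{intSupP}),\ ((\mathrm{Month}_2, \text{" "}), \mathrm{stringP}),\ ((\mathrm{Year}_1, \epsilon), \mathrm{intP}))$
Here the subscripts on the entity field names denote the corresponding field representations. The strings $s_1$ and $s_2$ denote the strings in the first and second columns, respectively. The empty string is denoted by $\epsilon$.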
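The parse, compute and print stages of such a pcp expression can be sketched in Python as follows. This is a simplified, deterministic illustration that ignores weights and alternative parses. The function names, the Monday-to-Friday definition of a business day (no holidays) and the sample input are assumptions made for the sketch, not details of the specification.

```python
from datetime import datetime, timedelta

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def parse_date(s):
    # Parse stage for the descriptor Day/Month/Year.
    return datetime.strptime(s, "%d/%m/%Y").date()


def parse_duration(s):
    # Parse stage for a single integer field of days.
    return int(s)


def add_business_days(d, n):
    # Compute stage: assumes business days are Monday to Friday.
    while n > 0:
        d += timedelta(days=1)
        if d.weekday() < 5:
            n -= 1
    return d


def ordinal(n):
    if 11 <= n % 100 <= 13:
        return f"{n}th"
    return f"{n}" + {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def print_date(d):
    # Print stage: DayOfWeek ", " Day (ordinal) " " Month (words) " " Year.
    return f"{DAYS[d.weekday()]}, {ordinal(d.day)} {MONTHS[d.month - 1]} {d.year}"


def pcp(s1, s2):
    """printEntity(addBusinessDays(parseEntity(s1), parseEntity(s2)))."""
    return print_date(add_business_days(parse_date(s1), parse_duration(s2)))


result = pcp("03/02/2011", "2")
```

In the full scheme each stage would carry a weight, and an ambiguous input such as "03/02/2011" would also be parsed as Month/Day/Year, which would yield a second weighted output.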
2.2.2 semantics
In one implementation, the probabilistic programming scheme adopts the following semantics:
[Figures: formal evaluation rules of the probabilistic programming scheme.]
The semantics of evaluating a program $P$ in an input state $\sigma$ (which assigns values to the string variables $s_i$) is to evaluate the pcp expression set $PCP$ in $\sigma$. The semantics of evaluating $PCP = \{(O_1, w_1), \ldots, (O_n, w_n)\}$ in $\sigma$ is to first evaluate each pcp expression $O_j$ in $\sigma$ to obtain a weighted set of output strings $(o_i, w'_i)$, and then to take the union of all of these sets after normalizing the weights by multiplying each $w'_i$ by $w_j$.
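The top-level evaluation rule can be sketched in Python as follows. Representing a pcp expression as a callable that returns weighted strings, and sorting the combined outputs by weight, are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

Weighted = List[Tuple[str, float]]
PcpExpr = Callable[[Dict[str, str]], Weighted]


def run_program(pcp_set: List[Tuple[PcpExpr, float]],
                state: Dict[str, str]) -> Weighted:
    """Evaluate every (O_j, w_j), scale each output weight w'_i by w_j,
    and take the union of the resulting weighted output strings."""
    outputs: Weighted = []
    for expr, w_j in pcp_set:
        for out, w_i in expr(state):
            outputs.append((out, w_i * w_j))
    # Ranking by weight is an added convenience, not part of the rule.
    return sorted(outputs, key=lambda pair: pair[1], reverse=True)


# Two hypothetical pcp expressions that read the same input differently.
def day_first(state):
    return [("3 February 2011", 1.0)]


def month_first(state):
    return [("2 March 2011", 0.5)]


ranked = run_program([(day_first, 0.5), (month_first, 0.5)],
                     {"s1": "03/02/2011"})
```

The program therefore returns every candidate output together with a weight, and a caller can show the highest-weighted string or the whole ranked list.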
Evaluating a pcp expression $O_j$ in an input state $\sigma$ involves first computing a weighted set of parse states $\pi$ from $\sigma$ using the function getAllParses. This is where the weighted parse descriptor set $PD$, which is part of program $P$, is used. The getAllParses function for parsing strings is described in section 2.3. Unlike the state $\sigma$, which assigns string values to variables, a weighted parse state $\pi$ assigns to each variable $s$ a tuple $(e, p, w)$ consisting of an entity instance $e$ (in canonical representation), a parse descriptor $p$ and a weight $w$. The pcp expression $O_j$ is then evaluated in each weighted parse state $\pi$ to obtain a weighted string as output. The result of evaluating $O_j$ in input state $\sigma$ is thus the set of all such weighted strings. Note that this is how a probabilistic interpretation is assigned to the execution of a pcp expression on input strings that may have ambiguous parses.
Evaluating a pcp expression $O_j = \mathrm{printEntity}_E(C, q)$ in a weighted parse state $\pi$ involves evaluating the compute expression $C$ in $\pi$, which yields a weighted entity instance $(e, w)$. The entity instance $e$ thus obtained is converted from its canonical representation into a representation $e'$ according to the print format descriptor $q$. The entity instance $e'$ is then further converted into a string representation, again according to $q$.
Evaluating the recursive case of a compute expression, $T(C_1, \ldots, C_n)$, involves recursively evaluating its arguments. A canonical entity instance is generated by invoking the transformation $T$ on the canonical entity instances that result from evaluating the arguments. The corresponding weight is computed as the minimum of the weights associated with the results of evaluating the individual arguments. This reflects the fact that high confidence is associated with the result of a computation if and only if every input needed for the computation has high confidence.
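The minimum-weight rule for the recursive case can be sketched in Python as follows. The `Leaf` and `Apply` classes are hypothetical stand-ins for already-parsed entity instances and transformation expressions; numeric values are used in place of canonical entity instances to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Leaf:
    """Stand-in for a parsed entity instance with its parse weight."""
    value: float
    weight: float


@dataclass
class Apply:
    """Stand-in for a transformation expression T(C_1, ..., C_n)."""
    fn: Callable[..., float]
    args: List["Expr"]


Expr = Union[Leaf, Apply]


def evaluate(expr: Expr) -> Tuple[float, float]:
    """Return (value, weight); the weight of T(...) is the minimum
    of the weights of its evaluated arguments."""
    if isinstance(expr, Leaf):
        return expr.value, expr.weight
    results = [evaluate(arg) for arg in expr.args]
    value = expr.fn(*(v for v, _ in results))
    weight = min(w for _, w in results)
    return value, weight


total = evaluate(Apply(lambda a, b: a + b,
                       [Leaf(2.0, 0.9), Leaf(3.0, 0.4)]))
```

Taking the minimum means that a single low-confidence parse anywhere in the expression tree caps the confidence of the final result.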
Evaluating the base case of a compute expression, $\mathrm{parseEntity}_E(s_i, p)$, in a weighted parse state $\pi$ involves computing a closeness measure $w'$ between the parse descriptors $p'$ and $p$, where $p'$ is the parse descriptor in the tuple $\pi(s_i)$. The weight $w'$ is then used to adjust the weight $w$ of the entity instance $e$ that corresponds to string $s_i$ in state $\pi$. This weight normalization helps give a higher weight to the top-level pcp expression if and only if the string $s_i$ has a valid parse descriptor similar to $p$.
Note that, given $P = \{PD, \{(O_1, w_1), \ldots, (O_n, w_n)\}\}$, let $\sigma$ be an input program state. For each (weighted) parse $\pi_i$ of state $\sigma$ (a tuple of variables) and each pcp expression $(O_j, w_j)$, a weighted output string $(o_{ij}, w'_{ij})$ is generated. The value of $w'_{ij}$ is relatively high if and only if the following quantities are high: the weights of the parses in state $\pi_i$ of the strings used in $O_j$, the similarity measure between those parses and the corresponding parse descriptors that appear at the leaves of pcp expression $O_j$, and $w_j$.
The next three sections highlight three key functional aspects of the probabilistic programming scheme: parsing strings into entities, computing over entities, and printing entities as strings.
2.3 illustrative parsing module
This section describes one implementation of the parsing module. The parsing module provides the rich probabilistic support that the probabilistic programming scheme calls for when parsing a string into a weighted set of parses, including ambiguous strings (strings with multiple parses) and noisy strings (strings with no valid parse). During the learning phase, parsing is performed on both the input and the output strings in the examples in order to learn a set of potential compute expressions. Parsing is also performed at run time on the inputs to the program.
2.3.1 field matches
Parsing starts by identifying field matches. These are used to build the parse graph, as described later. Given a string $s$ and an entity $e$, a field match is defined as a longest substring of $s$ that matches a field of $e$. The complete set of field matches $M_f(s)$ of a field $f$ in string $s$ is defined as:
$M_f(s) = \{(i, j) \mid s[i..j] \text{ is a longest substring of } s \text{ that matches field } f\}$.
2.3.2 parse graph
A parse graph $G$ is used to represent all of the parses of a given string $s$ as an instance of an entity $e$. Given a string $s$ and an entity $e$, the parse graph can be built as follows.
In one implementation, the parse graph $G = (V, E)$ is built from the field match sets $M_f(s)$ in the following way. There is a node $n_i \in V(G)$ corresponding to each index $i$ in string $s$ ($0 < i \le \mathrm{Len}(s)$). This gives $|V| = \mathrm{Len}(s)$ nodes in the graph, where $\mathrm{Len}(s)$ is the length of $s$. The edge set of the graph is divided into two kinds, field edges $E_f(G)$ and delimiter edges $E_d(G)$, such that $E(G) = E_f(G) \cup E_d(G)$.
A directed edge $(u, v, l) \in E(G)$ corresponds to an edge from node $u$ to node $v$ labeled $l$. If $(i, j) \in M_f(s)$, then there is a field edge $(n_i, n_j, f) \in E_f(G)$. For $j \ge i$, there is also a delimiter edge $(n_i, n_j, d) \in E_d(G)$. Note that a given substring can match multiple fields, so there can be multiple field edges between two given nodes. The delimiter edges also include self-loops, which correspond to the empty delimiter value $\epsilon$.
A parse $P$ is defined as a directed path in $G$ from node $n_1$ to node $n_{\mathrm{Len}(s)}$ such that the edges on the path alternate between field edges $E_f(G)$ and delimiter edges $E_d(G)$. Formally, a parse $P$ is a directed path $(e_1, e_2, \ldots, e_k)$ that satisfies the following conditions:
1) $e_1.u = n_1$, that is, the start node $e_1.u$ of the path is the beginning of the string;
2) $e_k.v = n_{\mathrm{Len}(s)}$, that is, the end node $e_k.v$ of the path is the end of the string;
3) $e_i \in E_f(G)$ if and only if $e_{i-1} \in E_d(G)$ for $1 < i \le k$, that is, if $e_i$ is a field edge, then the preceding edge $e_{i-1}$ on the path is a delimiter edge; and
4) $e_i \in E_d(G)$ if and only if $e_{i-1} \in E_f(G)$ for $1 < i \le k$, that is, if $e_i$ is a delimiter edge, then the preceding edge $e_{i-1}$ on the path is a field edge.
The parse set $SP$ comprises all such parses $P$. The function getParseGraph takes a string $s$ and an entity $e$ as arguments and returns the corresponding parse graph $G$.
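Field matching and parse enumeration can be sketched in Python as follows. This is a simplified illustration: fields are recognized with hypothetical regular expressions, the delimiter list is assumed, each field is used at most once per parse, and the graph is stored as a map from start index to outgoing field edges rather than as explicit node and edge sets.

```python
import re

# Hypothetical field recognizers for a date-like entity.
DATE_FIELDS = {
    "Day": r"(?<!\d)(0?[1-9]|[12]\d|3[01])(?!\d)",
    "Month": r"(?<!\d)(0?[1-9]|1[0-2])(?!\d)",
    "Year": r"(?<!\d)\d{4}(?!\d)",
}
# Assumed delimiter values; "" plays the role of the empty delimiter.
DELIMITERS = ("/", "-", ", ", " ", "")


def field_matches(s, patterns):
    """Spans (i, j) of the substrings of s that match each field."""
    return {f: [m.span() for m in re.finditer(p, s)]
            for f, p in patterns.items()}


def parses(s, patterns, delimiters=DELIMITERS):
    """Enumerate paths that alternate field edges and delimiter edges
    and that cover the whole string."""
    field_edges = {}
    for field, spans in field_matches(s, patterns).items():
        for start, end in spans:
            field_edges.setdefault(start, []).append((end, field))

    results = []

    def walk(pos, path, used):
        if pos == len(s):
            if path:
                results.append(path)
            return
        for end, field in field_edges.get(pos, []):
            if field in used:
                continue
            for d in delimiters:
                if s.startswith(d, end):
                    walk(end + len(d), path + [(field, d)], used | {field})

    walk(0, [], frozenset())
    return results


unambiguous = parses("24/09/1986", DATE_FIELDS)
ambiguous = parses("03/02/2011", DATE_FIELDS)
```

The first string has a single parse because 24 cannot be a month, while the second string yields both a Day/Month/Year and a Month/Day/Year parse, which is the kind of ambiguity that the weights are meant to resolve.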
2.3.3 MatchParse
The function MatchParse takes two parse descriptors $p_1$ and $p_2$ as arguments (one of which comes from the parse descriptor library, while the other is associated with the semantic entity of an input or output item) and computes a weight $w$ that represents the similarity between the two format descriptors. A perfect match of fields and delimiters is given the highest weight. In one implementation, if the two format descriptors do not match perfectly, the Hamming distance between the fields and delimiters of the two descriptors is computed. The weight that represents the closeness measure is computed so as to be inversely proportional to the Hamming distance.
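A closeness measure of this kind can be sketched in Python as follows. The specific formula 1 / (1 + distance), which gives a perfect match the weight 1, and the treatment of a length difference as additional mismatches are assumptions made for illustration; the text only states that the weight is inversely related to the Hamming distance.

```python
def match_parse(p1, p2):
    """Similarity weight between two parse descriptors, each given as
    a list of (field, delimiter) pairs."""
    flat1 = [item for pair in p1 for item in pair]
    flat2 = [item for pair in p2 for item in pair]
    # Hamming distance over aligned positions; extra positions in the
    # longer descriptor are counted as mismatches (an assumption).
    distance = sum(a != b for a, b in zip(flat1, flat2))
    distance += abs(len(flat1) - len(flat2))
    return 1.0 / (1.0 + distance)


dmy = [("Day", "/"), ("Month", "/"), ("Year", "")]
mdy = [("Month", "/"), ("Day", "/"), ("Year", "")]
same = match_parse(dmy, dmy)
swapped = match_parse(dmy, mdy)
```

Here `same` is the maximum weight, and `swapped` is lower because two field positions differ while all delimiters agree.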
2.3.4 getAllParses method
The function getAllParses takes a string $s$ and a parse descriptor set $PD$ as arguments, and returns a weighted set of entity instances and descriptors that correspond to the different parses of $s$. The function first uses the getParseGraph function to build the parse graph $G$ from $s$ and $e$. Each parse path $p$ in the graph is assigned a weight $w$ that is proportional to the fraction of the soft constraints it satisfies and to its similarity to the parse descriptors in $PD$. The function then returns a set of weighted entity and parse descriptor tuples $(e, p, w)$.
2.4 illustrative conversion module
This section describes one implementation of the conversion module 210 of Fig. 2. As mentioned above, conversion module 210 identifies one or more converter modules that can be used to transform input parts into corresponding output parts. As described in the parent patent application, and with reference to Figs. 5 and 6, different output parts are shown together with the different input parts to which they correspond. For each such case, conversion module 210 examines a set of candidate converter modules (in data store 220) to determine whether there are one or more converter modules that can transform the identified input part into the corresponding output part. In one case, conversion module 210 identifies a single converter module that can perform the required transformation. In other cases, conversion module 210 identifies two or more converter modules that perform the required transformation when applied in a specified order. In still other cases, conversion module 210 may be unable to find any converter module that performs the required transformation.
Starting with the scenario labeled "A", conversion module 210 identifies two converter modules (502, 504) that together can generate "feb" in the output item from the numeric month "2" in the input item. That is, converter module 502 receives a numeric month as input and generates a three-letter month abbreviation (for example, Feb) as output. Converter module 504 receives a string with an initial capital letter as input and generates the lowercase string as output.
In scenario B, conversion module 210 identifies a single converter module 506 that can generate "2015" in the output item from the number "15" in the input item. That is, converter module 506 receives a two-digit year as input and generates a four-digit year as output.
In scenario C, conversion module 210 identifies a single converter module 508 that can generate "CO" in the output item from "Denver" in the input item. That is, converter module 508 receives a city name as input and generates the corresponding state name as output, where the state name corresponds to the state in which the city is located. Converter module 508 can use a predetermined lookup table to perform this conversion.
In scenario D, conversion module 210 identifies a single converter module 602 that can generate the number "152" in the output item from the number "152.02" in the input item. That is, converter module 602 receives a floating-point dollar amount as input and generates the rounded dollar amount as output. Although not used in this example, in other scenarios conversion module 210 can rely on converter module 604 to convert monetary information from one currency basis to another, for example from pounds sterling to dollars.
In scenario E, conversion module 210 determines, based on the parsing information provided by parsing module 208, that the word "Paid" in the output item exactly matches the same word in the input item. In this case, conversion module 210 can forgo attempting to find a converter module to generate the output "Paid". Instead, program generating system (PGS) 102 generates a program module 606 that simply extracts the last word in the input item and repeats it as the last word in the output item.
In one implementation, conversion module 210 applies the different candidate converter modules in succession to identify those that produce the required result. After each computation with a particular converter module, conversion module 210 can determine whether the output information generated by that converter module consistently matches the output parts in the output items. For example, for scenario C, conversion module 210 can supply the set of city names to different candidate converter modules in succession. Conversion module 210 can conclude that the city-to-state converter module 508 generates strings that consistently match the state names identified in the output items, and therefore that converter module 508 is a suitable choice for generating state names in the output items. As noted above, in some cases a transformation requires two or more converter modules to convert an input part into the corresponding output part. Conversion module 210 can therefore also investigate different combinations of converter modules in a methodical way.
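The successive-application search described above can be sketched in Python as follows. The converter table, its names, the 2000s century assumption for two-digit years and the limit of two chained converters are hypothetical choices for illustration; they loosely mirror scenarios A, B, C and E.

```python
from itertools import permutations

MONTH_ABBREVS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
CITY_STATE = {"Denver": "CO"}  # stand-in for a predetermined lookup table


def month_abbrev(s):
    n = int(s)
    if not 1 <= n <= 12:
        raise ValueError(s)
    return MONTH_ABBREVS[n - 1]


CONVERTERS = {
    "month_abbrev": month_abbrev,
    "lower": str.lower,
    "two_digit_year": lambda s: "20" + s,  # assumes years in the 2000s
    "city_to_state": lambda s: CITY_STATE[s],
}


def consistent(fn, examples):
    """True if fn maps every example input part to its output part."""
    try:
        return all(fn(i) == o for i, o in examples)
    except (ValueError, KeyError, IndexError):
        return False


def find_converters(examples):
    """Return every converter chain (up to length two) that is
    consistent with all examples; () denotes a direct copy."""
    found = []
    if all(i == o for i, o in examples):
        found.append(())
    for name, fn in CONVERTERS.items():
        if consistent(fn, examples):
            found.append((name,))
    for a, b in permutations(CONVERTERS, 2):
        chain = lambda s, a=a, b=b: CONVERTERS[b](CONVERTERS[a](s))
        if consistent(chain, examples):
            found.append((a, b))
    return found


scenario_a = find_converters([("2", "feb"), ("11", "nov")])
scenario_c = find_converters([("Denver", "CO")])
scenario_e = find_converters([("Paid", "Paid")])
```

Returning every consistent chain, rather than stopping at the first hit, keeps all viable candidates available for later weighting.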
Fig. 7 shows a process 700 that summarizes the above concepts in flowchart form. At block 702, conversion module 210 determines whether the output part under consideration can be derived by directly copying the corresponding input part from the corresponding input item. If so, conversion module 210 can forgo the investigation performed in block 704. At block 704, conversion module 210 determines whether the output part under consideration can be derived from the corresponding input part using one or more converter modules.
The loop in Fig. 7 indicates that conversion module 210 can repeat the above analysis until it has identified the complete set of converter modules that can be used to generate the output parts of the output items. In some cases, conversion module 210 may conclude that one or more output parts cannot be derived using converter modules. Indeed, in some cases, conversion module 210 may conclude that no output part can be derived using converter modules.
The above examples assume that all of the input items (associated with the input-output examples) adopt a consistent format. In this case, a converter module (or combination of converter modules) is a viable choice if it provides a successful transformation for all instances of the input-output examples. In other implementations, program generating system (PGS) 102 can handle input items that adopt two or more formats. For example, suppose the input items of Fig. 1 include two ways of describing invoice information. In this case, conversion module 210 can identify a first converter module (or combination of converter modules) that is valid for a first subset of the input-output examples and a second converter module (or combination of converter modules) that is valid for a second subset of the input-output examples. The generated program 114 can use conditional cues that appear in an input item to determine whether to invoke the first or the second converter module.
In one implementation, program generating system (PGS) 102 uses an extensible framework to implement the set of candidate converter modules. A developer can add new converter modules to the set, provided that they conform to the format set forth by the extensible framework (which, in the case of the semantic entity manipulation technique embodiments described herein, can be the probabilistic programming scheme described above).
In the context of the semantic entity manipulation technique embodiments described herein, the transformation module 210 identifies, for an example input-output entry, the converter modules that transform the semantic entities associated with the input into the corresponding semantic entity associated with the output. However, the search for suitable converter modules does not stop when one that generates the desired result is found. Instead, the analysis of the results produced by each converter module associated with the type of semantic entity being considered continues until all such converter modules have been considered. When each of several converter modules (or different combinations of converter modules) produces the desired transformation of the semantic entity, all of them are considered when producing the output, as described later.
In addition, in the context of the semantic entity manipulation technique embodiments described herein, the converter modules adopt the probabilistic programming scheme described earlier, which gives the designer rich support for implementing arbitrary computation modules without requiring the end user to worry about discoverability or usage syntax. Recall that a computation expression C is either a parseEntity_E(s_i, p) expression or a transformation expression T(C_1, ..., C_n). Here, T is any transformation that takes canonical representations of semantic entities as input and outputs some semantic entity in canonical representation. The side-effect-free methods associated with the underlying class or data type wrapped by an entity are good candidates for useful transformations.
The following sections describe some broad classes of tasks, supported by useful transformations, that end users would actually want to perform. As will be appreciated, these examples are presented by way of illustration and not limitation. The set of converter modules can also include other types of modules.
2.4.1 Arithmetic transformations
In many contexts, performing arithmetic operations on the input entities is a useful transformation for producing the desired output. For example, in one implementation, a "DateTime" entity can be constructed that identifies date-time strings in the input and also identifies duration strings. Given this data, several arithmetic transformations can be performed to produce a date-time output. For example, "addDuration" and "subtractDuration" converter modules can be provided to add an identified duration to, or subtract it from, an identified date-time so as to produce another date-time. The "DateTime" entity can also support a "getBusinessDays" converter module, which computes the number of business days between two date-time strings, and a "getDuration" converter module, which computes the duration between two date-times.
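The arithmetic transformations named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiments: Python's standard datetime types stand in for the canonical form of the "DateTime" entity, the function names are hypothetical renderings of addDuration, subtractDuration, getDuration and getBusinessDays, and business days are assumed to be Monday through Friday with no holiday calendar.

```python
from datetime import date, datetime, timedelta


def add_duration(moment: datetime, duration: timedelta) -> datetime:
    """Hypothetical addDuration: shift a date-time forward by a duration."""
    return moment + duration


def subtract_duration(moment: datetime, duration: timedelta) -> datetime:
    """Hypothetical subtractDuration: shift a date-time backward by a duration."""
    return moment - duration


def get_duration(start: datetime, end: datetime) -> timedelta:
    """Hypothetical getDuration: the duration between two date-times."""
    return end - start


def get_business_days(start: date, end: date) -> int:
    """Hypothetical getBusinessDays: count Monday-Friday days in [start, end).

    Assumption: public holidays are ignored.
    """
    days = 0
    current = start
    while current < end:
        if current.weekday() < 5:  # 0..4 are Monday..Friday
            days += 1
        current += timedelta(days=1)
    return days


meeting = datetime(2010, 5, 24, 9, 0)
later = add_duration(meeting, timedelta(hours=2))
work_week = get_business_days(date(2010, 5, 24), date(2010, 5, 31))
```

Because each function takes and returns values in one canonical representation, such transformations can be composed freely during the search for a computation that explains an example.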
As another example, in one implementation, a "Unit" entity can be constructed that identifies strings in the input that represent length measures. "addLength" and "subtractLength" converter modules can then be provided to add or subtract the identified length measures, respectively, so as to compute an output length measure.
2.4.2 Rounding transformations
Performing rounding computations on the input entities is also a very useful transformation for producing the desired output. For example, converter modules can be used to round input entities representing data ranging from bank balances and stock prices to employee working hours. Such rounding may round a currency amount up (upper) to a quarter dollar, round a time down (lower) to a half-hour slot, round a number to the nearest integer (nearest), and so on. All entities that are associated with a numeric measure and that have a total order defined among their fields can support rounding to a desired precision. Some common entities that fall into this category are: date-time, duration, unit, number, and currency.
In one implementation, this rounding transformation is realized as follows. Taking entity z as the zero reference, the roundOff(e, z, k, f, Mode) transformation rounds the value of entity e to a multiple of k values of field f, using the mode "Mode", which can be lower, upper, or nearest. This rounding is defined in terms of the following attributes:
1) The numeric measure NM(e) associated with entity instance e. A base field f_Base ∈ Fields(e) is defined for each entity. For every other field f ∈ Fields(e), a conversion factor c_f is defined that converts a value of field f into a value of the base field. The numeric measure is then defined as NM(e) = Σ_{f ∈ Fields(e)} (c_f × v_f), where v_f denotes the value of field f.
2) The zero element z.
3) The precision factor (PF), defined as PF = k × c_f.
The roundOff(e, z, k, f, Mode) transformation of an entity instance e can be computed using the "roundOff" converter module shown in Fig. 8. It first computes the numeric measure NM(e) and the precision factor PF. It then computes the quotient q and the remainder r obtained by dividing NM(e) - z by PF. From the quotient and remainder, it computes the lower bound L_e and the upper bound U_e, which are the entity values lying on the boundaries of the desired precision. The method getEntity: Integer → e returns the entity corresponding to its numeric measure argument. The appropriate entity e′ is returned according to the desired rounding mode "Mode".
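The steps of the roundOff computation can be sketched as follows. Fig. 8 is not reproduced in this text, so this is only a plausible reading of the description: the numeric measure is kept as an integer, the treatment of a zero remainder (the value is already on a boundary) and of ties in "nearest" mode are assumptions, and get_time_entity is a hypothetical getEntity for a time-of-day entity whose base field is minutes.

```python
def numeric_measure(field_values, conversion_factors):
    """NM(e): the sum of c_f * v_f over the fields f of the entity."""
    return sum(conversion_factors[f] * v for f, v in field_values.items())


def round_off(measure, zero, k, c_f, mode):
    """Round a numeric measure to a multiple of k units of field f, relative to zero."""
    precision_factor = k * c_f                       # PF = k * c_f
    q, r = divmod(measure - zero, precision_factor)  # quotient and remainder
    lower = zero + q * precision_factor              # numeric measure of L_e
    # Assumption: a value already on a boundary is its own upper bound.
    upper = lower if r == 0 else lower + precision_factor  # numeric measure of U_e
    if mode == "lower":
        return lower
    if mode == "upper":
        return upper
    if mode == "nearest":
        # Assumption: a value exactly halfway between the bounds rounds up.
        return lower if 2 * r < precision_factor else upper
    raise ValueError(f"unknown rounding mode: {mode!r}")


def get_time_entity(measure):
    """Hypothetical getEntity for a time-of-day entity with base field minutes."""
    return {"hour": measure // 60, "minute": measure % 60}


TIME_FACTORS = {"hour": 60, "minute": 1}
nm = numeric_measure({"hour": 9, "minute": 47}, TIME_FACTORS)
half_hour_slot = get_time_entity(round_off(nm, 0, 30, TIME_FACTORS["minute"], "lower"))
```

Here 9:47 has a numeric measure of 587 minutes, so rounding down to a half-hour slot gives 9:30. The same function rounds a currency amount held in cents up to a quarter dollar with k = 25 and c_f = 1.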
2.4.3 Network transformations
Network transformations are those entity transformations that are supported by semantic dictionaries or services available over a computer network (such as the Internet or a private intranet). The semantic entity manipulation technique embodiments described herein give end users seamless access to such services without requiring them to learn or write web scripts. Some exemplary network transformations associated with two exemplary entities (namely, the currency entity and the address entity) are described below.
2.4.3.1 Currency entity
In one implementation, a converter module can be designed and employed to obtain the prevailing exchange rate between two given currencies on a given date. For example, in table 900 shown in Fig. 9, the end user wants to convert the currency amounts 902 in column 1 into the currency type shown in column 2 904, using the exchange rate on the date shown in column 3 906, so as to obtain the results shown in the output column 908. For this purpose, the end user manually uses a currency conversion widget that receives a source currency type, a target currency type, and a transaction date and returns the exchange rate. Specifically, the end user supplies the strings "USD" (US dollar), "EUR" (euro), and "24/05/2010" to the currency conversion widget and obtains, for example, the result "0.800192046091". The end user uses this result to fill in the entry in the first row of the output column. One implementation of the semantic entity manipulation technique embodiments described herein uses the currency converter module to automatically fill the remaining cells of the output column with the values shown in bold.
2.4.3.2 Address entity
Some exemplary transformations related to the address entity include the following. For an address object, obtain its current local time, weather, latitude, longitude, and nearest facilities (for example, airport, bus stop, coffee shop, etc.). For a pair of address objects, obtain the driving time/distance, the driving time during peak traffic hours, and the walking time/distance between them. For a pair of cities (not addresses as such, but locations) and a given date, obtain information on direct flights between them and the cheapest quote for such a direct route. Such information can be readily obtained via the Internet.
Thus, in one implementation, given an address (or some type of location), a converter module can be designed to obtain the desired computed information as output. For example, in table 1000 shown in Fig. 10, the end user wants to make an informed choice among 8 apartments (whose addresses are listed in the first column 1002 of table 1000) based on commute attributes for certain specific locations, such as the peak-hour driving time to the user's office, the walking distance to the nearest gym, and the driving distance to the nearest university (which appear as the output column headings 1004, 1006, 1008 of table 1000). The end user obtains this information for the first row. One implementation of the semantic entity manipulation technique embodiments described herein uses suitable converter modules to automatically fill the remaining cells of the output columns with the required information (not shown).
2.4.3.3 Other entities
Other interesting instances of transformations that obtain information related to other semantic entities include those that retrieve personal data stored at various websites (for example, financial portfolio, credit card spending) or organizational data stored in Active Directory (for example, calendar details, management hierarchy).
2.5 Illustrative formatting module
This section describes one implementation of the formatting module 212 of Fig. 2. As mentioned previously, the formatting module 212 generates formatting instructions that present an output to the end user in the form specified by the original output items, which are part of the examples provided by the user.
The probabilistic programming scheme described above provides rich support for printing entities as strings. This is enabled by the top-level construct printEntity_E of a pcp expression, which takes as input a print format descriptor q = {(f_1, d_1, pp_1), ..., (f_n, d_n, pp_n)}, a sequence of tuples each comprising a field representation, a delimiter, and a pretty printer. The semantics of the printEntity constructor involve printing the corresponding entity instance using the function printEntityFD, which prints the entity instance according to q as a sequence of field representations and delimiter strings. Each field representation is printed using the pretty printer associated with it in its tuple. Printing of base-data-type fields is handled by the pretty printer scheme described in Section 2.1. Printing of an entity-typed field is accomplished by recursively invoking printEntityFD on the entity instance and the print descriptor corresponding to that field.
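The way a print format descriptor drives printing can be sketched as follows. This is a hedged illustration, not the printEntityFD of the embodiments: entities are modeled as plain dictionaries, the printer functions month_prefix and int_ordinal are hypothetical, and a nested descriptor placed in the printer slot is an assumed convention for entity-typed fields.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def month_prefix(length):
    """Return a printer that prints a month number as a name prefix, e.g. 'Jun'."""
    def printer(month_number):
        return MONTHS[month_number - 1][:length]
    return printer


def int_ordinal(value):
    """Print an integer with its ordinal suffix: 1st, 2nd, 3rd, 4th, 11th, ..."""
    if 10 <= value % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(value % 10, "th")
    return f"{value}{suffix}"


def print_entity(entity, descriptor):
    """Print each (field, delimiter, printer) tuple of the descriptor in order."""
    parts = []
    for field, delimiter, printer in descriptor:
        value = entity[field]
        if isinstance(value, dict):
            # Entity-typed field: recurse with the nested descriptor (assumed convention).
            text = print_entity(value, printer)
        else:
            # Base-typed field: apply the printer associated with the field.
            text = printer(value)
        parts.append(text + delimiter)
    return "".join(parts)


q = [("month", " ", month_prefix(3)), ("day", ", ", int_ordinal), ("year", "", str)]
printed = print_entity({"month": 6, "day": 3, "year": 2008}, q)
```

With this descriptor the entity for 3 June 2008 prints as "Jun 3rd, 2008"; changing only the descriptor changes the output form without touching the entity.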
For example, Fig. 11 shows the format strings required by Microsoft's spreadsheet application or the C# programming language to print double-precision values in different formats. Fig. 12 shows the format strings for date-time objects used in Microsoft's spreadsheet application or the C# programming language. The print format descriptor expression q is expressive enough to represent each of these formatting intents. In addition, the formatting module can infer each of these formatting intents from the input-output examples. Therefore, the user does not need to remember format descriptors.
2.6 Learning the probabilistic program
Given input-output examples, a semantic entity manipulation program P can be learned that computes the desired output when run on other, non-example inputs. Without loss of generality, assume that an input is a tuple of strings and that the output is a single string. The probabilistic program P comprises two components: a weighted set of parse descriptors PD and a weighted set of pcp expressions.
First, how to learn the weighted set of parse descriptors PD from the given input and output examples is described. For each string s across all of the input and output examples, the method getAllParses is applied to compute the set {(e_1, p_1, w_1), ..., (e_n, p_n, w_n)} of all parses of s. Let PD′ be the multiset union of the sets {(p_1, w_1), ..., (p_n, w_n)} corresponding to each string s. Whenever p_1 = p_2, any two elements (p_1, w_1) and (p_2, w_2) of PD′ are iteratively replaced by (p_1, w_1 + w_2). PD is then defined as the resulting set PD′.
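The merging step that turns PD′ into PD can be sketched as follows. This is a minimal illustration in which parse descriptors are represented by hypothetical string labels and the per-string parse weights are made-up values; the iterative pairwise replacement described above is equivalent to summing the weights of equal descriptors.

```python
from collections import defaultdict


def merge_weighted(pairs):
    """Collapse (descriptor, weight) pairs with equal descriptors by summing weights."""
    totals = defaultdict(float)
    for descriptor, weight in pairs:
        totals[descriptor] += weight
    return dict(totals)


# Hypothetical (descriptor, weight) pairs, one list per string s in the examples.
parses_per_string = [
    [("Day.Month.Year", 0.75), ("Month.Day.Year", 0.25)],
    [("Day.Month.Year", 1.0)],
]
parse_descriptors = merge_weighted(
    pair for parses in parses_per_string for pair in parses
)
```

A descriptor that is a plausible parse of many strings therefore accumulates a large weight, which is what later allows an ambiguous string to be resolved in its favor.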
For example, consider a spreadsheet containing dates in three different formats (namely, US format: month/day/year; European format: day.month.year; and Chinese format: year-month-day), as shown in Fig. 13. Note that some of these dates lack a year, which is assumed to default to the current year. The computation the user wants is best described as a conditional computation, in which the year is copied to the output from the input if it is present, and otherwise defaults to the current year. As described previously, this is handled by representing the semantic entity manipulation program as a (weighted) set of pcp expressions. The probabilistic interpretation of the program (on an input) gives more weight to those pcp expressions whose parse expressions match the input more closely.
The end user also wants to format the dates in the uniform format indicated by the first two output examples (shown in italics). The output format has some subtleties. For example, the month string is printed using a three-letter prefix with an initial capital (that is, the first letter is capitalized). The day is printed with the appropriate suffix "st", "nd", "rd", or "th". Note that the semantic entity manipulation technique embodiments described herein support such rich formatting simply by allowing the developer to associate pretty-printer routines with base types such as strings and integers.
The input string data has multiple interpretations and contains noise. This is handled, as described earlier, by using probabilistic interpretation to generate a weighted set of parses (the higher the weight, the more likely the parse). In particular, for the input 6/3/2008, the parse month/day/year is preferred over day/month/year, because the computation that then yields the output Jun 3rd, 2008 is the simplest one (namely, the identity mapping). For the input 2.5.2008, the parse day.month.year is preferred over month.day.year, because the parse descriptor day.month.year has more occurrences in the spreadsheet (in particular, 25.3.2007 has the parse day.month.year). The input '09-Fabruary-1 is most likely parsed as year-month-day, because the string Fabruary most closely matches the month name February, and the "'" character precedes the integer 09, which may be a delimiter that appears before a two-digit year string.
A semantic entity manipulation program can be learned that produces the desired output on the two input-output examples and on the remaining non-example inputs shown in Fig. 13. More specifically, the weight of the parse descriptor p_1 = Day_1 Month_1 Year_1 (day, month, year) is 4 times the weight of the parse descriptor p_2 = Month_1 Day_1 Year_1 (month, day, year). This is because there are 4 inputs (namely, 2.5.2008, 25.3.2007, 26.3.2007, 27.3.2007) for which p_1 is a valid parse descriptor, and only 1 input (namely, 2.5.2008) for which p_2 is a valid parse descriptor. Therefore, for the input date 2.5.2008, the parse corresponding to p_1 has a higher weight than the parse corresponding to p_2. This shows that the use of weighted parse descriptors allows probabilistic parsing to correctly estimate the most likely parse of an ambiguous input string.
Now suppose that the spreadsheet contains the noisy input 5.16.2010. The weights associated with the parse descriptors will then favor p_1 by 4 to 2, and so the highest-weighted parse of 2.5.2008 will still be the one corresponding to p_1. This shows that the use of weighted parse descriptors keeps probabilistic parsing robust in the presence of noise in the data.
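The 4-to-1 and 4-to-2 counts above can be reproduced with a short sketch. It makes simplifying assumptions that are not part of the described embodiments: only the two dotted descriptors p_1 and p_2 are considered, every valid parse contributes a unit weight, and validity is checked with Python's date constructor. The descriptor labels are hypothetical.

```python
from datetime import date

# Field order for p_1 (day, month, year) and p_2 (month, day, year).
DESCRIPTORS = {
    "Day.Month.Year": ("day", "month", "year"),
    "Month.Day.Year": ("month", "day", "year"),
}


def valid_descriptors(text):
    """Return the descriptors under which the dotted string is a real calendar date."""
    numbers = [int(part) for part in text.split(".")]
    valid = []
    for name, order in DESCRIPTORS.items():
        fields = dict(zip(order, numbers))
        try:
            date(fields["year"], fields["month"], fields["day"])
        except ValueError:
            continue  # e.g. month 25 or month 16 does not exist
        valid.append(name)
    return valid


def descriptor_weights(inputs):
    """Assumption: each valid parse adds a unit weight to its descriptor."""
    weights = {name: 0 for name in DESCRIPTORS}
    for text in inputs:
        for name in valid_descriptors(text):
            weights[name] += 1
    return weights


def best_descriptor(text, weights):
    """Pick the valid descriptor of this string with the largest accumulated weight."""
    return max(valid_descriptors(text), key=lambda name: weights[name])


column = ["2.5.2008", "25.3.2007", "26.3.2007", "27.3.2007"]
clean = descriptor_weights(column)
noisy = descriptor_weights(column + ["5.16.2010"])
choice = best_descriptor("2.5.2008", noisy)
```

Only 2.5.2008 is ambiguous, and it is still resolved as day, month, year after the noisy row is added, because the single outlier cannot outweigh the three unambiguous rows.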
Next, how to learn the weighted set of pcp expressions PCP is described. Consider a single row containing one of the input-output examples. For the input string s_i in the i-th column of the example row, the getAllParses method is used to obtain its weighted set of parses (e, p, w). Each parse returns the entity instance e in its canonical form. In a similar fashion, the parse sets for the entire row are computed, and the parse set of the output string r is then obtained. For each vector of input entity instances drawn from the parse sets of the input strings, and each output entity instance e′ drawn from the parse set of the output string, a weighted set of computations C̃ that map the input entity instances to e′ is learned. A type-based exhaustive search over the set of available transformations is used to compute C̃ (in one implementation, the number of transformations used in a computation is bounded). The weight Weight(C) of any computation C ∈ C̃ is defined as the product of a combining function, applied to the weights of the input and output parses involved, and the simplicity of C, where in one implementation the combining function is defined as the min (minimum) function and the simplicity measure of C is defined as the inverse of the size of C. In order to print the computed entity e′ in the desired output format j, a print descriptor q′_j also needs to be learned. The print descriptor q′_j is learned from the parse descriptor p′_j, which comprises (f, d) pairs, by exhaustively searching for the pretty printers used to print the field representations f in p′_j.
Let PCP′ be the multiset union of the sets {(printEntity(C, q′_j), Weight(C)) | C ∈ C̃}. Whenever O_1 = O_2, any two elements (O_1, w_1) and (O_2, w_2) of PCP′ are iteratively replaced by (O_1, w_1 + w_2). PCP is then defined as the resulting set PCP′.
For example, suppose that the end user wants to format stock prices with 6 decimal places in column 1 of a spreadsheet to 2 decimal places. The end user provides an example row with "12.124532" as input and "12.12" as the corresponding output. Two possible computations that achieve this transformation are: print at most 2 decimal places, and print exactly 2 decimal places. The end user then provides another example row with "12.1" as input and "12.10" as output. Again, two possible computations are: print at least 2 decimal places, and print exactly 2 decimal places. The learned probabilistic program P will contain a weighted set of pcp expressions in which the pcp expression corresponding to the "exactly 2 decimal places" transformation has twice the weight of the pcp expressions corresponding to the "at most 2 decimal places" and "at least 2 decimal places" transformations, because each of the latter two produces the desired result on only one of the examples. Therefore, when P is run on a new input, the highest-ranked output will correspond to the "exactly 2 decimal places" transformation. This example shows that the procedure used to learn the weighted set of pcp expressions has the desirable property of giving more weight to pcp expressions that represent computations common to multiple input-output examples.
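The decimal-places example can be traced with the following sketch. It is an illustration under assumptions, not the learning procedure of the embodiments: the three candidate printers are hypothetical stand-ins for the candidate pcp expressions, each example contributes a unit weight to every candidate consistent with it, and only the top-ranked candidate is applied to a new input.

```python
from decimal import Decimal

TWO_PLACES = Decimal("0.01")


def exactly_two(text):
    """Print exactly 2 decimal places."""
    return str(Decimal(text).quantize(TWO_PLACES))


def at_most_two(text):
    """Print at most 2 decimal places (round to 2, then drop trailing zeros)."""
    return exactly_two(text).rstrip("0").rstrip(".")


def at_least_two(text):
    """Print at least 2 decimal places (pad short values, keep long ones)."""
    value = Decimal(text)
    if -value.as_tuple().exponent < 2:
        return str(value.quantize(TWO_PLACES))
    return str(value)


CANDIDATES = {
    "at most 2": at_most_two,
    "exactly 2": exactly_two,
    "at least 2": at_least_two,
}


def learn(examples):
    """Sum a unit weight for each candidate that reproduces an example's output."""
    weights = {}
    for source, target in examples:
        for name, printer in CANDIDATES.items():
            if printer(source) == target:
                weights[name] = weights.get(name, 0) + 1
    return weights


def run(weights, source):
    """Apply the highest-weighted candidate to a new input."""
    best = max(weights, key=weights.get)
    return CANDIDATES[best](source)


weights = learn([("12.124532", "12.12"), ("12.1", "12.10")])
result = run(weights, "3.14159")
```

The first example is consistent with "at most 2" and "exactly 2", the second with "at least 2" and "exactly 2", so "exactly 2" ends with twice the weight of either alternative and determines the top-ranked output.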
Consider another example. This example shows how the learned semantic entity manipulation program models conditionals, such as the missing-year feature described in connection with Fig. 13. From the two input-output examples, the following two pcp expressions are learned with higher weight: printEntity(C̃_1, q_1) and printEntity(C̃_2, q_2), where C̃_1 is the identity computation ParseEntity_e(s_1, p_1), p_1 = Month_1/Day_1/Year_1, q_1 = {(Month_2, " ", string-PrefixP(3)), (Day_1, ", ", IntOrd), (Year_1, ε, Identity)}, C̃_2 = SetYear(ParseEntity_e(s_2, p_2), 2010), p_2 = Month_1/Day_1, and q_2 is identical to q_1.
Given a new input "4.24", its parse descriptor p = Month_1.Day_1 matches p_2 more closely than p_1, and therefore the computation C̃_2 is performed on this input with a much higher weight than the identity computation C̃_1. The learned program thus implicitly distinguishes inputs whose parse descriptors have no year field.
2.7 Illustrative semantic entity manipulation process
In view of the foregoing, an exemplary process for implementing the semantic entity manipulation technique embodiments described herein will now be described. Referring to Fig. 14, the exemplary process begins with receiving input-output examples (block 1400). As mentioned previously, each input-output example provides one or more input items and a corresponding desired output item. The received input and output items are parsed to produce a weighted set of parses (block 1402). Each of these weighted parses represents a different potential parse of an input or output item, and its weight is a measure of the likelihood that the parse is a valid one, based on a comparison of the parse to a prescribed library of parses. Next, a previously unselected input-output example is selected (block 1404). One or more transformations are identified, from a library of transformations, that are capable of generating the desired output item from the input items of the selected example (block 1406). In addition, formatting instructions are identified that are capable of formatting an output item so as to match the formatting of the desired output item of the selected input-output example (block 1408). It is then determined whether any input-output examples remain unselected (block 1410). If so, process actions 1404 through 1410 are repeated. Once all of the input-output examples have been selected and processed, a probabilistic program is generated (block 1412) which, given one or more input items of the same type as the input items of the input-output examples, employs the identified transformations and formatting instructions to generate an output item corresponding to those input items. One or more input items of the same type as the input items of the input-output examples are then received (block 1414), and the generated probabilistic program is used to produce an output item corresponding to each received input item (block 1416).
It is also possible to receive one or more input items of the same type as the input items of the input-output examples together with the input-output examples. In this case, the aforementioned action of parsing the input and output items to produce a weighted set of parses can include parsing the received input items of the same type as well as the input and output items associated with the input-output examples.
With regard to the action of parsing the input and output items to produce a weighted set of parses, note that in one implementation each input item and output item is a character string, and the parse library comprises a plurality of semantic entities, each having an entity class and a set of fields. Given this, for each character string representing an input or output item and each semantic entity found in the parse library, a set of field matches is identified, where each field match represents a longest substring of the character string that matches a field of the semantic entity. As described previously, a parse graph is constructed from this set of field matches such that there is a node for each character in the character string. In addition, for each field match, a field edge is directed from the node representing the starting character of the field match to the node representing its ending character. Furthermore, a delimiter edge is established from each node representing a character in the character string to the node representing the next character in the string. Next, parse paths through the parse graph are identified, where each parse path comprises a sequence of nodes that begins with the node representing the first character of the string and ends with the node representing its last character, with the nodes in the sequence connected by edges that alternate between field edges and delimiter edges. Each identified parse path thus comprises a sequence of semantic entity field and delimiter pairs. Next, a constraint weight factor is computed for each identified parse path. More specifically, each semantic entity is associated with a weighted set of constraints that specify, for the semantic entity, the distribution of its fields and the ordering among fields or delimiters, where a constraint that is more likely to be exhibited by the semantic entity has a larger constraint weight. Given this, the constraint weight factor is made proportional to the number of constraints of the semantic entity under consideration that the parse path satisfies and to the constraint weights assigned to each satisfied constraint. For example, in one implementation, this is accomplished by defining the constraint weight factor as the ratio of the sum of the weights of the satisfied constraints to the sum of the weights of all of the constraints. Next, a measure of the similarity between the parse path and each of a set of valid parse descriptors associated with the semantic entity is computed. In one implementation, this is accomplished by computing a closeness measure that is inversely proportional to the Hamming distance between the parse path and the parse descriptor, such that the closer the pattern of semantic entity fields and delimiters in the parse path is to the pattern in the parse descriptor, the larger the closeness measure. The closeness measure is then designated as the valid parse descriptor similarity measure of the parse path. Each of these valid parse descriptors comprises a sequence of semantic entity field and delimiter pairs that represents a valid parse of the semantic entity. A weight value for the parse path is then computed based on the constraint weight factor and the valid parse descriptor similarity measure computed for the parse path. In one implementation, the weight value increases with both the constraint weight factor and the similarity measure. For example, it can be chosen to be the product of the two quantities. The parse path is then designated as a parse descriptor of the semantic entity, and the designated parse descriptor is associated with the semantic entity and the computed weight value so as to form a weighted semantic entity and parse descriptor tuple. Note that in one implementation, the semantic entity of each weighted semantic entity and parse descriptor tuple is converted to a prescribed canonical form of that semantic entity, as described previously. The prescribed canonical form of a semantic entity is identified in the entity class of the entity.
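The weight computation for a single parse path can be sketched as follows. The text fixes only the general shape (a ratio of constraint weights, a closeness measure inversely related to Hamming distance, and their product), so several details here are assumptions: paths and descriptors are lists of (field, delimiter) pairs, a length difference counts as extra mismatches, closeness is taken as 1 / (1 + distance), the similarity is the best closeness over the valid descriptors, and the two example constraints and their weights are hypothetical.

```python
def constraint_weight_factor(path, constraints):
    """Ratio of the weights of satisfied constraints to the weights of all constraints."""
    total = sum(weight for _, weight in constraints)
    if total == 0:
        return 1.0  # assumption: an entity without constraints does not penalize a path
    satisfied = sum(weight for check, weight in constraints if check(path))
    return satisfied / total


def hamming_distance(path, descriptor):
    """Count positions whose (field, delimiter) pairs differ; extra positions mismatch."""
    mismatches = sum(1 for a, b in zip(path, descriptor) if a != b)
    return mismatches + abs(len(path) - len(descriptor))


def closeness(path, descriptor):
    """Assumed closeness measure: larger when the Hamming distance is smaller."""
    return 1.0 / (1.0 + hamming_distance(path, descriptor))


def path_weight(path, constraints, valid_descriptors):
    """Product of the constraint weight factor and the best descriptor similarity."""
    similarity = max(closeness(path, d) for d in valid_descriptors)
    return constraint_weight_factor(path, constraints) * similarity


def fields(path):
    return [field for field, _ in path]


# Hypothetical weighted constraints for a date entity.
CONSTRAINTS = [
    (lambda p: "Year" in fields(p), 2.0),
    (lambda p: fields(p).index("Month") < fields(p).index("Day"), 1.0),
]
VALID = [
    [("Day", "."), ("Month", "."), ("Year", "")],
    [("Month", "/"), ("Day", "/"), ("Year", "")],
]

us_path = [("Month", "/"), ("Day", "/"), ("Year", "")]
odd_path = [("Day", "/"), ("Month", "/"), ("Year", "")]
us_weight = path_weight(us_path, CONSTRAINTS, VALID)
odd_weight = path_weight(odd_path, CONSTRAINTS, VALID)
```

The path that matches a valid descriptor exactly and satisfies every constraint receives the maximum weight, while the path that mixes the day-first order with slash delimiters is penalized on both factors.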
With regard to the action of identifying, from a library of transformations, one or more transformations capable of producing the desired output item from the input items of an input-output example, note that in one implementation the entity class of each semantic entity identifies one or more transformations applicable to that entity class. Given this, as described previously, the parse descriptor tuples associated with the input items of the input-output example and the parse descriptor tuples associated with its output item are identified. Then, for each of the one or more transformations associated with the entity classes in the parse descriptor tuples of the input and output items of the input-output example, those transformations are identified which, when applied to the semantic entity of an identified parse descriptor tuple associated with the input items, produce the entity of an identified parse descriptor tuple associated with the output item. In addition, a weighting factor is computed for each applied transformation, which represents a measure of how likely the applied transformation is to be the desired transformation. In general, this weighting factor is computed based on the weight values computed for the parse descriptor tuples of the input and output items of the input-output example and on a measure of the complexity of the transformation. More specifically, in one implementation, the weighting factor is computed by first identifying the minimum of the following two weight values: 1) the weight value of the parse descriptor tuple associated with the input item of the input-output example to which the transformation is applied, and 2) the weight value of the parse descriptor tuple associated with the output item of the input-output example to which the transformation is applied. Next, the inverse of the size of the applied transformation is computed. The product of the identified minimum weight value and the computed inverse of the size of the applied transformation is then computed and designated as the weighting factor.
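The weighting factor just described reduces to a one-line formula, sketched below. The text does not say how the size of a transformation is measured, so the size values here are an assumption (for instance, the number of operations in the computation), and the candidate names and weights are hypothetical.

```python
def weighting_factor(input_weight, output_weight, size):
    """min(input parse weight, output parse weight) times the inverse of the size."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return min(input_weight, output_weight) * (1.0 / size)


def rank(candidates, input_weight, output_weight):
    """Order candidate transformations from most to least likely."""
    scored = [
        (weighting_factor(input_weight, output_weight, size), name)
        for name, size in candidates
    ]
    return sorted(scored, reverse=True)


# Hypothetical candidates as (name, assumed size) pairs.
ranking = rank([("setYear", 2), ("identity", 1)], input_weight=0.8, output_weight=0.5)
```

Taking the minimum means a computation is never trusted more than its least certain parse, and dividing by the size prefers simpler computations when several explain the same example.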
Note that in some cases the desired transformation may require the application of more than one transformation. Given this, in an alternative implementation of the action of identifying, from a library of transformations, transformations capable of generating the desired output item from the input items of an input-output example, the following variation of the foregoing process is employed. More specifically, for combinations, up to a prescribed number of transformations per combination, of the one or more transformations associated with the entity classes in the parse descriptor tuples of the input and output items of the input-output example, those combinations of transformations are identified which, when applied to the semantic entity of an identified parse descriptor tuple associated with the input items, produce the entity of an identified parse descriptor tuple associated with the output item. In addition, the weighting factor of an applied combination of transformations is defined as a measure of how likely it is to be the desired combination of transformations.
With regard to the action of identifying formatting instructions capable of formatting an output item so as to match the formatting of the desired output item of an input-output example, in general this entails associating a formatting instruction from a library of formatting instructions with each semantic entity field identified in the parse descriptor of the parse descriptor tuple associated with the output item of the input-output example, where the formatting instruction produces the corresponding field of the semantic entity of that tuple in the form exhibited by the portion of the output item associated with that field. More specifically, in one implementation, this entails first determining, for each semantic entity field identified in the parse descriptor of the parse descriptor tuple associated with the output item of the input-output example, whether the field is one of a prescribed set of base-type fields. Whenever a semantic entity field is determined to be a base-type field, a formatting instruction predefined for that type of base-type field is associated with it. However, whenever the semantic entity field is determined not to be a base-type field (that is, it is an entity-type field), the formatting instruction is identified by recursively applying the foregoing procedure to that entity field.
Regarding the action of generating a probabilistic program which, given one or more input items of the same type as the input items of the input-output examples, employs the identified transformations and formatting instructions to produce an output item corresponding to those input items: in one implementation, this entails the following operations. For each semantic entity in the parse descriptor tuple of the output item of the input-output example, a parse-compute-print tuple is formed by combining a transformation, its computed weight factor, and the formatting instructions identified for the semantic entity of the identified parse descriptor tuple associated with the output item. This is done for each of the one or more transformations associated with the entity class in the parse descriptor tuple of the input item of the input-output example. Parse-compute-print tuples having the same transformation and formatting instructions are then identified. Each group of parse-compute-print tuples having the same transformation and formatting instructions is replaced by a single parse-compute-print tuple having that transformation and those formatting instructions, and a weight factor equal to the sum of all the weight factors in the group. In addition, the parse descriptor tuples of the input and output items of the input-output examples that have the same parse descriptor are identified. Each group of parse descriptor tuples having the same parse descriptor is replaced by a single parse descriptor tuple having that parse descriptor and a weight value equal to the sum of all the weight values in the group. Finally, the remaining parse descriptor tuples and parse-compute-print tuples are combined to form the probabilistic program.
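The two merging steps above share one pattern: group tuples by an identity key and sum their weights. A minimal sketch follows, assuming each tuple is represented as a `(key, weight)` pair, where the key is `(transformation, formatting instruction)` for parse-compute-print tuples; this representation and the sample names are assumptions made for illustration.

```python
from collections import defaultdict


def merge_weighted(tuples):
    """Replace each group of tuples sharing a key with a single entry
    whose weight is the sum of the weights in the group."""
    totals = defaultdict(float)
    for key, weight in tuples:
        totals[key] += weight
    return dict(totals)


# Hypothetical parse-compute-print tuples gathered from several examples.
pcp_tuples = [
    (("add_hour", "HH:MM"), 0.25),
    (("round_down_30", "HH:MM"), 0.5),
    (("add_hour", "HH:MM"), 0.5),
]
merged = merge_weighted(pcp_tuples)
```

The same function could be applied to parse descriptor tuples by using the parse descriptor as the key.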
Regarding the action of using the generated probabilistic program to produce an output item corresponding to a received input item, note that in implementations where each weighted semantic entity is converted to a prescribed canonical form, applying a transformation to the semantic entity of a weighted semantic entity and parse descriptor tuple likewise produces a semantic entity in canonical form. Given this, the aforementioned action of identifying formatting instructions capable of formatting an output item so as to match the formatting of the required output item of the input-output example further includes, where necessary, converting the output item from the prescribed canonical form to a form consistent with the required output item of the input-output example.
2.8 Exemplary semantic entity manipulation system
In view of the foregoing, an exemplary system for implementing the semantic entity manipulation technique embodiments described herein will now be described. Referring to Figure 15, the exemplary semantic entity manipulation system 1500 includes a parsing module 1502, a transformation module 1504 and a formatting module 1506. The parsing module 1502 parses the input items and output items 1508 that make up the input-output examples to produce a weighted set of parses. Each weighted parse represents a different potential parse of each input and output item, and is weighted according to a measure of the likelihood that the parse is a valid parse, based on a comparison with a parse library 1510. For each input-output example, the transformation module 1504 identifies, from a library of transformations 1512, one or more transformations capable of producing the required output item from the input item. The formatting module 1506 identifies formatting instructions capable of formatting an output item so as to match the formatting of the required output item of the input-output example.
There is also a program generation module 1514 that generates a probabilistic program 1516. Given one or more input items of the same type as the input items of the input-output examples, the probabilistic program 1516 employs the identified transformations from the transformation module 1504 and the formatting instructions from the formatting module 1506 to produce a ranked set of output items. The output items are ranked in order of the weighting factors computed for them, where a weighting factor represents a measure of the accuracy with which the transformation generates the output item from the input item associated with an input-output example. A program execution module 1518 uses the generated probabilistic program 1516 to produce an output item corresponding to a received input item of the same type as the input items of the input-output examples. In one implementation, the program execution module 1518 produces the output item corresponding to the highest-ranked output item in the ranked set. When the ranked set contains more than one output item, the program execution module 1518 also produces an indicator that the other, lower-ranked output items were produced and are available for review.
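The behaviour attributed to the program execution module 1518 can be sketched in a few lines. The function name `execute` and the representation of candidate outputs as `(output, weighting factor)` pairs are assumptions made for this sketch, not part of the described system.

```python
def execute(weighted_outputs):
    """Rank candidate outputs by weighting factor and report whether
    lower-ranked alternatives exist alongside the top-ranked output."""
    ranked = sorted(weighted_outputs, key=lambda pair: pair[1], reverse=True)
    best = ranked[0][0] if ranked else None
    has_alternatives = len(ranked) > 1  # the indicator described above
    return best, has_alternatives, ranked


best, has_alternatives, ranked = execute([("11:00", 0.25), ("10:30", 0.5)])
```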
2.9 Exemplary user interface operation
This section describes one implementation of a graphical user interface used in conjunction with a spreadsheet program. The user first selects a rectangular region of the spreadsheet that contains the input and output columns. The selected region contains one or more input-output examples as well as non-example inputs whose outputs are needed. The columns that are filled in the most are treated as input columns, and the columns that are filled in less are treated as output columns.
In one implementation, however, the user can also select multiple column ranges and explicitly identify which columns are inputs and which are outputs. This is especially useful when most cells of an input column have empty entries, since that column might otherwise be taken for an output column. The rows that contain entries for an output column are treated as the input-output examples for learning the program for that output column.
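The default column heuristic and the selection of example rows can be sketched as follows. This assumes the selected region is given as a list of equal-length rows in which empty cells are `None` or an empty string; the function names are hypothetical.

```python
def is_filled(cell):
    return cell not in (None, "")


def split_columns(rows):
    """Treat the most-filled columns as inputs and the rest as outputs."""
    n_cols = len(rows[0])
    filled = [sum(1 for row in rows if is_filled(row[c])) for c in range(n_cols)]
    most = max(filled)
    inputs = [c for c in range(n_cols) if filled[c] == most]
    outputs = [c for c in range(n_cols) if filled[c] < most]
    return inputs, outputs


def examples_for(rows, inputs, out_col):
    """Rows with an entry in the output column serve as input-output examples."""
    return [([row[c] for c in inputs], row[out_col])
            for row in rows if is_filled(row[out_col])]


region = [["10:15", "11:00"], ["9:40", ""], ["14:05", None]]
inputs, outputs = split_columns(region)
examples = examples_for(region, inputs, outputs[0])
```

When an input column is itself sparsely filled, this heuristic would misclassify it, which is the situation the explicit column selection described above is meant to handle.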
In one implementation, the user selects an "Apply" button (or similar), and the spreadsheet is filled in as follows. A semantic entity manipulation program is generated for each output column as described previously. For each output cell α_{r,c} (row r and output column c), the system runs the program learned for output column c on the input state specified in row r to generate a ranked set O of possible outputs (or possibly only those outputs having a sufficiently high weight). The system fills cell α_{r,c} as follows:
1) if O contains a single string, the system fills the cell with that string;
2) if O contains multiple strings, the system fills the cell with the string representing the highest-ranked solution, but highlights the cell to indicate to the user that multiple computational interpretations exist for the examples the user provided, and that the user may want to check the accuracy of the output in the highlighted cell; and
3) if O is empty, the system fills the cell with " " to draw the user's attention: the user should provide an output for this cell.
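The three fill cases above reduce to a small decision function. In this sketch, `ranked_outputs` stands for the ranked set O with the highest-ranked string first, and `placeholder` is a caller-supplied marker for the empty case; both names, and the returned `(text, highlight)` pair, are assumptions made for illustration.

```python
def fill_cell(ranked_outputs, placeholder):
    """Return (cell text, highlight flag) for one output cell."""
    if not ranked_outputs:
        # Case 3: nothing was produced, so ask the user for an output.
        return placeholder, False
    # Cases 1 and 2: use the top-ranked string; highlight if others exist.
    return ranked_outputs[0], len(ranked_outputs) > 1


single = fill_cell(["11:00"], "<needs output>")
several = fill_cell(["11:00", "10:30"], "<needs output>")
empty = fill_cell([], "<needs output>")
```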
The user can subsequently modify the content of any cell by right-clicking on it, which opens a dialog box that allows the user to select one of the other strings from the corresponding ranked set O or to provide a new output altogether. After any such modification, the learning process above is automatically repeated with the expanded set of input-output examples, and the content of the spreadsheet is automatically updated to reflect the newly learned results.
3.0 Exemplary operating environment
The semantic entity manipulation technique embodiments described herein are operational within numerous types of general-purpose or special-purpose computing system environments or configurations. Figure 16 illustrates a simplified example of a general-purpose computer system on which various embodiments and elements of the semantic entity manipulation technique embodiments described herein may be implemented. It should be noted that any boxes represented by broken or dashed lines in Figure 16 represent alternative embodiments of the simplified computing device, and that any or all of these alternative embodiments, as described below, may be used in combination with other alternative embodiments described throughout this document.
For example, Figure 16 shows a general system diagram illustrating a simplified computing device 10. Such computing devices can typically be found in devices having at least some minimum computational capability, including, but not limited to, personal computers, server computers, hand-held computing devices, laptop or mobile computers, communications devices such as cell phones and PDAs, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, video media players, and so on.
To allow a device to implement the semantic entity manipulation technique embodiments described herein, the device should have sufficient computational capability and system memory to enable basic computational operations. In particular, as illustrated in Figure 16, the computational capability is generally illustrated by one or more processing units 12, and may also include one or more GPUs 14, either or both in communication with system memory 16. Note that the processing units 12 of the general computing device may be specialized microprocessors, such as a DSP, a VLIW processor or another microcontroller, or may be conventional CPUs having one or more processing cores, including specialized GPU-based cores in a multi-core CPU.
In addition, the simplified computing device of Figure 16 may also include other components, such as, for example, a communications interface 18. The simplified computing device of Figure 16 may also include one or more conventional computer input devices 20 (for example, pointing devices, keyboards, audio input devices, video input devices, haptic input devices, devices for receiving wired or wireless data transmissions, and so on). The simplified computing device of Figure 16 may also include other optional components, such as, for example, one or more conventional display devices 24 and other computer output devices 22 (for example, audio output devices, video output devices, devices for transmitting wired or wireless data transmissions, and so on). Note that typical communications interfaces 18, input devices 20, output devices 22 and storage devices 26 for general-purpose computers are well known to those skilled in the art, and will not be described in detail herein.
The simplified computing device of Figure 16 may also include a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 10 via storage devices 26, and include both volatile and non-volatile media that are either removable 28 and/or non-removable 30, for storage of information such as computer-readable or computer-executable instructions, data structures, program modules or other data. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media include, but are not limited to, computer- or machine-readable media or storage devices such as DVDs, CDs, floppy disks, tape drives, hard disk drives, optical drives, solid-state memory devices, RAM, ROM, EEPROM, flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other device that can be used to store the desired information and that can be accessed by one or more computing devices.
Retention of information such as computer-readable or computer-executable instructions, data structures, program modules and so on can also be accomplished by using any of a variety of the aforementioned communication media to encode one or more modulated data signals or carrier waves, or other transport mechanisms or communications protocols, and includes any wired or wireless information delivery mechanism. Note that the terms "modulated data signal" or "carrier wave" generally refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example, communication media include wired media, such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media, such as acoustic, RF, infrared, laser and other wireless media for transmitting and/or receiving one or more modulated data signals or carrier waves. Combinations of any of the above should also be included within the scope of communication media.
Further, software, programs and/or computer program products embodying some or all of the semantic entity manipulation techniques described herein, or portions thereof, may be stored, received, transmitted or read from any desired combination of computer- or machine-readable media or storage devices and communication media, in the form of computer-executable instructions or other data structures.
Finally, the semantic entity manipulation technique embodiments described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures and so on that perform particular tasks or implement particular abstract data types. The embodiments described herein may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including media storage devices. Still further, the aforementioned instructions may be implemented, in part or in whole, as hardware logic circuits, which may or may not include a processor.
4.0 Other embodiments
It should be noted that any or all of the aforementioned embodiments throughout this description may be used in any combination desired to form additional hybrid embodiments. In addition, although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A computer-implemented method for employing a probabilistic program to generate a required output item from one or more input items, the probabilistic program being generated using input-output examples, the method comprising:
using a computer to perform the following method actions:
receiving input-output examples (1400), each input-output example providing one or more input items and a corresponding required output item;
parsing the input items and the output items to produce a weighted set of parses (1402), each weighted parse representing a different potential parse of each input and output item, wherein each parse is weighted according to a measure of the likelihood that the parse is a valid parse, based on a comparison with a parse library;
for each input-output example,
identifying, from a library of transformations, one or more transformations capable of producing the required output item from the input item (1406), and
identifying formatting instructions capable of formatting an output item so as to match the formatting of the required output item of the input-output example (1408);
generating a probabilistic program (1412) which, given one or more input items of the same type as the input items of the input-output examples, employs the identified transformations and the formatting instructions to produce an output item corresponding to the one or more input items; and
receiving one or more input items (1414) of the same type as the input items of the input-output examples, and using the generated probabilistic program to produce an output item corresponding to the received input items (1416).
2. The method of claim 1, wherein each input item and output item comprises a character string, and wherein the parse library comprises a plurality of semantic entities, each semantic entity comprising an entity class and a set of fields, and wherein the method action of parsing the input items and the output items to produce a weighted set of parses comprises the following actions:
for each character string representing the input and output items and each semantic entity found in the parse library,
identifying a set of field matches, wherein each field match comprises the longest substring of the input or output character string that matches a field of the semantic entity from the parse library,
constructing a parse graph from the set of field matches, such that there is a node for each character in the character string, and, for each field match, a field edge is directed from the node representing the starting character of the field match to the node representing the ending character of the field match, and a delimiter edge is established from each node representing a character in the character string to the node representing any later character in the character string,
identifying parse paths through the parse graph, wherein each parse path comprises a sequence of nodes that begins with the node representing the first character of the character string and ends with the node representing the last character of the character string, and the nodes in the sequence are connected by edges that alternate between field edges and delimiter edges, such that each identified parse path comprises a sequence of semantic entity field and delimiter pairs, and
for each identified parse path,
computing a constraint weight factor for the parse path, the constraint weight factor being proportional to the number of constraints associated with the semantic entity under consideration that the parse path satisfies and to the constraint weight assigned to each satisfied constraint, wherein each semantic entity is associated with a set of weighted constraints that specify the fields of the semantic entity and the ordering among fields or delimiters, and wherein a constraint that is more likely to be exhibited by the semantic entity has a larger constraint weight,
computing a measure of the similarity between the parse path and each valid parse descriptor in a set of valid parse descriptors associated with the semantic entity, wherein each valid parse descriptor comprises a sequence of semantic entity field and delimiter pairs representing a valid parse of the semantic entity,
computing a weight value for the parse path based on the constraint weight factor computed for the parse path and the valid parse descriptor similarity measure, and
designating the parse path as the parse descriptor of the semantic entity, and associating the designated parse descriptor and the computed weight value with the semantic entity to form a weighted semantic entity and parse descriptor tuple.
3. The method of claim 2, wherein the entity class of each semantic entity identifies one or more transformations applicable to that entity class, and wherein, for each input-output example, the method action of identifying, from a library of transformations, one or more transformations capable of producing the required output item from the input item comprises the following actions:
identifying the parse descriptor tuple associated with the input item of the input-output example;
identifying the parse descriptor tuple associated with the output item of the input-output example; and
for each of the one or more transformations associated with the entity classes in the parse descriptor tuples of the input and output items of the input-output example,
identifying those transformations that, when applied to the semantic entity of the identified parse descriptor tuple associated with the input item, produce the entity of the identified parse descriptor tuple associated with the output item, and
computing a weighting factor for each such transformation, wherein the weighting factor is a measure of the likelihood that the transformation is the required transformation.
4. The method of claim 3, wherein the method action of identifying formatting instructions capable of formatting an output item so as to match the formatting of the required output item of the input-output example comprises the following action: for each semantic entity field identified in the parse descriptor of the parse descriptor tuple associated with the output item of the input-output example, associating a formatting instruction from a formatting instruction library, the formatting instruction producing the corresponding field of the semantic entity of the descriptor tuple in the form exhibited by the part of the output item associated with the semantic entity field.
5. The method of claim 4, wherein the method action of generating the probabilistic program which, given one or more input items of the same type as the input items of the input-output examples, employs the identified transformations and the formatting instructions to produce an output item corresponding to the one or more input items, comprises the following actions:
for each semantic entity in the parse descriptor tuple of the output item of the input-output example,
for each of the one or more transformations associated with the entity class of the parse descriptor tuple of the input item of the input-output example, associating the transformation, the weight factor computed for the transformation, and the formatting instructions identified for the semantic entity to form a parse-compute-print tuple, the semantic entity being the semantic entity of the identified parse descriptor tuple associated with the output item;
identifying parse-compute-print tuples having the same transformation and formatting instructions;
for each group of parse-compute-print tuples having the same transformation and formatting instructions, replacing the group with a single parse-compute-print tuple having the same transformation and formatting instructions as the group and a weight factor equal to the sum of all the weight factors in the group;
identifying parse descriptor tuples of the input items and output items of the input-output examples that have the same parse descriptor;
for each group of parse descriptor tuples having the same parse descriptor, replacing the group with a single parse descriptor tuple having the same parse descriptor as the group and a weight value equal to the sum of all the weight values in the group; and
combining the remaining parse descriptor tuples and parse-compute-print tuples to form the probabilistic program.
6. The method of claim 3, wherein the method action of identifying formatting instructions capable of formatting an output item so as to match the formatting of the required output item of the input-output example comprises the following method actions:
for each semantic entity field identified in the parse descriptor of the parse descriptor tuple associated with the output item of the input-output example, determining whether the semantic entity field is one of a prescribed set of base-type fields;
whenever a semantic field in the parse descriptor of the parse descriptor tuple associated with the output item of the input-output example is determined to be a base-type field, associating with that semantic field the formatting instruction predefined for base-type fields of that type;
whenever a semantic field in the parse descriptor of the parse descriptor tuple associated with the output item of the input-output example is determined not to be a base-type field, associating a formatting instruction that produces the corresponding field of the semantic entity of the descriptor tuple in the form exhibited by the part of the output item associated with the semantic entity field.
7. The method of claim 2, wherein the entity class of each semantic entity identifies one or more transformations applicable to that entity class, and wherein, for each input-output example, the method action of identifying, from a library of transformations, one or more transformations capable of producing the required output item from the input item comprises the following actions:
identifying the parse descriptor tuple associated with the input item of the input-output example;
identifying the parse descriptor tuple associated with the output item of the input-output example; and
for each combination of one or more of the transformations associated with the entity classes in the parse descriptor tuples of the input and output items of the input-output example, up to a prescribed number of transformations in the combination,
identifying those transformation combinations that, when applied to the semantic entity of the identified parse descriptor tuple associated with the input item, produce the entity of the identified parse descriptor tuple associated with the output item, and
computing a weighting factor for each identified transformation combination, the weighting factor comprising a measure of the likelihood that the transformation combination is the required transformation combination.
8. The method of claim 1, wherein the one or more input items of the same type as the input items of the input-output examples are received in conjunction with receiving the input-output examples, and wherein the method action of parsing the input items and the output items to produce a weighted set of parses comprises an action of parsing both the input and output items associated with the input-output examples and the received input items of the same type as the input items of the input-output examples.
9. A system for employing a probabilistic program to produce a required output item from one or more input items, the probabilistic program being generated using input-output examples, the system comprising:
a computing device; and
a computer program comprising program modules executed by the computing device, the program modules comprising:
a parsing module (1502), which parses the input items and output items (1508) making up the input-output examples to produce a weighted set of parses, each weighted parse representing a different potential parse of each input and output item, wherein each parse is weighted according to a measure of the likelihood that the parse is a valid parse, based on a comparison with a parse library (1510),
a transformation module (1504), which, for each input-output example, identifies from a library of transformations (1512) one or more transformations capable of producing the required output item from the input item, and
a formatting module (1506), which identifies formatting instructions capable of formatting an output item so as to match the formatting of the required output item of the input-output example;
a program generation module (1514), which generates a probabilistic program (1516) that, given one or more input items of the same type as the input items of the input-output examples, employs the identified transformations and the formatting instructions to produce a ranked set of output items, wherein the output items are ranked in order of the weighting factors computed for them, the weighting factor comprising a measure of the accuracy with which the transformation produces the output item from the input item associated with the input-output example, and
a program execution module (1518), which uses the generated probabilistic program to produce an output item corresponding to a received input item of the same type as the input items of the input-output examples.
10. The system of claim 9, wherein the program execution module produces the output item corresponding to the highest-ranked output item in the ranked set of output items, and, whenever the ranked set contains more than one output item, the program execution module further produces an indicator that the other, lower-ranked output items were produced and are available for review.
CN201210023688.6A 2011-02-03 2012-02-02 Semantic entity control using input and output sample Active CN102682065B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/020,153 US8799234B2 (en) 2010-07-12 2011-02-03 Semantic entity manipulation using input-output examples
US13/020,153 2011-02-03

Publications (2)

Publication Number Publication Date
CN102682065A true CN102682065A (en) 2012-09-19
CN102682065B CN102682065B (en) 2015-03-25

Family

ID=46813998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210023688.6A Active CN102682065B (en) 2011-02-03 2012-02-02 Semantic entity control using input and output sample

Country Status (1)

Country Link
CN (1) CN102682065B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1400547A (en) * 2001-08-03 2003-03-05 富士通株式会社 Format file information extracting device and method
US20090119416A1 (en) * 2007-08-07 2009-05-07 Bridgegate Internationa, Llc Data transformation and exchange
US20100083092A1 (en) * 2008-09-30 2010-04-01 Apple Inc. Dynamic Schema Creation
WO2010088523A1 (en) * 2009-01-30 2010-08-05 Ab Initio Technology Llc Processing data using vector fields

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105283868A (en) * 2013-03-03 2016-01-27 微软技术许可有限责任公司 Probabilistic parsing
CN105283868B (en) * 2013-03-03 2018-07-10 微软技术许可有限责任公司 Method, component, medium and system for probabilistic parsing
CN105981006A (en) * 2014-02-14 2016-09-28 三星电子株式会社 Electronic device and method for extracting and using semantic entity in text message of electronic device
US10630619B2 (en) 2014-02-14 2020-04-21 Samsung Electronics Co., Ltd. Electronic device and method for extracting and using semantic entity in text message of electronic device
CN113966518A (en) * 2019-02-14 2022-01-21 欧司朗有限责任公司 Controlled agricultural system and method of managing agricultural system
CN113966518B (en) * 2019-02-14 2024-02-27 魔力生物工程公司 Controlled agricultural system and method of managing agricultural system
CN109948164A (en) * 2019-04-02 2019-06-28 北京三快在线科技有限公司 Processing method, device, computer equipment and the storage medium of statistical demand information

Also Published As

Publication number Publication date
CN102682065B (en) 2015-03-25

Similar Documents

Publication Publication Date Title
US8799234B2 (en) Semantic entity manipulation using input-output examples
Choi et al. Ryansql: Recursively applying sketch-based slot fillings for complex text-to-sql in cross-domain databases
CN101390088B (en) Xml payload specification for modeling EDI schemas
Serrano Neural networks in big data and Web search
CN101501688B (en) Methods and apparatuses for searching content
Zhang et al. Geospatial semantic web
Roman et al. The euBusinessGraph ontology: A lightweight ontology for harmonizing basic company information
Liu et al. A novel aspect-based sentiment analysis network model based on multilingual hierarchy in online social network
Li et al. HHMF: hidden hierarchical matrix factorization for recommender systems
CN102682065B (en) Semantic entity control using input and output sample
Iliadis et al. One schema to rule them all: How Schema. org models the world of search
Sharma et al. Semantic approach for Web service classification using machine learning and measures of semantic relatedness
Alvarez-Rodríguez et al. Empowering the access to public procurement opportunities by means of linking controlled vocabularies. A case study of Product Scheme Classifications in the European e-Procurement sector
Huang et al. Cross attention fusion for knowledge graph optimized recommendation
Nguyen et al. Attentional matrix factorization with document-context awareness and implicit API relationship for service recommendation
Quarteroni et al. A bottom-up, knowledge-aware approach to integrating and querying web data services
Yin et al. Chinese named entity recognition based on knowledge based question answering system
Tekli et al. Semantic to intelligent web era: building blocks, applications, and current trends
Khanam et al. A Web Service Discovery Scheme Based on Structural and Semantic Similarity.
CN111126073B (en) Semantic retrieval method and device
Singhal et al. An E‐commerce prediction system for product allocation to bridge the gap between cultural analytics and data science
Liu et al. An effective biomedical data migration tool from resource description framework to JSON
Zhang et al. An attentive memory network integrated with aspect dependency for document-level multi-aspect sentiment classification
Shafi et al. [WiP] Web Services Classification Using an Improved Text Mining Technique
Atta The effect of usability and information quality on decision support information system (DSS)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: MICROSOFT TECHNOLOGY LICENSING LLC

Free format text: FORMER OWNER: MICROSOFT CORP.

Effective date: 20150728

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20150728

Address after: Washington State

Patentee after: Microsoft Technology Licensing LLC

Address before: Washington State

Patentee before: Microsoft Corp.