WO2006003351A1 - Analysis of data formats and their translation into a common data format - Google Patents

Analysis of data formats and their translation into a common data format

Info

Publication number
WO2006003351A1
Authority
WO
WIPO (PCT)
Prior art keywords
message
format
elements
messages
segment
Application number
PCT/GB2004/002889
Other languages
English (en)
Inventor
Brian Bolam
Stephen Byrne
Original Assignee
Omprompt Limited
Application filed by Omprompt Limited
Priority to PCT/GB2004/002889
Publication of WO2006003351A1


Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80: Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84: Mapping; Conversion
    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/258: Data format conversion from or to a database

Definitions

  • the present invention relates to a data interface and in particular a data interface capable of translating data input into it in a first format into output data in a second format.
  • a method for interfacing between an input data set in a first format and an output data set in a second format comprising the steps of: analysing the syntax of the input data set; separating the individual data fields in the input data set; extracting the data in each individual field; arranging the extracted data in accordance with the second format and thereby generating an output data set.
  • an interface having means for receiving an input data set, means for processing the inbound data set in order to generate an output data set and means for exporting the output data set.
  • Such an interface allows a system to deal with input data sets in a format that is not supported by the system by translating them into a format that is supported by the system.
  • IA/KM Intelligent Agent/Knowledge Management
  • the ontology for analysing the message syntax, separating the individual data fields and thus extracting the data in each individual field is …
  • the interface may be adapted to analyse and translate input data sets in any desired format.
  • the interface is additionally adapted to export an output data set in any desired format.
  • the interfacing method may include the steps of translating the input data in the first format into a common format and then translating the data in the common format into the second format.
  • Particular formats that may be translated using such a system include but are not limited to EDI, XML, ASCII, SMS, Voice, VML, Fax and similar.
  • the particular formats mentioned herein include any related formats and particular variations upon these formats.
  • EDI as defined herein covers a number of sub formats
  • XML as defined herein includes any such sub-formats as may exist, including but not limited to ANSI X12 and UN/CEFACT.
  • the interface of the present invention may also be adapted to work with a number of database formats for instance to provide a means for databases in two different formats to be merged into a single database either in the same format as one of the original databases or in a third format.
  • the interface can be positioned between a user and the two databases so that user queries can be translated for each database.
  • as well as performing as described above, the interface is additionally operative to provide message mapping.
  • it is adapted so as to be able to receive an inbound message in any format and translate the inbound message into an output message in a desired format.
  • the interface is adapted to provide message switching; that is, to receive an inbound message from a sender in a first format, translate the message into a second format and forward the message to a desired recipient.
  • Messages received from one sender may always be sent to one or more particular recipients or may alternatively only be sent to recipients specified in the message.
  • Messages may be sent to recipients only in a particular specified format, or may be sent in one of a variety of specified formats depending on the data in the message or …
  • the interface may further act to send messages to the sender confirming translation or receipt of the sender's initial message.
  • the interface translates all inbound messages into a common format and then translates messages from the common format into the desired output message format.
  • the interface preferably stores copies of inbound messages either in their original format or in the common format or in both formats.
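The two-stage translation described above (input format into the common format, then common format into the output format) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dict-based common message and the names `parse_csv`, `render_pipe`, `render_xml`, `translate` and the format keys `csv`, `pipe`, `xml` are all hypothetical.

```python
from typing import Callable, Dict
from xml.sax.saxutils import escape, quoteattr

# Hypothetical common format: a flat mapping of field name -> value.
CommonMessage = Dict[str, str]


def parse_csv(text: str) -> CommonMessage:
    """Hypothetical inbound format 'order_id,status' -> common format."""
    order_id, status = text.strip().split(",")
    return {"order_id": order_id, "status": status}


def render_pipe(msg: CommonMessage) -> str:
    """Hypothetical pipe-delimited outbound format."""
    return f"ORD|{msg['order_id']}|{msg['status']}"


def render_xml(msg: CommonMessage) -> str:
    """Hypothetical XML outbound format (values escaped for XML)."""
    return f"<order id={quoteattr(msg['order_id'])}><status>{escape(msg['status'])}</status></order>"


# One parser per inbound format and one renderer per outbound format.
PARSERS: Dict[str, Callable[[str], CommonMessage]] = {"csv": parse_csv}
RENDERERS: Dict[str, Callable[[CommonMessage], str]] = {"pipe": render_pipe, "xml": render_xml}


def translate(text: str, source: str, target: str) -> str:
    """Source format -> common format -> target format."""
    common = PARSERS[source](text)
    return RENDERERS[target](common)
```

For example, `translate("4711,DELIVERED", "csv", "pipe")` returns `"ORD|4711|DELIVERED"`. Routing everything through one common format means N supported formats need N parsers and N renderers, rather than a direct translator for each of the N(N-1) ordered pairs of formats.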
  • the method of interfacing additionally includes the step of verifying that the content of the input data set or message conforms to standard or expected criteria. If the message does not conform to the criteria, an error message may be sent to the sender and/or another authority. The error message may request that the sender or the authority check and/or resend the message. The error message may additionally include information relating to why the message was rejected.
  • the interface may attempt to correct minor errors and, if successful, may notify the sender and/or some other authority of the correction.
  • the interface may be required to await acceptance of the correction from the sender or other authority before forwarding the corrected message to the desired recipients.
  • records are generated and stored relating to each message received from a sender and forwarded to a recipient.
  • the records include some or all of the following: the format of the inbound message and/or the outbound message, the time and date of receipt of a message, a log of any errors identified in the message, identification of the sender, identification of the recipient or recipients.
  • the new sender is required to take part in an automated registration process. They must provide a standard set of details for the interface and then send a message to the interface in their preferred format. The message can then be stored as a calibration standard to be used to aid future translation of messages from the sender.
  • the above described interface may be adapted to provide a messaging interface for a group of businesses.
  • the above may be used to form a messaging interface for the logistics industry.
  • an interface may be embodied by a plurality of securely connected network service modules.
  • Each network service module may be a peer-to-peer service module comprising a hub linking two communications servers, a database server and a processing server.
  • a method of determining the structure of messages in a particular format by analysing a plurality of such messages, comprising the steps of: identifying individual segments within each message to thereby determine the segment structure of the particular message format; identifying individual elements within each segment, thereby determining the element structure of each segment; and analysing corresponding individual elements in each segment over all the analysed messages to determine the structure of each element in the particular data format.
  • the segment ID is used to identify the start and end positions of each segment.
  • the identified segments are counted so that the mean number of segments per message and the standard deviation in the number of segments per message can be determined.
  • the segment structure of each individual message is analysed to determine which segments appear in every message either once or more than once in every message and which segments appear once or more than once in only some messages and the number of messages with identical segment structures.
  • a user is provided with a list of all segments used in each message in the plurality of messages.
  • said list also contains statistical information relating to the number of times that the segment has occurred over all messages and information relating to the number of occurrences of a segment as a percentage of the number of messages.
  • the method may be applied to identifying elements in segments where the elements are delimited and/or to identifying elements in segments where the elements have a fixed length. If the segments to be analysed have delimited elements, preferably the element delimiters or separators may be used to identify the start and end positions of individual elements. In such cases, as each element can be readily identified, preferably only one segment at a time is analysed. If the segments to be analysed have fixed lengths, each like segment from the plurality of messages is analysed at the same time to identify the individual elements.
  • where segments contain sub-elements, either of the above processes may be used to identify the sub-element structure of each element as appropriate.
  • the data in each like segment is entered into a table or similar such that corresponding elements in each segment are entered into the same column of the table.
  • the elements are then analysed to determine the amount of element variation between segments.
  • said elements are classified into a plurality of different classes by the amount of variation, where variation is defined as the number of unique values that occur in a column as a percentage of the number of cells in the column.
  • the elements in the table are also analysed to determine whether they: are mandatory or optional; include date or time information; are alpha, alphanumeric or numeric; require a zero fill, if numeric; have a maximum or minimum length.
  • the method includes the step of providing a suggested map between the determined structure of the message and a common data format.
  • the common data format contains syntax and extensive descriptive/reference details.
  • the common data format may be extended to cope with new data segments, elements or formats.
  • the suggested map is generated by comparing each element in the particular format with each element in the common format. Preferably, this will provide a list of elements in the particular format that match elements in the common format and elements in the particular format that do not match elements in the common format. The elements that do not match may be matched to existing elements in the common format manually or may be matched to new elements created for the common format.
  • intelligent syntax generation may be used to compare a particular format against messages of other formats that have previously been mapped, and a map may be created by adapting a map generated for a previous format.
  • messages may be mapped automatically by analysing their overall ID, their segment ID and their element ID to find matches in a database of previous messages in previous formats.
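The automatic mapping step above, which splits elements into those that match a previously mapped element and those that need manual or new mapping, can be sketched as a keyed lookup. This assumes the knowledge base can be reduced to a dictionary keyed by (message ID, segment ID, element position); `KNOWN_MAPS`, `suggest_map` and the common-format field names are hypothetical examples, not a real knowledge base.

```python
from typing import Dict, List, Tuple

# Key: (message ID, segment ID, element position within the segment).
ElementKey = Tuple[str, str, int]

# Hypothetical entries learned from previously mapped formats.
KNOWN_MAPS: Dict[ElementKey, str] = {
    ("214", "AT8", 3): "shipment.weight",
    ("214", "AT8", 4): "shipment.lading_quantity",
}


def suggest_map(elements: List[ElementKey]) -> Tuple[Dict[ElementKey, str], List[ElementKey]]:
    """Return (matched elements -> common-format field, elements with no previous match)."""
    matched: Dict[ElementKey, str] = {}
    unmatched: List[ElementKey] = []
    for key in elements:
        field = KNOWN_MAPS.get(key)
        if field is None:
            unmatched.append(key)  # to be mapped manually or to a new common-format element
        else:
            matched[key] = field
    return matched, unmatched
```

Each unmatched key corresponds to an element that would either be mapped by hand or given a new element in the common format, after which it could be added to the lookup.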
  • the method may be further adapted to provide the steps of detecting variations in known message formats.
  • the detectable variations include: the addition of segments, elements or sub-elements; the omission of segments, elements or sub-elements; and the transposition of segments, elements or sub-elements.
  • the method may further include the step of cleansing the message. This may be achieved by: replacing data stored in an element; adding segments, elements or sub-elements; deleting segments, elements or sub-elements; or transposing segments, elements or sub-elements.
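Detecting added, omitted and transposed segments can be sketched by comparing a message's segment-ID sequence with the known structure. This is a simplified illustration: `detect_variations` is a hypothetical helper, and it assumes each segment ID appears at most once per message, so repeating loops are not handled.

```python
from typing import Dict, List


def detect_variations(expected: List[str], actual: List[str]) -> Dict[str, List[str]]:
    """Compare a message's segment IDs with a known structure.

    Simplification: each segment ID is assumed to occur at most once per message."""
    added = [s for s in actual if s not in expected]
    omitted = [s for s in expected if s not in actual]
    # Keep only the segments present in both, each in its own list's order.
    shared_expected = [s for s in expected if s in actual]
    shared_actual = [s for s in actual if s in expected]
    # A shared segment whose relative position differs is reported as transposed.
    transposed = [e for e, a in zip(shared_expected, shared_actual) if e != a]
    return {"added": added, "omitted": omitted, "transposed": transposed}
```

The same comparison applies at element or sub-element level by passing element identifiers instead of segment IDs; the reported differences are the inputs a cleansing step would act on.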
  • the third aspect of the present invention may be implemented in conjunction with the first and/or second aspects of the invention as desired or appropriate.
  • Figure 1 shows a network interface according to the present invention for interfacing messages between a number of truckers and a number of shippers;
  • Figure 2 shows a schematic block diagram of the interface of figure 1 ;
  • Figure 3 shows a flow diagram demonstrating the use of the interface of figure 1;
  • Figure 4 shows a list of segments occurring in a sample of messages
  • Figure 5 shows a list of standard messages built up from the segments listed in figure 4.
  • Figure 6 shows an example of a message having delimited elements
  • Figure 7 shows an example of a message having fixed length elements
  • Figure 8 shows steps in the analysis of a message segment having delimited elements
  • Figure 9 shows steps in the analysis of a message segment having fixed length elements
  • Figure 10 shows a segment of a message in the format analysed in figure 9 illustrating the volatility of the various elements in the segment;
  • Figure 11 shows how elements in a new format may be mapped to elements in a common format using declarative mapping
  • Figure 12 shows how elements in a new format may be mapped to elements in a common format using automatic mapping
  • Figure 13 shows examples of message statistics retrievable by a user.
  • an interface 100 is in use in the transport logistics industry.
  • the interface allows two shippers S1, S2 to send messages formatted according to their preferences to any or all of a number of truckers T1, T2, T3, and have the truckers T1, T2, T3 receive the messages in their own preferred format.
  • the truckers T1, T2, T3 can thus be coordinated such that whichever of them is available may come and collect a load from one of the shippers S1, S2 as soon as the load has landed.
  • the interface translates incoming messages from the format they are received in to the format that the intended recipient desires to receive messages in.
  • the interface 100 in figure 1 is shown acting to translate messages between five different formats (ANSI X12, …
  • the interface 100 is typically embodied by a network of securely connected network service modules 200, which may be peer-to-peer service modules.
  • Each peer-to-peer service module 200 typically comprises a hub 210 linking two or more communications servers 202, 204, a database server 206, and a processing server 208.
  • in FIG. 2, a schematic block diagram of the interface 100 is shown.
  • Incoming messages 102 are held in a buffer 104 until sufficient processing space is available; once space is available, the incoming message 102 is passed to a translation engine 106.
  • the translation engine 106 translates the message using IA/KM technology in conjunction with stored protocols and formats 108 and stored knowledge 110 into a single common data format.
  • the translation is generated by analysing the message syntax in order to separate the individual data fields and then extract the data stored in each individual field in order to thus generate a message in the common format containing all the data of the incoming message 102.
  • the incoming message is stored in its original format in a historical message database 112 and in the common format in a common message database 116.
  • Each common format message is verified by a verification engine 114 to ensure that the data contained therein is in accordance with expected values. In the event that data inaccuracies are detected, the verification engine 114 may attempt to correct these by analyzing previous stored messages from the sender to obtain previous or expected values. If this is not possible, the verification engine 114 may request user intervention to correct the data. In the event of any detected data anomaly, the system will create a log which may be viewed by an authorized user and which may be used to generate outbound messages back to the sender of the incoming message 102.
  • the common format message is output to the outbound message generator 118.
  • the outbound message generator determines to which recipients the message is to be sent, and looks up the recipients' addresses in the recipients' address database 122 and each recipient's preferred message format in the recipients' preferred formats database 120.
  • the outbound message generator then generates a correctly addressed outbound message 124 in the recipient's preferred format, which is then transmitted to the recipient. Typically the recipient will then confirm receipt of the message.
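The outbound step above (look up each recipient's address and preferred format, then render the common-format message in that format) can be sketched as below. It is an assumption-laden illustration: `Recipient`, `RECIPIENTS` and `generate_outbound` are hypothetical, the two databases 120 and 122 are merged into a single in-memory lookup, and receipts and the complete-set rule are not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Recipient:
    address: str
    preferred_format: str


# Hypothetical stand-in for the address database 122 and preferred formats database 120.
RECIPIENTS: Dict[str, Recipient] = {
    "T1": Recipient("t1@example.com", "xml"),
    "T2": Recipient("+447700900123", "sms"),
}


def generate_outbound(common: Dict[str, str], recipient_ids: List[str],
                      renderers: Dict[str, Callable[[Dict[str, str]], str]]) -> List[Dict[str, str]]:
    """Produce one addressed outbound message per recipient, in that recipient's preferred format."""
    outbound = []
    for rid in recipient_ids:
        r = RECIPIENTS[rid]
        body = renderers[r.preferred_format](common)
        outbound.append({"to": r.address, "format": r.preferred_format, "body": body})
    return outbound
```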
  • outbound messages 124 are only triggered when the data for the complete set has been received and verified.
  • the interface 100 will not generate outbound messages 124 on an incomplete data set, even though the subset of data required for that outbound message 124 has been received and verified.
  • the interface 100 automatically generates outbound messages 124 for known recipients based on their known preferences; however, the interface 100 may generate messages in an alternative format and protocol for any user in the event that a receipt is not received for the first outbound message 124, or for any other requested reason. Once a receipt has been received for a complete set of outgoing messages 124, the original incoming message may be removed from the local databases 112, 116 if desired.
  • the interface 100 sends an error notification to the sender.
  • the interface 100 will additionally check to see if there is a registered recipient with similar details and may suggest to the sender that they intended to send the message to this recipient instead. Alternatively if the recipient details are correct but the recipient is not yet registered then the e-mail may also contain a hot link to a registration
  • the data required for registration typically comprises the following:
  • Validation Code 1: 4 characters, alphanumeric (e.g. 6950)
  • Validation Code 2: 4 characters, alphanumeric (e.g. 1074)
  • Name/Description: 40 characters, alphanumeric (e.g. Fox International Transport)
  • Interface Type: 2 characters; XML code linked to standing data
  • When incoming messages 102 arrive, they are validated against these details by the validation engine 114 and then passed to the outbound message generator 118 along with data related to the intended destination of the message.
  • the interface 100 may additionally be provided with means for storing and cross-referencing all messages passed through the interface 100. In this manner, if an order message and a subsequent dispatch message are sent via the interface they can be cross-referenced and verified. A further notice of collection or receipt will also be cross-referenced by the interface 100.
  • the data stored by the interface may be stored in any suitable database format, such as CSV, and managed by any suitable system, such as SQL. Other formats or systems may, however, be substituted if desired.
  • Messages may be stored in a message log database if desired, which records all messages sent via the interface.
  • the contents of the message log may be exported to an external text file for analysis if desired. Additionally or alternatively reports of activity may be compiled and exported to suitable external software such as Microsoft Excel (registered trade mark).
  • an example of using this interface to communicate an update on the progress of a shipment being transported by a user is shown in figure 3.
  • this example relates to voice recognition over a telephone connection (cellular or landline) but other messages in other formats may alternatively be used.
  • the user must give an identification code. If the code is valid, the user progresses to s301; if the code is invalid, the user is prompted once more to give a valid identification code. At s301 the user is prompted to enter an order number. If this is valid, the user progresses to s302, where the interface can identify a matching order; if the number is not valid, the user is prompted to try again.
  • the user may then enter an update of the position of the order and, depending on the choice of update, may be asked for supplementary information at s303. For instance, if the order is late (s303c), the user is prompted to give an estimated time of arrival; if the order is delivered (s303b), the user is asked to confirm the condition of the delivery (clear/short/damaged/refused). If, however, the order has just been collected (s303a), a confirmation message to this effect is generated. The user then progresses to s304.
  • messages are translated into or out of a common format using Automated Message Profiling (AMP), either to identify the message as belonging to a known format and thus map the message to the common format, or to determine that the message is not in a known format and generate a new map for translating the message into the common format.
  • AMP Automated Message Profiling
  • Automated Message Profiling (AMP) identifies messages in known formats by looking at their message IDs. If the message ID is unknown, a new map must be generated.
  • a plurality of messages in the new format must be analysed in parallel.
  • a new format is detected because a new user wishes to use the interface and to do so the new user is requested to provide a sample of a plurality of messages for analysis.
  • automated message profiling operates by comparing a plurality of messages in the new format, said messages containing a wide variety of typical variations.
  • segment structure of the message format may be determined.
  • structure of individual fields, elements or records within each identifiable segment may be determined.
  • identification of the individual segments in a message format and the individual elements within each segment can be done automatically, and thus a map of a message format can be generated automatically. If the plurality of messages provided is insufficient to generate a complete map, a user may access the data and manually determine the segment structure within a message or the element structure within a segment. If the messages are unstructured, this method is of course ineffective.
  • where the format of a message is not recognised, the message structure is analysed in order to generate a map allowing the message to be mapped to the common format.
  • the structure can be compared to previous message formats that have been mapped by the interface. Where the message has a similar but non-identical structure to a previous message format, the map generated for the previous message format may be adapted to generate a map for the new message format. In this way the interface may become better at determining maps for new message formats over time. If the message has a completely unknown structure a completely new map must be generated. All messages which pass through the translation process will be held in a data repository at detail and summary level.
  • Information from the repository will be used as part of the mapping process where actual data from sample messages can be used to search how and where data of this type and value have been used before.
  • the analysis of the repository data is similar to data warehousing but with the subtle difference that nothing is known about the source data other than its value.
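Finding a previously mapped format whose structure is "similar but non-identical" could be approached as sketched below. The similarity measure here is a swapped-in choice, not one stated in the source: Jaccard similarity (shared segment IDs divided by all distinct segment IDs), with an arbitrary 0.5 threshold. `jaccard`, `closest_known_format` and the format names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def jaccard(a: List[str], b: List[str]) -> float:
    """Shared segment IDs as a fraction of all distinct segment IDs in either list."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def closest_known_format(segments: List[str], known: Dict[str, List[str]],
                         threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (format name, similarity) for the most similar mapped format, or None if below threshold."""
    best = max(((name, jaccard(segments, segs)) for name, segs in known.items()),
               key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None  # completely unknown structure: a new map must be generated
    return best
```

When a candidate is returned, its existing map would be the starting point for adapting a map to the new format; a `None` result corresponds to the case where a completely new map is needed.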
  • API Application Programming Interface
  • the first step in analysis of a plurality of messages to determine their structure and hence to provide a map from their format to a common format is to identify the segment structure of the messages and the individual segments in each message.
  • the first element of each segment contains a segment ID; these segment IDs may be used to identify the start and end positions of successive segments, particularly in a format wherein the segment length is fixed.
  • Message Count: count the number of messages within the sample.
  • the end of message can be identified by an empty segment or an end of message marker.
  • Average Segments: calculate the average number of segments per message.
  • Standard Deviation of Segments: calculate the standard deviation of the number of segments across all messages. This indicates the variation in the message set: the lower the number, the more standard the structure of the message.
  • Primary Required Segments: count and identify each segment type which occurs only once in every message.
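The sample statistics above can be computed as in the following sketch, assuming each message has already been split into a list of segment IDs. `segment_statistics` is a hypothetical helper; the source does not say whether a population or sample standard deviation is meant, so population standard deviation (`statistics.pstdev`) is used here as an assumption.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List


def segment_statistics(messages: List[List[str]]) -> Dict[str, object]:
    """Message count, mean and standard deviation of segments per message, and primary required segments."""
    counts = [len(m) for m in messages]
    per_message = [Counter(m) for m in messages]
    all_ids = set().union(*per_message)
    # Primary required: the segment occurs exactly once in every message of the sample.
    primary_required = sorted(s for s in all_ids if all(c[s] == 1 for c in per_message))
    return {
        "message_count": len(messages),
        "average_segments": mean(counts),
        "std_dev_segments": pstdev(counts),  # population standard deviation (assumption)
        "primary_required": primary_required,
    }
```

For a sample of three messages with 4, 5 and 3 segments, the average is 4 and the population standard deviation is about 0.82.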
  • the next step is to analyse the above statistics to determine a definition of the message structure. This does not take account of message standards or syntax but does provide a view of the message structure.
  • This process generates a segment list and a standard message as detailed below. Segment List. All the segments used are listed in the correct sequence. This includes all segments that are optional. Each segment listed is listed alongside associated information from the statistical analysis above such as segment type (Primary Required), the number of occurrences and the number of occurrences as a percentage of all messages. This list indicates the largest message that could be created based on the sample data. A simple example of such a list is shown in figure 4.
  • Standard Message: one message is selected from each group of messages sharing an identical segment structure, as determined by the structure count above.
  • the messages are listed in descending order, starting with a message from the group with the highest number of messages conforming to a common structure, followed by a message from the group with the next highest number, and so on.
  • Each message is listed alongside information obtained from the statistical analysis above such as the group count.
  • a typical example of such a list is shown in figure 5.
  • This provides an output which may be viewed by a user, if desired. The user can thus see the full range of variation within the sample set, together with statistical details indicating how often each structure occurred within the sample message set, the message count, how many other structures occurred and the related percentage. This information may be of use to the user if any manual input into the mapping process is required.
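The segment list and the standard-message grouping described above can be sketched as follows. Both helpers are hypothetical. The ordering rule in `segment_list` (insert each newly seen segment straight after its predecessor in the message where it first appears) is an assumed heuristic for recovering the "correct sequence", and percentages are occurrences divided by message count, so repeated segments can exceed 100%.

```python
from collections import Counter
from typing import List, Tuple


def segment_list(messages: List[List[str]]) -> List[Tuple[str, int, float]]:
    """Rows of (segment ID, total occurrences, occurrences as % of message count)."""
    order: List[str] = []
    for message in messages:
        previous = None
        for seg in message:
            if seg not in order:
                # Place a newly seen segment straight after its predecessor in this message.
                position = order.index(previous) + 1 if previous is not None else 0
                order.insert(position, seg)
            previous = seg
    totals = Counter(seg for message in messages for seg in message)
    return [(seg, totals[seg], 100.0 * totals[seg] / len(messages)) for seg in order]


def structure_groups(messages: List[List[str]]) -> List[Tuple[Tuple[str, ...], int]]:
    """One entry per distinct segment structure with its message count, largest group first."""
    return Counter(tuple(m) for m in messages).most_common()
```

Taking the first entry of each group returned by `structure_groups` gives the representative "standard message" for that structure, already in descending order of group count.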
  • the user will thus be provided with an API having the parameters shown in table 2.
  • the next step in this method is to determine the structure of elements within each segment of a message.
  • There are two types of element structure dealt with by AMP: variable length elements (as shown in the message of figure 6), where each field is terminated by a special character called a field delimiter or field separator ('*' in figure 6) and each element varies in size from message to message; and fixed length elements (as shown in the message of figure 7, where segments are indented to illustrate structure and loops), where there are no field delimiters but each element starts and ends at a specific position within the segment and has the same position and length in all segments of the same type.
  • two different methods of identifying and isolating the elements within a segment are used.
  • automated message profiling provides an API having the parameters shown in table 3.
  • Figure 8a shows a typical field-delimited segment taken from an ANSI X12 214 (Shipment Status) message, providing shipment details such as weight and quantity.
  • the objective is to isolate each element ready for analysis.
  • This segment is made up of eight elements, the first of which identifies the segment.
  • Figure 8b shows the segment parsed into individual elements.
  • the next step is to establish if there are any sub-elements using the same technique. In the example of figure 8, there are no sub-elements.
  • the next step is to isolate each of the AT8 segments in the rest of the messages to produce a table as shown in figure 8c. Each element can now be analysed to establish the fundamental definition of the AT8 segment within the message set.
  • In the case of fixed length elements, if no similar message structure has been analysed before, some manual input may be required to determine the message structure. As for variable length messages, the aim is to isolate each element within a segment. However, where a segment is highly populated, it can be difficult to separate neighbouring elements when there is little or no change from one message to another within the specific segment. Only one segment at a time is analysed throughout the sample message set before moving on to the next segment. An example of a segment with fixed length elements is shown in figure 9a.
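Parsing delimited segments and collecting every occurrence of one segment type into a column-aligned table (as in figure 8c) can be sketched as below. The helpers are hypothetical; messages are assumed to be already split into segment strings, `*` is the element separator from the example, and `>` as the sub-element separator is an assumption (in X12 the component separator is declared per interchange). The AT8 values in the usage are made-up sample data.

```python
from typing import List


def parse_segment(segment: str, element_sep: str = "*", sub_sep: str = ">") -> List[List[str]]:
    """Split a delimited segment into elements, and each element into sub-elements."""
    return [element.split(sub_sep) for element in segment.split(element_sep)]


def segment_table(messages: List[List[str]], segment_id: str, element_sep: str = "*") -> List[List[str]]:
    """One row per occurrence of segment_id across all messages; column j holds element j."""
    rows: List[List[str]] = []
    for message in messages:
        for seg in message:
            elements = seg.split(element_sep)
            if elements[0] == segment_id:
                rows.append(elements)
    # Pad short rows (trailing elements omitted) so corresponding elements share a column.
    width = max((len(r) for r in rows), default=0)
    return [r + [""] * (width - len(r)) for r in rows]
```

With two hypothetical messages containing `AT8*G*L*1200*10*10*E*35` and `AT8*G*L*850*6`, the table has eight columns, the first holding the segment ID, and column 3 holds the two weights `1200` and `850`.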
  • the start of the first element was established from the interface data, which stated the start position of the segment ID.
  • the second element starts where the segment ID ends.
  • the third element starts where change has occurred prior to a static pattern.
  • the fourth and fifth elements start where white space has occurred in the same position across all sample messages for this segment.
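Two of the boundary rules above can be sketched for a set of same-type fixed-length segments: the element after the segment ID starts where the ID ends, and a new element starts immediately after a column that is blank in every sample. `fixed_length_starts` is a hypothetical helper; the "change prior to a static pattern" rule is not implemented, so this is a partial heuristic only.

```python
from typing import List


def fixed_length_starts(segments: List[str], id_length: int) -> List[int]:
    """Propose element start positions for same-type fixed-length segments (partial heuristic)."""
    width = max(len(s) for s in segments)
    rows = [s.ljust(width) for s in segments]  # pad so every sample has the same width
    # A column is blank if it holds a space in every sample segment.
    blank = [all(r[i] == " " for r in rows) for i in range(width)]
    starts = [0, id_length]  # segment ID starts at 0; the next element starts where the ID ends
    for i in range(id_length + 1, width):
        if blank[i - 1] and not blank[i]:
            starts.append(i)  # data resumes after a column that is blank across all samples
    return starts
```

For the samples `"ST1ABC  0012 X"` and `"ST1ABD  0047 Y"` with a three-character ID, the proposed starts are positions 0, 3, 8 and 13.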
  • Static Values: each column where the value is identical throughout is identified. Although these values may not be constants in the true sense, they can be treated as such. If this message were being defined for output, the static values could be set in place without concern for the actual mapping of those elements.
  • a particular example of this is to colour code red where static values exist, dark blue where one percent change occurs, mid-blue where five percent or less change occurs, light blue where ten percent changes occur. Columns where changes are greater than ten percent are viewed as volatile.
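The colour bands above can be sketched using the variation measure defined earlier (unique values as a percentage of cells in the column). Two points are interpretations rather than source statements: a static column is tested directly as "one unique value", because under this definition a static column of n cells scores 100/n percent rather than zero; and the band edges are read as at most 1%, at most 5% and at most 10%. The helper names are hypothetical.

```python
from typing import List


def variation_percent(column: List[str]) -> float:
    """Unique values as a percentage of the number of cells in the column."""
    return 100.0 * len(set(column)) / len(column)


def volatility_class(column: List[str]) -> str:
    """Assign a column to a colour band (band edges are an interpretation of the text)."""
    if len(set(column)) == 1:
        return "red (static)"
    v = variation_percent(column)
    if v <= 1:
        return "dark blue"
    if v <= 5:
        return "mid-blue"
    if v <= 10:
        return "light blue"
    return "volatile"
```

For a column of 200 cells, 2 distinct values give 1% (dark blue) and 50 distinct values give 25% (volatile).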
  • Date Definition: each cell within each column is analysed for date conformity. Dates may be held in a number of ways, and each column must be examined to discover any date structure. If a date is discovered, and the date is present in all cells of the column, the element is classified as a date element.
  • Time Definition: each cell within each column is analysed for time conformity. Times may be held in a number of ways, and each column must be examined to discover any time structure. If a time is discovered, and the time is present in all cells of the column, the element is classified as a time element. In certain formats, time can be an integral part of a date element, for example in a UN/EDIFACT DTM segment. If this is the case, the element will be defined using a function which applies the SEF structure for syntax.
  • Element Type: where an element has not been classified as a date or time, each cell within the column is analysed to determine whether the element is alpha, alphanumeric or numeric.
  • Each cell within the column is analysed to define the element's maximum size.
  • the number of characters in the cell containing the longest value is the maximum size.
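The per-column checks above (mandatory or optional, date, time, alpha/alphanumeric/numeric, zero fill, maximum and minimum length) can be sketched for one table column as follows. The date and time layouts are illustrative assumptions; a real profiler would need many more, and short all-digit columns such as four-digit quantities can be mistaken for HHMM times, so any classification here is a suggestion for user verification. `profile_column` and the layout lists are hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# (label, exact-shape regex, strptime format): illustrative layouts only.
DATE_LAYOUTS: List[Tuple[str, str, str]] = [("CCYYMMDD", r"\d{8}", "%Y%m%d"), ("YYMMDD", r"\d{6}", "%y%m%d")]
TIME_LAYOUTS: List[Tuple[str, str, str]] = [("HHMMSS", r"\d{6}", "%H%M%S"), ("HHMM", r"\d{4}", "%H%M")]


def _layout_of(cells: List[str], layouts: List[Tuple[str, str, str]]) -> Optional[str]:
    """Return the first layout that every cell conforms to, or None."""
    if not cells:
        return None
    for label, shape, fmt in layouts:
        try:
            if all(re.fullmatch(shape, c) and datetime.strptime(c, fmt) for c in cells):
                return label
        except ValueError:  # right shape but not a valid date/time
            continue
    return None


def profile_column(column: List[str]) -> Dict[str, object]:
    """Profile one element position across all like segments."""
    filled = [c for c in column if c.strip()]
    profile: Dict[str, object] = {
        "mandatory": len(filled) == len(column),
        "max_length": max((len(c) for c in filled), default=0),  # longest value
        "min_length": min((len(c) for c in filled), default=0),
    }
    # Date/time only if every cell (blanks included) conforms to the same layout.
    date_layout = _layout_of(column, DATE_LAYOUTS)
    time_layout = None if date_layout else _layout_of(column, TIME_LAYOUTS)
    if date_layout:
        profile["type"] = "date:" + date_layout
    elif time_layout:
        profile["type"] = "time:" + time_layout
    elif filled and all(c.isdigit() for c in filled):
        profile["type"] = "numeric"
        # Zero fill: all values share one width and at least one has a leading zero.
        profile["zero_fill"] = len({len(c) for c in filled}) == 1 and any(c.startswith("0") for c in filled)
    elif filled and all(c.isalpha() for c in filled):
        profile["type"] = "alpha"
    else:
        profile["type"] = "alphanumeric"
    return profile
```

For example, a weight column holding `1200`, `850` and one blank cell is reported as optional, numeric, not zero-filled, with a maximum length of 4.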
  • the details are stored and may additionally be passed to another application or to a user for verification.
  • Providing the message details in this format allows a user to focus easily on the business process for which the message is being used rather than being concerned with redundant segments.
  • the results of the processing are stored in a Cache table having a row corresponding to each element of the analysed segment.
  • the parameters stored in the Cache table will typically be those shown in table 4.
  • A map for translating messages in the new format into a common format can then be generated.
  • The common format must contain enough fields to hold all the information in any input message, broken down into its logical sub-structure.
  • The common format may be set out in a table containing syntax and extensive descriptive and reference details. Grouping the fields in this manner gives a better view of the business process, which helps to indicate any further information that may be required in the mapping process.
  • A simplified example of a common format group is shown in table 5. If a new format containing new information is encountered, a new group may be added to the common format; in this way the common format grows over time.
  • A Cache knowledge database will be the main source of information for establishing mappings to the common format. Initially the mapping process may require manual intervention, but as more messages in new formats are processed the knowledge database will accumulate more information, eventually allowing the process to be fully automated. To allow information to be accessed from many different directions, the Cache database incorporates OLAP (online analytical processing) systems. The database design is specialised: it targets fewer users so that much larger amounts of data can be retrieved. For instance, the knowledge database places related data in physical proximity so that it can be accessed in the minimum number of reads, and messages are stored in specific cubes so that any element can be summarised, or specific values searched, in order to obtain the message container.
  • Declarative Mapping: This is used where a new message format has been described but additional information is required at message level to align the message with its associated industry and function.
  • Once the new format has been defined both physically and logically (i.e. assigned its place in the business process), it can be mapped to the common format.
  • This process compares each element of the new format with the elements of the common format at the syntax level. The result is an initial list, for each element of the new message, of the elements within the common format that match it at the syntax level.
  • For example, an ANSI(214) L11 segment has three elements, two of which are automatically mapped to elements in the common format (shown on the right).
  • This mapping is semi-automated. If a message has a simple structure and limited coding, semi-automatic mapping yields relatively good results, with up to 75% accuracy.
  • Intelligent Syntax Generation: Once sufficient historic user-format and common-format messages have been created, intelligent syntax generation can be used as the first step in mapping a new format. In this process, sample messages of the new format are analysed and compared with the repository of historic messages of varying structure. Where the new format is similar to standard formats or previously mapped formats, creating the syntax for the new format will be relatively easy. Typically, in such cases, the variations between clients' standard formats relate only to the profile of the message for each client; for instance, one client may use only 20 segments per message while another uses 100. Where a similar message has been mapped before, the mapping can be created automatically and simply verified. If, however, the message includes new segments, it must be fully analysed.
  • Automatic Mapping: This process involves cross-referencing the knowledge database to determine whether a whole message, a single segment or a single element matching those of the new format can be found in it. If a match is found, the same mapping may be reused. The process becomes more efficient as more message format maps are stored in the database. Cross-referencing with the knowledge database is required at three levels: message level, segment level and element level, as described below.
  • At message level, the message ID (e.g. ANSI/X.12/214/004010) can be used to check for an existing message format of the same type, business function and industry used for another customer. Where a match is found, the mapping references on the matched format are duplicated for the new format.
  • At segment level, the message ID (e.g. ANSI/X.12/Segment) can be used to check for existing segments of the same type, business function and industry in an existing format for another customer.
  • The match may come from a format other than the new format, provided it is within the same industry. Where a match is found, the mapping references on the matched format are duplicated for the new format.
  • At element level, the sample element may be matched to an element stored in the database to obtain a conclusive map to the common format.
  • The Cache database is organised for fast cross-referencing.
  • The sample containing the element to be matched is organised into rows and columns, each column representing one element of a segment.
  • The column is then sorted so as to display the most common value for the element.
  • The most common value is then used to perform a saturation search for the value within the database.
  • The search may yield many mapping suggestions, which are sorted from most suggested to least suggested. This process is repeated for each element within the new format until every element has a result, either positive (mapping suggestions) or negative (no suggestions).
  • A pictorial example of this process is shown in figure 12.
  • A new ANSI(214) B10 segment has an element for which a match has been found within the knowledge database.
  • The third element of the ANSI(214) B10 segment is a carrier code, and its mapping reference is 'Logistics.ShipmentStatus.Carrier'.
  • The results of the mapping process are stored in a transient table for each specific message.
  • The transient table has a row for each element of each segment of the record.
  • The parameters stored for each element are shown in table 7.
  • Information at summary and detail level on the message and on its individual segments and elements can be selected and analysed from the Cache database.
  • This functionality can be achieved using integrated SQL (structured query language) capabilities, which give a user the ability to search and summarise any data at any level.
  • The typical parameters for searching are set out in table 8.
  • The first step of the searching process is for the user to select a specific message format by format ID (e.g. ANSI/X.12/214/004010), by customer or by any other searchable parameter.
  • A list of historic messages is then generated with status information such as date and time.
  • The user can scroll up/down and left/right and select a specific message to examine. This process is illustrated in figures 13a and 13b.
  • The messages are displayed with their original statistical data.
  • The user may request a segment list, or may select any of the segments shown and request a detailed analysis of the specific message. For example, if the user selected line 15 in figure 13a (the AT7 segment), all AT7 segments from the sample data would be displayed as shown in figure 13b. The display can then be sorted by any element within the segment. This allows these tasks to be performed automatically rather than by a user copying the sample data into a spreadsheet.
  • The results of any user analysis can be provided in a transient table covering at least one segment. Such a table typically has a row for each element of each segment of the message. The parameters provided are shown in table 9.
  • The interface may also be adapted to undertake variation processing/error correction of messages in addition to translating them.
  • This type of processing can be viewed as identifying and correcting an acceptable error.
  • The process works by identifying a variation from the standard and, if it is one of a particular set of known variations, replacing it with the desired element. These variations can be processed only for formats that have fully defined maps.
  • New segment: the customer uses a segment that was not within the sample data set.
  • New element: the customer has implemented a minor format change.
  • Missing element or segment: the customer has implemented a minor format change.
  • Element or segment transposition: the customer has implemented a minor format change.
  • The interface requires the message format to be defined in terms of at least the parameters listed in table 10.
  • A new segment can be automatically mapped as described previously.
  • Alternatively, a suggested mapping can be generated which may be confirmed by a user. In either case, if no suggested mapping can be generated, further sample messages can be requested for additional processing.
  • A new element is automatically detected based on syntax and contents. From analysis of previous messages which contain similar values apart from the variation, the segment and the details of the new element can either be mapped automatically or a suggested mapping can be generated for confirmation by a user. In either case, if no suggested mapping can be generated, further sample messages can be requested for additional processing.
  • A missing element is automatically detected based on syntax and contents. From analysis of previous messages which contain similar values apart from the variation, the segment and the details of the missing element can either be mapped automatically or a suggested mapping can be generated for confirmation by a user. In either case, if no suggested mapping can be generated, further sample messages can be requested for additional processing. A similar process may be used for a missing segment.
  • Transposed elements are automatically detected based on syntax and contents. From analysis of previous messages which contain similar values apart from the variation, the segment and the details of the transposed elements can either be mapped automatically or a suggested mapping can be generated for confirmation by a user. In either case, if no suggested mapping can be generated, further sample messages can be requested for additional processing. A similar process may be used for transposed segments.
  • Errors or variations in the message data may be detected and corrected, or cleansed, during the translation process.
  • Textual data such as names, addresses and product descriptions can easily be cleansed because of its low business impact (e.g. changing 'Jhon' to 'John'). Cleansing quantity or value data requires more rigorous control because of its potential impact (e.g. changing 'IO' to 10 when the actual value was 1).
  • Message cleansing will only work on fully defined formats with sample data for the original format.
  • The method of cleansing varies depending on the type of data to be cleansed.
  • A numeric element may, for instance, require a range check, whereas an alphanumeric element may require pattern matching.
  • Examples of the types of data error that may be cleansed include: missing data; partial data; out-of-range data; transposition; table lookup; invalid data.
  • A number of parameters must be set for every element where cleansing is required, such as: replace missing data (yes/no); incomplete data (yes/no); range checking (yes/no); range start; range end; transposition (yes/no); table lookup; validate (yes/no).
  • This method can also correct incomplete data, provided only a small percentage of the data is missing. This is achieved using a percentage pattern match, in which the fields are sorted and re-sorted in descending order to find a start or finish pattern, and a 5% variation in the match is then allowed. This percentage could be increased if desired.
  • Range Checking: This is effectively validation; however, if range checking is selected without validation, calculations can be performed to reconstitute the value from other values where possible.
  • The results of this process are provided in a transient table representing the specific message, together with an indicator showing success or failure. If successful, the message translation process can be completed; if failed, the message is flagged for error analysis.
  • Any of the techniques described herein for generating maps to translate messages from a new format into the common format can also be applied to generating maps that translate messages from the common format into a new format.
  • Any of the variation detection or error cleansing methods described above in relation to translating messages into the common format may also be applied to translating messages out of the common format.
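The colour coding of column volatility can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. It assumes that a column's "change" is the percentage of sample cells that differ from the column's most common value, because the text does not define how the percentage is measured. The function names `change_percentage` and `volatility_colour` are hypothetical.

```python
from collections import Counter


def change_percentage(column: list[str]) -> float:
    """Percentage of cells that differ from the column's most common value.

    Assumption: the text does not define how "change" is measured, so this
    sketch compares every cell with the modal value of the column.
    An empty column is treated as having no change.
    """
    if not column:
        return 0.0
    _, modal_count = Counter(column).most_common(1)[0]
    return 100.0 * (len(column) - modal_count) / len(column)


def volatility_colour(column: list[str]) -> str:
    """Map a column to the colour bands listed in the text."""
    pct = change_percentage(column)
    if pct == 0:
        return "red"          # static: every cell holds the same value
    if pct <= 1:
        return "dark blue"
    if pct <= 5:
        return "mid-blue"
    if pct <= 10:
        return "light blue"
    return "volatile"         # more than ten percent change


# A column of 100 sample values in which 3 differ from the rest (3% change).
print(volatility_colour(["SCAC"] * 97 + ["ABCD"] * 3))  # prints mid-blue
```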
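The date, time, element-type and maximum-size checks can be combined into one column classifier. The sketch below makes several assumptions:

- Only CCYYMMDD dates and HHMM or HHMMSS times occur. The text says only that dates and times may be held in a number of ways.
- Every cell must conform before a column is classified as a date or time.
- The maximum size is the length of the longest value.

The regular expressions and the name `classify_column` are illustrative.

```python
import re
from datetime import datetime

# Assumed formats, keyed by value length; a real system would take these
# from the message syntax definitions rather than a fixed table.
DATE_FORMATS = {8: "%Y%m%d"}             # CCYYMMDD
TIME_FORMATS = {4: "%H%M", 6: "%H%M%S"}  # HHMM, HHMMSS

NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")
ALPHA = re.compile(r"[A-Za-z ]+")


def conforms(value: str, formats: dict[int, str]) -> bool:
    """True if the value is all digits and parses with the format for its length."""
    fmt = formats.get(len(value))
    if fmt is None or not value.isdigit():
        return False
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def classify_column(column: list[str]) -> dict:
    """Classify one element (column) of a segment from its sample values."""
    if not column:
        return {"type": "unknown", "max_size": 0}
    # A column is a date (or time) only if every cell conforms.
    if all(conforms(v, DATE_FORMATS) for v in column):
        kind = "date"
    elif all(conforms(v, TIME_FORMATS) for v in column):
        kind = "time"
    elif all(NUMERIC.fullmatch(v) for v in column):
        kind = "numeric"
    elif all(ALPHA.fullmatch(v) for v in column):
        kind = "alpha"
    else:
        kind = "alphanumeric"
    # The maximum size is the length of the longest value in the column.
    return {"type": kind, "max_size": max(len(v) for v in column)}


print(classify_column(["20040702", "20040703"]))  # {'type': 'date', 'max_size': 8}
```

Fixed-length digit strings are ambiguous. A six-digit quantity can look like an HHMMSS time, so a fuller version would weigh more evidence, such as the segment definition, before settling on a type.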
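The syntax-level comparison used in declarative mapping can be illustrated with a simple compatibility filter. The compatibility rule is an assumption: the target must have the same type (or be a general alphanumeric field) and have room for the new element's maximum size. The `Example.*` common-format fields and the type and size given to 'Logistics.ShipmentStatus.Carrier' are also hypothetical. Only that reference name comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementSyntax:
    """Syntax of one element: its name, data type and maximum size."""
    name: str
    kind: str      # "date", "time", "numeric", "alpha" or "alphanumeric"
    max_size: int


def candidate_mappings(new: ElementSyntax,
                       common_format: list[ElementSyntax]) -> list[ElementSyntax]:
    """Initial list of common-format elements that could hold `new`.

    Assumed compatibility rule: the target has the same type (or is a
    general alphanumeric field) and is at least as large as the new element.
    """
    def fits(target: ElementSyntax) -> bool:
        return (target.kind in (new.kind, "alphanumeric")
                and target.max_size >= new.max_size)

    matches = [t for t in common_format if fits(t)]
    # Exact type matches first, then the tightest size fit.
    return sorted(matches, key=lambda t: (t.kind != new.kind,
                                          t.max_size - new.max_size))


common = [
    ElementSyntax("Logistics.ShipmentStatus.Carrier", "alpha", 4),
    ElementSyntax("Example.FreeText", "alphanumeric", 30),   # hypothetical
    ElementSyntax("Example.ShipDate", "date", 8),            # hypothetical
]
carrier_code = ElementSyntax("B10 element 3", "alpha", 4)
print([t.name for t in candidate_mappings(carrier_code, common)])
# ['Logistics.ShipmentStatus.Carrier', 'Example.FreeText']
```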
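Message-level and segment-level cross-referencing amount to looking up existing maps and duplicating their mapping references. In the sketch below, the knowledge-database layout, the `ShipmentStatus` business function and the helper names are hypothetical. A real Cache database would be queried rather than scanned in memory.

```python
import copy
from typing import Optional

# Hypothetical knowledge-database entries: each stored format records its
# message ID, industry, business function and, per segment, a map from
# element position to common-format reference.
KNOWN_FORMATS = [
    {
        "message_id": "ANSI/X.12/214/004010",
        "industry": "Logistics",
        "function": "ShipmentStatus",
        "segments": {"B10": {3: "Logistics.ShipmentStatus.Carrier"}},
    },
]


def map_by_message(message_id: str, industry: str, function: str,
                   known: list[dict] = KNOWN_FORMATS) -> Optional[dict]:
    """Message level: duplicate the complete map of a matching format."""
    for fmt in known:
        if (fmt["message_id"], fmt["industry"], fmt["function"]) == (
                message_id, industry, function):
            # Deep copy so edits to the new map never alter the stored one.
            return copy.deepcopy(fmt["segments"])
    return None


def map_by_segment(segment_id: str, industry: str,
                   known: list[dict] = KNOWN_FORMATS) -> Optional[dict]:
    """Segment level: reuse a segment map from any format in the same industry."""
    for fmt in known:
        if fmt["industry"] == industry and segment_id in fmt["segments"]:
            return dict(fmt["segments"][segment_id])
    return None


print(map_by_segment("B10", "Logistics"))  # {3: 'Logistics.ShipmentStatus.Carrier'}
```

If neither lookup finds a match, the element-level search of the sampled values takes over.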
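The element-level search can be sketched in three steps. First, each column of sampled segment occurrences is reduced to its most common value. That value is then looked up. Finally, the resulting mapping suggestions are ranked from most to least suggested. The text does not define a "saturation search" precisely. This sketch assumes an exact-value lookup in a hypothetical index (`VALUE_INDEX`) built from previously mapped messages. The `Example.Party.Code` reference is invented for illustration.

```python
from collections import Counter

# Hypothetical index built from previously mapped messages: each stored
# element value lists the common-format references it was mapped to.
VALUE_INDEX = {
    "ABCD": ["Logistics.ShipmentStatus.Carrier",
             "Logistics.ShipmentStatus.Carrier",
             "Example.Party.Code"],
}


def suggest_mappings(column: list[str], index: dict[str, list[str]]) -> list[str]:
    """Look up the column's most common value and rank the suggestions."""
    if not column:
        return []
    most_common_value, _ = Counter(column).most_common(1)[0]
    hits = index.get(most_common_value, [])
    # Most suggested first; an empty list is a negative result.
    return [ref for ref, _ in Counter(hits).most_common()]


def suggest_for_segment(rows: list[list[str]],
                        index: dict[str, list[str]]) -> dict[int, list[str]]:
    """Repeat the search for every element (column) of a sampled segment."""
    columns = zip(*rows)  # rows are segment occurrences, columns are elements
    return {pos: suggest_mappings(list(col), index)
            for pos, col in enumerate(columns, start=1)}


# Three sampled B10 segments (element values only, segment tag removed).
samples = [["REF1", "SHP1", "ABCD"],
           ["REF2", "SHP2", "ABCD"],
           ["REF3", "SHP3", "WXYZ"]]
print(suggest_for_segment(samples, VALUE_INDEX))
# {1: [], 2: [], 3: ['Logistics.ShipmentStatus.Carrier', 'Example.Party.Code']}
```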
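The SQL-based searching of the Cache database can be sketched with Python's built-in `sqlite3` module. The schema, table names, column names and customer names are hypothetical, since the search parameters of table 8 are not reproduced in the text. Segment elements are stored as a single '*'-separated string purely to keep the example short.

```python
import sqlite3

# Hypothetical schema: the text says the Cache database can be searched with
# SQL but does not give its tables, so these names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    message_id    INTEGER PRIMARY KEY,
    format_id     TEXT NOT NULL,
    customer      TEXT NOT NULL,
    received_date TEXT NOT NULL,
    received_time TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS segments (
    message_id INTEGER NOT NULL REFERENCES messages(message_id),
    line_no    INTEGER NOT NULL,
    segment_id TEXT NOT NULL,
    elements   TEXT NOT NULL
);
"""


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def find_messages(conn, format_id=None, customer=None):
    """List historic messages matching a format ID and/or customer."""
    clauses, params = [], []
    if format_id is not None:
        clauses.append("format_id = ?")
        params.append(format_id)
    if customer is not None:
        clauses.append("customer = ?")
        params.append(customer)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = ("SELECT message_id, received_date, received_time FROM messages"
           + where + " ORDER BY received_date, received_time")
    return conn.execute(sql, params).fetchall()


def segment_occurrences(conn, segment_id, sort_element=None):
    """Every occurrence of one segment, optionally sorted by an element."""
    rows = conn.execute(
        "SELECT message_id, elements FROM segments WHERE segment_id = ?"
        " ORDER BY message_id, line_no", (segment_id,)).fetchall()
    occurrences = [(mid, elements.split("*")) for mid, elements in rows]
    if sort_element is not None:
        # Element positions are 1-based, as in EDI segment notation.
        occurrences.sort(key=lambda occ: occ[1][sort_element - 1]
                         if len(occ[1]) >= sort_element else "")
    return occurrences


def segment_summary(conn):
    """Summarise how often each segment occurs across stored messages."""
    return conn.execute("SELECT segment_id, COUNT(*) FROM segments"
                        " GROUP BY segment_id ORDER BY segment_id").fetchall()


conn = open_cache()
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?)", [
    (1, "ANSI/X.12/214/004010", "Customer A", "20040702", "0930"),
    (2, "ANSI/X.12/214/004010", "Customer B", "20040701", "1415"),
])
conn.executemany("INSERT INTO segments VALUES (?, ?, ?, ?)", [
    (1, 15, "AT7", "X1*NS"),
    (2, 15, "AT7", "AF*NS"),
])
print(segment_occurrences(conn, "AT7", sort_element=1))
# [(2, ['AF', 'NS']), (1, ['X1', 'NS'])]
```

The functions mirror the steps described in the text:

- `find_messages` lists the historic messages for a format or customer.
- `segment_occurrences` pulls every occurrence of one segment (such as AT7), sorted by a chosen element.
- `segment_summary` summarises segment counts.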
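The detection of new, missing and transposed elements "based on syntax and contents" can be approximated by comparing the coarse type of each incoming element with the defined format. This sketch compares types only, which is an assumption. It cannot, for example, detect the transposition of two elements of the same type; that would need the content comparison against previous messages described in the text. The AT8 definition and the function names are hypothetical.

```python
import re
from typing import Optional

NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")
ALPHA = re.compile(r"[A-Za-z ]+")


def element_kind(value: str) -> str:
    """Coarse syntax class of one element value."""
    if NUMERIC.fullmatch(value):
        return "numeric"
    if ALPHA.fullmatch(value):
        return "alpha"
    return "alphanumeric"


def detect_variation(segment_id: str, values: list[str],
                     definition: dict[str, list[str]]) -> Optional[str]:
    """Compare one incoming segment with the defined format.

    Returns None when the segment conforms, otherwise the kind of variation.
    """
    if segment_id not in definition:
        return "new segment"
    expected = definition[segment_id]
    actual = [element_kind(v) for v in values]
    if actual == expected:
        return None
    if len(actual) > len(expected):
        return "new element"
    if len(actual) < len(expected):
        return "missing element"
    # Same length: a transposition shows up as exactly two swapped types.
    diffs = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    if len(diffs) == 2:
        i, j = diffs
        if actual[i] == expected[j] and actual[j] == expected[i]:
            return "transposed elements"
    return "unrecognised variation"


# Hypothetical definition: AT8 carries a weight qualifier, a unit code and
# a numeric weight.
DEFINITION = {"AT8": ["alpha", "alpha", "numeric"]}
print(detect_variation("AT8", ["G", "1500", "L"], DEFINITION))  # transposed elements
```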
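Element-level cleansing driven by per-element parameters might look like the sketch below. It makes three assumptions:

- Missing data is replaced with a configured default.
- Range-checked values are treated as numbers that are never guessed.
- Table lookups use `difflib.get_close_matches` with an arbitrary 0.7 cutoff. Fuzzy matching with difflib is a swapped-in technique, not the percentage pattern match described in the text.

The class and function names are hypothetical.

```python
import difflib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CleansingRules:
    """Per-element cleansing parameters, loosely following those in the text.

    Assumption: `default` is the value used to replace missing data; the
    text does not say where replacement values come from.
    """
    replace_missing: bool = False
    default: Optional[str] = None
    range_check: bool = False
    range_start: float = 0.0
    range_end: float = 0.0
    lookup_table: list[str] = field(default_factory=list)
    validate: bool = False


def cleanse(value: str, rules: CleansingRules) -> tuple[str, bool]:
    """Return the (possibly corrected) value and a success indicator."""
    if value == "":
        if rules.replace_missing and rules.default is not None:
            return rules.default, True
        return value, not rules.validate
    if rules.range_check:
        # Quantities are never guessed: a non-numeric or out-of-range
        # value fails and is left for error analysis.
        try:
            number = float(value)
        except ValueError:
            return value, False
        return value, rules.range_start <= number <= rules.range_end
    if rules.lookup_table:
        if value in rules.lookup_table:
            return value, True
        # Swapped-in technique: fuzzy match against the table of valid values.
        close = difflib.get_close_matches(value, rules.lookup_table, n=1, cutoff=0.7)
        if close:
            return close[0], True
        return value, False
    return value, True


names = CleansingRules(lookup_table=["John", "Jane"])
quantity = CleansingRules(range_check=True, range_start=1, range_end=500)
print(cleanse("Jhon", names))     # ('John', True)
print(cleanse("IO", quantity))    # ('IO', False)
```

Treating the textual and numeric cases differently reflects the business-impact argument in the text. 'Jhon' can be corrected to 'John', while 'IO' in a quantity field is flagged rather than guessed.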

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to an interface (100) which allows shippers (S1, S2) to send messages formatted according to the collective or individual preference of several truck drivers (T1, T2, T3), and allows the truck drivers (T1, T2, T3) to receive the messages in their preferred format. The truck drivers (T1, T2, T3) can then be coordinated so that whichever of them is available can come and collect a load offered by one of the shippers (S1, S2) as soon as that load is unloaded. To achieve this, the interface translates incoming messages from the format in which they are received into the format in which the intended recipient wishes to receive them.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/GB2004/002889 WO2006003351A1 (fr) 2004-07-02 2004-07-02 Analysis of data formats and their translation into a common data format

Publications (1)

Publication Number Publication Date
WO2006003351A1 (fr) 2006-01-12

Family ID: 34957996

Country Status (1)

Country Link
WO (1) WO2006003351A1 (fr)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5708828A (en) * 1995-05-25 1998-01-13 Reliant Data Systems System for converting data from input data environment using first format to output data environment using second format by executing the associations between their fields
US5715397A (en) * 1994-12-02 1998-02-03 Autoentry Online, Inc. System and method for data transfer and processing having intelligent selection of processing routing and advanced routing features
EP0863469A2 (fr) * 1997-02-10 1998-09-09 Nippon Telegraph And Telephone Corporation Method for automatically generating date conversion definitions for multidimensional visual analysis software
US5909570A (en) * 1993-12-28 1999-06-01 Webber; David R. R. Template mapping system for data translation
US6032147A (en) * 1996-04-24 2000-02-29 Linguateq, Inc. Method and apparatus for rationalizing different data formats in a data management system
WO2001046837A2 (fr) * 1999-12-21 2001-06-28 Datapower Technology, Inc. Method and device for data transmission using a real-time generator and transcoder
WO2003067432A1 (fr) * 2002-02-04 2003-08-14 Magenta Corporation Ltd Agent, method and computer system for conducting negotiations in a virtual environment
US20040083199A1 (en) * 2002-08-07 2004-04-29 Govindugari Diwakar R. Method and architecture for data transformation, normalization, profiling, cleansing and validation

Non-Patent Citations (6)

Title
ADELBERG B ET AL: "Nodose version 2.0", SIGMOD RECORD ACM USA, vol. 28, no. 2, 1 June 1999 (1999-06-01), pages 559 - 561, XP002314860, ISSN: 0163-5808 *
ADELBERG B: "NODOSE - A TOOL FOR SEMI-AUTOMATICALLY EXTRACTING STRUCTURED AND SEMISTRUCTURED DATA FROM TEXT DOCUMENTS", ACM PROCEEDINGS OF SIGMOD. INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, vol. 27, no. 2, 1998, pages 1 - 25, XP002949327 *
ANONYMOUS: "Data Profiling The Foundation for Data Management", INTERNET, 28 June 2004 (2004-06-28), pages 1 - 15, XP002313258 *
ANONYMOUS: "Data Profiling: The foundation for Data Management (cover page)", INTERNET, 28 June 2004 (2004-06-28), pages 1 - 2, XP002314858, Retrieved from the Internet <URL:http://itresearch.forbes.com/detail/RES/1077899820_54.html> [retrieved on 20050119] *
CHUANG-HUE MOH ET AL: "DTD-Miner: a tool for mining DTD from XML documents", PROCEEDINGS SECOND INTERNATIONAL WORKSHOP ON ADVANCED ISSUES OF E-COMMERCE AND WEB-BASED INFORMATION SYSTEMS. WECWIS 2000 IEEE COMPUT. SOC LOS ALAMITOS, CA, USA, 8 June 2000 (2000-06-08) - 9 June 2000 (2000-06-09), pages 144 - 151, XP002314859, ISBN: 0-7695-0610-0 *
RAHM E AND DO H-H: "Data Cleaning: Problems and Current Approaches", QUARTERLY BULLETIN OF THE COMPUTER SOCIETY OF THE IEEE TECHNICAL COMMITTEE ON DATA ENGINEERING, THE COMMITTEE, WASHINGTON, DC, US, December 2000 (2000-12-01), pages 1 - 11, XP002284896, ISSN: 1053-1238 *

Cited By (8)

Publication number Priority date Publication date Assignee Title
EP3605550A1 (fr) * 2009-03-04 2020-02-05 Masimo Corporation Medical monitoring system
US11087875B2 (en) 2009-03-04 2021-08-10 Masimo Corporation Medical monitoring system
US11133105B2 (en) 2009-03-04 2021-09-28 Masimo Corporation Medical monitoring system
US11145408B2 (en) 2009-03-04 2021-10-12 Masimo Corporation Medical communication protocol translator
US11158421B2 (en) 2009-03-04 2021-10-26 Masimo Corporation Physiological parameter alarm delay
US11923080B2 (en) 2009-03-04 2024-03-05 Masimo Corporation Medical monitoring system
US11176801B2 (en) 2011-08-19 2021-11-16 Masimo Corporation Health care sanitation monitoring system
US11816973B2 (en) 2011-08-19 2023-11-14 Masimo Corporation Health care sanitation monitoring system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

WWW WIPO information: withdrawn in national office

Country of ref document: DE

121 EP: the EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase