CN111047261A - Warehouse logistics order identification method and system - Google Patents

Publication number
CN111047261A
Authority
CN
China
Prior art keywords
order
area
identified
format
characteristic value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911269946.7A
Other languages
Chinese (zh)
Other versions
CN111047261B (en)
Inventor
陈小二
王营
高君凯
陈登虎
张秋萍
盛杨
周鑫
段志超
马海龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Yingzhi Technology Co ltd
Original Assignee
Qingdao Yingzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Yingzhi Technology Co ltd filed Critical Qingdao Yingzhi Technology Co ltd
Priority to CN201911269946.7A priority Critical patent/CN111047261B/en
Publication of CN111047261A publication Critical patent/CN111047261A/en
Application granted granted Critical
Publication of CN111047261B publication Critical patent/CN111047261B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/087 Inventory or stock management, e.g. order filling, procurement or balancing against orders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names

Abstract

The application discloses a warehouse logistics order identification method and system. The method comprises: training an order format classification model; inputting the warehouse logistics order to be identified into the order format classification model and confirming the format type of the order to be identified; if the format is a fixed format, performing fixed area framing on the order to be identified and identifying keywords from the framed fixed area; if the format is a non-fixed format that includes key characteristic values, dynamically performing area framing according to the relative positions of the key characteristic values on the order to be identified and identifying keywords from the framed dynamic area; if the format is a non-fixed format that does not include key characteristic values, framing character block areas according to a character block processing principle and identifying keywords from the framed character block areas; and performing code comparison and data cleaning on the identified keywords according to adaptation rules and summarizing the adaptation results. The workload and errors of manual entry are reduced, and automation of warehouse logistics management is realized.

Description

Warehouse logistics order identification method and system
Technical Field
The application relates to the technical field of warehousing management, in particular to a warehouse logistics order identification method and system.
Background
In logistics transportation, and particularly in international import and export logistics, a large number of warehouse logistics orders are generated in the course of business. However, because air and sea transport documents from around the world come in a great many types, the industry has no uniform format standard for customer documents, and customers commonly fill in orders in a variety of complicated formats according to the needs of their own companies.
Even a smaller logistics company may handle logistics documents in tens to hundreds of formats; the clients of medium-sized and large logistics companies may produce tens of thousands of documents, all of which must be processed every day. The documents come in many formats, the entry procedure is cumbersome, and a large amount of manpower is consumed. Manual processing is also error-prone, and tracing an error after it has been made is extremely difficult. Therefore, a method capable of automatically identifying warehouse logistics orders of various formats is needed, so that the complexity of manual processing is reduced and the efficiency of information entry is improved.
Disclosure of Invention
The application provides a warehouse logistics order identification method, which comprises the following steps:
training an order format classification model in advance using a large number of warehouse logistics orders;
when a warehouse logistics order to be identified is obtained, inputting the warehouse logistics order to be identified into the order format classification model, and confirming the format type of the order to be identified:
if the format type of the order to be identified is determined to be a fixed format, performing fixed area framing on the order to be identified, and identifying keywords from the framed fixed area;
if the type of the format of the to-be-identified order is determined to be a non-fixed format comprising the key characteristic value, dynamically performing area framing according to the relative position of the key characteristic value on the to-be-identified order, and identifying keywords from a framed dynamic area;
if the format type of the order to be identified is determined to be a non-fixed format which does not include the key characteristic value, carrying out frame selection on the character block area according to a character block processing principle, and identifying the key words from the framed character block area;
and carrying out coding comparison and data cleaning on the identified keywords according to the adaptation rules, and summarizing adaptation results.
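As a rough illustration only, the branching described in the steps above can be sketched as a dispatch on the format type returned by the classification model. The names `FormatType`, `FRAMERS` and `identify_order`, and the three stub framing functions, are hypothetical and do not appear in the application; the real framing logic is described in the later steps.

```python
from enum import Enum
from typing import Callable, Dict, List


class FormatType(Enum):
    """The three format types distinguished by the method."""
    FIXED = "fixed"
    NON_FIXED_WITH_KEYS = "non_fixed_with_keys"
    NON_FIXED_WITHOUT_KEYS = "non_fixed_without_keys"


# Stub framing functions (assumptions): each returns the framed regions as labels.
def frame_fixed(order: str) -> List[str]:
    return [f"fixed-region:{order}"]


def frame_dynamic(order: str) -> List[str]:
    return [f"dynamic-region:{order}"]


def frame_character_blocks(order: str) -> List[str]:
    return [f"character-block-region:{order}"]


FRAMERS: Dict[FormatType, Callable[[str], List[str]]] = {
    FormatType.FIXED: frame_fixed,
    FormatType.NON_FIXED_WITH_KEYS: frame_dynamic,
    FormatType.NON_FIXED_WITHOUT_KEYS: frame_character_blocks,
}


def identify_order(order: str, classify: Callable[[str], FormatType]) -> List[str]:
    """Classify the order's format, then apply the matching framing strategy."""
    format_type = classify(order)
    return FRAMERS[format_type](order)
```

Here `classify` stands in for the trained order format classification model; any callable that maps an order to a `FormatType` can be passed in.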
In the warehouse logistics order identification method described above, training the order format classification model in advance using a large number of warehouse logistics orders specifically comprises: inputting a large number of pre-stored warehouse logistics orders and/or order attachments identified in mails from a mailbox into a convolutional neural network, so as to train an order format classification model capable of identifying various format types.
In the warehouse logistics order identification method described above, training the order format classification model specifically comprises the following sub-steps:
inputting orders of various format types from a large number of clients into a convolutional neural network as input vectors for preprocessing;
extracting local layout features from the orders of the various format types, and summarizing them to obtain multi-dimensional local layout features;
performing dimension reduction processing on the multi-dimensional local layout features, and further extracting the layout features of the various orders;
and classifying the layout features of each type of order to obtain an order format classification model for identifying various format types.
In the warehouse logistics order identification method, if the type of the format of the order to be identified is determined to be the fixed format, the preset fixed area corresponding to the order is adopted for framing, and keywords are identified from the framed fixed area.
The warehouse logistics order identification method as described above, wherein if it is determined that the type of the associated format of the order to be identified is a non-fixed format including a key feature value, the following operations are performed:
step S1, acquiring a first key characteristic value in the order as a current key characteristic value;
step S2, identifying the next characteristic value of the current key characteristic value in the order;
step S3, taking the area between the current key characteristic value and the next characteristic value as the frame selection area of the current key characteristic value;
and step S4, recognizing the keywords from the frame area, taking the next characteristic value of the current key characteristic value as the current key characteristic value, and returning to execute step S2.
The warehouse logistics order identification method as described above, wherein the step S3 specifically includes the following sub-steps:
determining the upper edge and the lower edge of the rectangular frame selection area according to the current key characteristic value and the next characteristic value in the downward direction;
taking the position of the current key characteristic value, shifted leftwards by a predetermined displacement, as the left edge of the rectangular frame selection area;
and taking the position of the next characteristic value to the right of the current key characteristic value, shifted leftwards by the predetermined displacement, as the right edge of the rectangular frame selection area.
The warehouse logistics order identification method as described above, wherein if it is determined that the format type of the order to be identified is a non-fixed format that does not include a key feature value, acquiring a current line region from the order, identifying four adjacent directions of the current line region, and determining a rectangular edge position of a key paragraph region specifically includes:
upper edge: identifying from the head line area, wherein the upper edge of the rectangle selected by the key paragraph frame is the upper edge of the head line area; after a certain key paragraph is identified, the upper edge of the next key paragraph frame selection is the lower edge of the previous key paragraph frame selection;
lower edge: acquiring a next line of the current line region in a downward direction, determining the distance between the next line and the current line, and if the distance exceeds a preset distance, determining the lower edge of a rectangle framed and selected by a key paragraph as the lower edge of the current line region; if the distance is within the preset distance, taking the next line of the current line area as the current line area, and continuously obtaining downwards until the line spacing exceeds the preset distance to determine the lower edge of the rectangle framed by the key paragraph;
left edge: if no previously framed rectangular area exists between the upper edge and the lower edge of the rectangle, the position of the leftmost character in the area, shifted leftwards by the predetermined displacement, is used as the left edge of the rectangular frame selection area; if a previously framed rectangular area exists, the right edge of that previously framed rectangular area is used as the left edge of the rectangular frame selection area;
right edge: detecting the character spacing in each line between the upper edge and the lower edge of the rectangular frame selection area; if there are lines whose character spacing exceeds the preset width, finding the line with the smallest excess among them, and using the position of the character immediately before the over-wide gap in that line, shifted rightwards by the predetermined displacement, as the right edge of the rectangular frame selection area; if there are no characters to the right, the position of the rightmost character, shifted rightwards by the predetermined displacement, is directly used as the right edge of the rectangular frame selection area.
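In the warehouse logistics order identification method described above, performing code comparison and data cleaning on the identified keywords according to the adaptation rules specifically comprises: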
carrying out code comparison on the identified specific key words, and converting the specific key words into unique code identifiers;
and carrying out data cleaning on the identified keywords which accord with the regular expression, and extracting numerical values from the keywords.
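The two adaptation operations above can be illustrated with a minimal sketch. The code table `PORT_CODES`, its code values, and the pattern `NUMBER_WITH_UNIT` (including the unit names it accepts) are hypothetical examples chosen for illustration; the application does not specify the actual code tables or regular expressions.

```python
import re
from typing import Dict, Optional

# Hypothetical code table: recognized keyword -> unique code identifier.
PORT_CODES: Dict[str, str] = {"QINGDAO": "P001", "SHANGHAI": "P002"}

# Hypothetical cleaning rule: a number followed by a weight, volume or package unit.
NUMBER_WITH_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*(KGS?|CBM|CTNS?)", re.IGNORECASE)


def to_code(keyword: str, table: Dict[str, str]) -> Optional[str]:
    """Code comparison: map a recognized keyword to its unique code, if known."""
    return table.get(keyword.strip().upper())


def extract_value(text: str) -> Optional[float]:
    """Data cleaning: pull the numeric value out of text matching the pattern."""
    match = NUMBER_WITH_UNIT.search(text.replace(",", ""))
    return float(match.group(1)) if match else None
```

For instance, `to_code(" qingdao ", PORT_CODES)` looks up the normalized keyword, and `extract_value("G.W.: 1,250.5 KGS")` strips the thousands separator before extracting the number.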
The application also provides a warehouse logistics order identification system, which comprises a warehouse logistics order identification subsystem, a mail extraction order subsystem and an adaptation result management subsystem;
the warehouse logistics order identification subsystem executes any one of the warehouse logistics order identification methods;
the mail extraction order subsystem is used for automatically receiving mails, automatically extracting an order attachment to be identified from the mails and inputting the order to be identified into the warehouse logistics order identification subsystem for identification processing;
the adaptation result management subsystem is used for storing the adaptation result in a database, or for connecting to an API (application programming interface) provided by the customer's business system and entering the adaptation result directly into that business system.
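As one possible illustration of the attachment extraction performed by the mail extraction order subsystem, the sketch below uses Python's standard `email` package. The function name `extract_order_attachments` and the list of accepted file extensions are assumptions; the application does not state which attachment types are treated as orders, and receiving the mail from a server is not shown.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import List, Tuple

# Assumed set of file types that may carry an order (not specified in the source).
ORDER_EXTENSIONS = (".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff")


def extract_order_attachments(raw_mail: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, content) for every attachment that looks like an order."""
    message = message_from_bytes(raw_mail, policy=policy.default)
    found = []
    for part in message.iter_attachments():
        name = part.get_filename() or ""
        if name.lower().endswith(ORDER_EXTENSIONS):
            found.append((name, part.get_payload(decode=True)))
    return found


# Build a small sample mail with one order attachment and one unrelated file.
sample = EmailMessage()
sample["Subject"] = "Booking request"
sample.set_content("Please see the attached order.")
sample.add_attachment(b"%PDF-demo", maintype="application",
                      subtype="pdf", filename="order.pdf")
sample.add_attachment(b"notes", maintype="application",
                      subtype="octet-stream", filename="notes.bin")

attachments = extract_order_attachments(sample.as_bytes())
```

The extracted attachments would then be passed to the warehouse logistics order identification subsystem for recognition.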
The warehouse logistics order identification subsystem is further used for inputting orders from the mail extraction order subsystem into the convolutional neural network again, so as to further train the order format classification model.
The beneficial effects achieved by the present application are as follows:
(1) with the warehouse logistics order identification method and system provided by the application, orders in a standard format can be identified, and so can the various warehouse logistics orders that follow no standard specification;
(2) different processing methods are applied to orders identified as having different formats, so that their keywords can be identified accurately and the accuracy of key information extraction is improved;
(3) the whole series of operations, from collecting orders from mails through format recognition and key information extraction to the collection and management of key information, is automated, so that the tedious workload and the errors of manual entry are reduced, the efficiency of processing a large number of orders is improved, and automation of warehouse logistics management is realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.
FIG. 1 is a flowchart of a warehouse logistics order identification method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating the detailed operation of training the order format classification model;
FIG. 3 is an exemplary diagram of a fixed format type order;
FIG. 4 is an exemplary diagram of an order of a non-fixed format type that includes key feature values;
FIG. 5 is an exemplary diagram of an order of a non-fixed format type that does not include key feature values;
FIG. 6 is a schematic view of a warehouse logistics order identification system according to the second embodiment of the present application.
Detailed Description
Example one
In the existing logistics transportation business there are a great many types of warehouse logistics orders, and the documents provided by customers have no uniform standard format, so each company fills in orders in a complicated format according to its own writing habits and requirements. The applicant of the present application has found that although the orders have different formats, they follow certain rules: the orders of some clients have a fixed format; the orders of some clients have no fixed format but contain the common fields of a typical order, such as sender information, recipient information, port information and cargo information; and the orders of some clients contain no common keywords but follow writing conventions that are broadly similar to one another.
Therefore, for a large number of existing warehouse logistics orders in different formats, an embodiment of the present application provides a warehouse logistics order identification method, as shown in fig. 1, including the following steps:
step 110, training an order format classification model in advance, based on a convolutional neural network, using a large number of warehouse logistics orders;
in the embodiment of the application, a large number of pre-stored warehouse logistics orders and/or order attachments identified in mails from a mailbox are input into a convolutional neural network to train an order format classification model capable of identifying various format types; in addition, when a new client order is received by mail or the like, the order is input into the convolutional neural network for further training, so as to improve the accuracy of the classification model;
specifically, training the order format classification model, as shown in fig. 2, includes the following sub-steps:
step 210, inputting orders of various format types from a large number of clients as input vectors into a convolutional neural network for preprocessing;
in the embodiment of the application, the unique name/identifier of the client and the various order format types of that client are used as the features to be trained in the convolutional neural network, so that the different format types of different clients can be identified after training.
Step 220, extracting local layout features in the order of various layout types, and summarizing to obtain multi-dimensional local layout features;
the layout features include, but are not limited to, character direction gradient histogram features, inter-line distribution features, inter-line character features, and the like;
in the embodiment of the application, a vector matrix D is constructed from the information in an order, and then, on the convolutional layer of the convolutional neural network, a one-dimensional convolution kernel $w \in \mathbb{R}^{a \times h}$ is used to extract features from the vector matrix D to obtain the feature value $C_n$, where a represents the dimension of the vector and h represents the window size of the one-dimensional convolution kernel;
specifically, the layout features are extracted in the convolutional layer using the following formula:
$C_n = f(w \cdot x_{n:n+h-1} + b)$ (formula 1)
wherein n represents the number of convolution operations, m represents the number of convolution kernels, h represents the window size of the one-dimensional convolution kernel, the subscript n:n+h-1 denotes positions n through n+h-1, f(·) represents a nonlinear activation function, · represents the operation between the shared weights of the convolution kernel and the corresponding part of the vector matrix, x represents the input values of the vector matrix, w represents the weights, and b represents the bias value.
Step 230, performing dimension reduction processing on the multi-dimensional local layout features, and further extracting the layout features in various entrustment orders;
in the embodiment of the application, the layout features extracted from the convolutional layer are input into a pooling layer of the convolutional neural network, the pooling layer is used for further extracting the features, the maximum value of feature mapping is taken as the most important feature to be extracted, so that the layout features in the entrusting order are obtained, and a one-dimensional vector is obtained after all feature mapping is subjected to pooling and dimension reduction.
Specifically, the characteristic value is further extracted in the pooling layer by the following formula:
$p_v = \max[C_n]$ (formula 2)
wherein n represents the number of convolution operations; sampling in the pooling layer further condenses the features obtained by convolution, which prevents over-fitting and enhances the robustness of the structure.
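To make the convolution and pooling steps concrete, the following is a minimal pure-Python sketch of a one-dimensional convolution over a window of h rows followed by max pooling. It assumes ReLU as the nonlinear activation f (the application does not name the activation) and stores the kernel as h rows of a values each; the function names and the sample numbers are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def conv_features(x: Matrix, w: Matrix, b: float) -> List[float]:
    """Slide a kernel of h rows over x and return one feature per window.

    Each feature is f(w . x[n:n+h] + b), with ReLU assumed as f.
    """
    h = len(w)
    features = []
    for n in range(len(x) - h + 1):
        window = x[n:n + h]
        total = sum(w[i][j] * window[i][j]
                    for i in range(h) for j in range(len(w[i])))
        features.append(max(0.0, total + b))  # ReLU activation (assumption)
    return features


def max_pool(features: List[float]) -> float:
    """Keep the largest feature value as the most important feature."""
    return max(features)


# Illustrative input: four rows of a two-dimensional vector, window size h = 2.
x = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.0, 0.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
b = -1.0

features = conv_features(x, w, b)  # one value per window position
pooled = max_pool(features)        # single value kept for this kernel
```

With several kernels, one pooled value per kernel would be collected into the one-dimensional vector that is passed on to the next layer.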
Step 240, classifying the layout characteristics in each type of entrusting order to obtain entrusting order layout classification models for identifying various layout types;
in the embodiment of the application, the format features of the various orders output by the pooling layer are input into a fully connected layer of the convolutional neural network for feature classification, a loss function is introduced to improve the prediction accuracy and efficiency of the model, and an order format classification model for identifying various format types is output;
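The classification step can be sketched as a single linear layer followed by softmax, with cross-entropy as the loss. This is an assumption made for illustration: the application states only that a loss function is introduced and does not name it, and the weights, biases and class names below are made-up values rather than trained parameters.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: List[float], weights: List[List[float]],
             biases: List[float]) -> List[float]:
    """Linear layer: one score per class, then softmax."""
    scores = [sum(w_i * f_i for w_i, f_i in zip(row, features)) + bias
              for row, bias in zip(weights, biases)]
    return softmax(scores)


def cross_entropy(probabilities: List[float], target: int) -> float:
    """Loss for one sample whose true class index is `target`."""
    return -math.log(probabilities[target])


# Hypothetical class names and parameters, for illustration only.
FORMAT_CLASSES = ["fixed", "non_fixed_with_keys", "non_fixed_without_keys"]
pooled_features = [2.0, 0.5]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]

probabilities = classify(pooled_features, weights, biases)
predicted = FORMAT_CLASSES[probabilities.index(max(probabilities))]
```

During training, the loss for each labelled order would be minimized by adjusting the weights; that optimization loop is omitted here.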
optionally, the classification model of this embodiment is trained and applied by first identifying the client to which the order belongs and then identifying the format type of the order; the identified order formats are roughly divided into two categories, namely fixed format types and non-fixed format types, each of which includes multiple format subclasses;
the fixed format type refers to an order in which the positions of the key feature values and the format are fixed, and comprises a plurality of fixed format subclasses; FIG. 3 shows a fixed format type order of a client, from which it can be seen that the bold text such as Shipper, Consignee, Notify Party, Place of Receipt, Port of Loading and Place of Delivery are all key feature values in the order, and the format of this client's orders is fixed.
The non-fixed format types can be divided into two types, namely non-fixed format types that include key feature values and non-fixed format types that do not include key feature values;
FIG. 4 shows a non-fixed format type of a client that includes key feature values; although the order of FIG. 4 does not have a fixed format like the order of FIG. 3, it includes key feature values such as SHIPPER, CNEE, NOTIFY PARTY, PORT OF LOADING and DESCRIPTION;
FIG. 5 shows a non-fixed format type of a client that does not include key feature values; although the order of FIG. 5 has neither a fixed format like the order of FIG. 3 nor key feature values like the order of FIG. 4, the key paragraphs in the order of FIG. 5 are separated by larger gaps, and the meaning of each key paragraph can be learned from routine orders;
in addition, it should be noted that a key feature value in an order is not a fixed string of characters, and differently worded key feature values are also learned during model training; for example, after model training, strings such as Consignee or CNEE are both treated as the key feature value meaning "Consignee"; and a new key feature value appearing in a new order can be trained into a recognizable key feature value after the order is input into the convolutional neural network.
Referring back to fig. 1, in step 120, when the warehouse logistics order to be identified is obtained, inputting the warehouse logistics order to be identified into the order format classification model, and confirming the format type of the order to be identified:
specifically, after the warehouse logistics order to be identified is input into the order format classification model, the client to which the order belongs and the format type of the order are identified, which specifically includes the following situations:
(1) if the format type of the order to be identified is determined to be a fixed format, performing fixed area framing on the order to be identified, identifying keywords from the framed fixed area, and executing step 130;
for an order determined to be of a fixed format type, framing is performed using the preset fixed areas corresponding to that order, and keywords are identified in the framed fixed areas. For example, when an order is identified as the A-1 fixed format of client A, key paragraphs are framed using the fixed areas preset for that format of client A, and the keywords in the framed areas are then recognized using OCR technology. For the fixed-format order shown in fig. 3, the key paragraphs are framed with preset rectangles of given length and width, that is, the dashed boxes in the figure, and the keywords inside each dashed box are then identified. It should be noted that rectangular frames of different lengths and widths are preset for key paragraph framing for the different fixed formats of the different identified clients.
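A minimal sketch of fixed area framing follows. The preset table `PRESET_REGIONS`, its coordinates and field names, and the `ocr` callable are all hypothetical; the application does not specify how the preset rectangles are stored or which OCR engine is used, so the OCR step is passed in as a function.

```python
from typing import Callable, Dict, Tuple

# A box is (left, top, right, bottom); the units are whatever the page uses.
Box = Tuple[int, int, int, int]

# Hypothetical presets keyed by (client, format subclass), e.g. client A, format A-1.
PRESET_REGIONS: Dict[Tuple[str, str], Dict[str, Box]] = {
    ("A", "A-1"): {
        "shipper": (40, 60, 400, 160),
        "consignee": (40, 170, 400, 270),
    },
}


def recognize_fixed(page: object, client: str, layout: str,
                    ocr: Callable[[object, Box], str]) -> Dict[str, str]:
    """Frame each preset region for this client/format and OCR its contents."""
    regions = PRESET_REGIONS[(client, layout)]
    return {field: ocr(page, box).strip() for field, box in regions.items()}


# Stand-in for a real OCR engine: the "page" maps each box to its text.
fake_page = {
    (40, 60, 400, 160): " E company \n",
    (40, 170, 400, 270): " F company \n",
}
result = recognize_fixed(fake_page, "A", "A-1", lambda page, box: page[box])
```

In practice `ocr` would crop the page image to the box and run character recognition on the crop.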
(2) If the type of the format of the to-be-identified order is determined to be a non-fixed format including the key characteristic value, dynamically performing area framing according to the relative position of the key characteristic value on the to-be-identified order, identifying keywords from a framed dynamic area, and executing step 130;
specifically, the key feature values include, but are not limited to, common keywords of an order, such as the sender, the receiver, port information and cargo information; when the order format type is identified as a non-fixed format that includes key feature values, the following operations are carried out:
step S1, acquiring a first key characteristic value in the order as a current key characteristic value by using an OCR technology;
step S2, continuously utilizing the OCR technology to identify the next characteristic value of the current key characteristic value in the order;
wherein the next feature value of the current key feature value includes a next feature value in a downward direction of the current key feature value and a next feature value in a rightward direction of the current key feature value.
Step S3, taking the area between the current key characteristic value and the next characteristic value as the frame selection area of the current key characteristic value;
specifically, the upper and lower edges of the rectangular frame selection area are determined from the current key feature value and the next feature value in the downward direction; the left and right edges of the rectangular frame selection area are then determined: according to writing habits, when an upper line and a lower line are related, the difference between their left-side positions generally does not exceed a preset value (such as 50mm), so the position of the current key feature value is used as the left edge of the rectangular frame selection area, and the position of the next feature value to the right of the current key feature value is used as the right edge of the rectangular frame selection area;
optionally, to prevent the frame selection area from failing to enclose all of the characters, the position of the current key feature value shifted leftwards by a predetermined displacement (which may be set to any value from 0 to 50mm according to actual needs) is preferably used as the left edge of the rectangular frame selection area, and the position of the next feature value to the right of the current key feature value, shifted leftwards by the predetermined displacement, is used as the right edge of the rectangular frame selection area.
Step S4, recognizing the keywords from the frame selection area, taking the next characteristic value of the current key characteristic value as the current key characteristic value, and returning to execute the step S2;
for example, in the non-fixed format including key feature values shown in fig. 4, the first key feature value SHIPPER is identified first, and the next key feature value CNEE is then identified in sequence, so the upper and lower edges of the rectangular frame containing the SHIPPER key paragraph are determined to lie between SHIPPER and CNEE (as shown by the dotted line in the figure); the position of the current key feature value SHIPPER, shifted leftwards by the predetermined displacement, is used as the left edge of the rectangular frame selection area; the next key feature value to the right of SHIPPER is then determined to be DEMAND, and the position of DEMAND, shifted leftwards by the predetermined displacement, is used as the right edge of the rectangular frame selection area. The rectangular frame selection areas of the other key feature values in fig. 4 are determined in the same manner; they are not shown in the figure and are not described again here.
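The edge rules of steps S1 to S4 can be sketched as follows, assuming the OCR stage has already returned each key feature value with the coordinates of its top-left corner. The `KeyValue` type, the tolerance parameters, the page size and the coordinates in the example are assumptions made for illustration; only the 50mm column tolerance echoes a value mentioned in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class KeyValue:
    label: str
    x: float  # left position of the key feature value
    y: float  # top position of the key feature value


def frame_region(current: KeyValue, keys: List[KeyValue],
                 page_width: float, page_height: float,
                 offset: float = 5.0, row_tolerance: float = 10.0,
                 column_tolerance: float = 50.0
                 ) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the region owned by `current`."""
    # Next key feature value below, in roughly the same column.
    below = [k for k in keys
             if k.y > current.y + row_tolerance
             and abs(k.x - current.x) <= column_tolerance]
    # Next key feature value to the right, on roughly the same row.
    right = [k for k in keys
             if abs(k.y - current.y) <= row_tolerance and k.x > current.x]

    top = current.y
    bottom = min(k.y for k in below) if below else page_height
    left = max(0.0, current.x - offset)
    right_edge = min(k.x for k in right) - offset if right else page_width
    return (left, top, right_edge, bottom)


# Illustrative positions loosely following the layout described for fig. 4.
keys = [KeyValue("SHIPPER", 10.0, 20.0),
        KeyValue("DEMAND", 120.0, 20.0),
        KeyValue("CNEE", 10.0, 80.0)]
shipper_box = frame_region(keys[0], keys, page_width=210.0, page_height=297.0)
demand_box = frame_region(keys[1], keys, page_width=210.0, page_height=297.0)
```

When no further key feature value exists below or to the right, this sketch falls back to the page boundary, a choice the application does not spell out.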
(3) If the format type of the order to be identified is determined to be a non-fixed format which does not include the key characteristic value, the character blocking area is subjected to frame selection according to a character blocking processing principle, keywords are identified from the framed character blocking area, and step 130 is executed;
it has been found that, in general, when a client writes an order without key feature values, the meaning of each paragraph on the order is set by habit; for example, the first paragraph is consignor information, the second paragraph is consignee information, and the third paragraph is port information …, so that after a large number of orders are input into a convolutional neural network for training and learning, the actual meaning represented by the position of each paragraph can be identified for orders of non-fixed formats that do not include key feature values.
Specifically, for an order without key feature values, a current line region is acquired from the order (the first line region of the order is initially taken as the current line region; after a key paragraph has been identified, the first line of the next paragraph is taken as the current line region), the four adjacent directions of the current line region are then examined, and the rectangular edge positions of the key paragraph region are determined, specifically including:
upper edge: identification starts from the first line region, and the upper edge of the rectangle framing the key paragraph is the upper edge of the first line region; after a key paragraph has been identified, the upper edge of the frame for the next key paragraph is the lower edge of the rectangle framing the previous key paragraph;
lower edge: acquiring the next line below the current line region and determining the distance between that line and the current line; if the distance exceeds a preset distance, the lower edge of the rectangle framing the key paragraph is the lower edge of the current line region; if the distance is within the preset distance, the next line is taken as the current line region and the downward search is repeated until the line spacing exceeds the preset distance, at which point the lower edge of the rectangle framing the key paragraph is determined;
left edge: according to writing habits, when an upper line and a lower line are related, the difference between their left-side positions generally does not exceed a preset value (such as 100mm); if no previously framed rectangular area exists between the upper edge and the lower edge of the rectangle, the position of the leftmost character in the area, shifted leftwards by the predetermined displacement, is used as the left edge of the rectangular frame selection area; if a previously framed rectangular area exists, the right edge of that previously framed rectangular area is used as the left edge of the rectangular frame selection area;
right edge: detecting the character spacing in each line between the upper edge and the lower edge of the rectangular frame selection area; if there are lines whose character spacing exceeds the preset width, the line with the smallest excess is found among them, and the position of the character immediately before the over-wide gap in that line, shifted rightwards by the predetermined displacement, is used as the right edge of the rectangular frame selection area; if there are no characters to the right, the position of the rightmost character, shifted rightwards by the predetermined displacement, is directly used as the right edge of the rectangular frame selection area.
For example, in the non-fixed format shown in fig. 5, which does not include key feature values, the first line "E company" is identified first, and then the line "F company", whose line spacing exceeds the preset distance, is identified below it, so that the upper edge of the rectangular frame is determined to be above the line "E company" and the lower edge is determined to be above the line "F company" (as shown by the dotted line in the figure). Next, because no framed rectangular area already exists in this area, the left edge of the current rectangular frame is determined as the position of the leftmost character "Q" of the area shifted leftward by the appointed displacement. Then the right edge of the rectangular frame is determined: among the lines in the area whose character spacing exceeds the preset width, the line with the smallest excess, namely the line "SHANDONG, CHINA", is found, and the position of the last character "A" in that line, shifted rightward by the appointed displacement, is used as the right edge of the rectangular frame selection area. After this rectangular frame (referred to here as the first rectangular frame) has been determined, the left edge of the rectangle determined for the area to its right (referred to here as the second rectangular frame) is the right edge of the first rectangular frame, because the first rectangular frame is detected to already exist; and since there are no characters to the right of the second rectangular frame, the position of the rightmost character "port", shifted rightward by the appointed displacement, is used directly as the right edge of the rectangular frame selection area. The rectangular frame selection areas of the other regions in fig. 5 are determined in the same manner, which is not shown in the figure and is not described again here.
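The upper-edge and lower-edge rules described above can be illustrated with a minimal Python sketch. It is not the implementation of the present application: the `Line` structure, the function name `frame_paragraphs`, and the parameters `max_gap` (the preset line distance) and `pad` (the appointed displacement) are assumptions introduced only for illustration, and the sketch covers only the simple case in which each paragraph occupies a single column, so the splitting of a row into left and right rectangles by character spacing is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Line:
    top: float     # y coordinate of the line's upper edge
    bottom: float  # y coordinate of the line's lower edge
    left: float    # x coordinate of the leftmost character
    right: float   # x coordinate of the rightmost character


Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def frame_paragraphs(lines: List[Line], max_gap: float, pad: float) -> List[Rect]:
    """Group lines (sorted top to bottom) into paragraph rectangles."""
    rects: List[Rect] = []
    if not lines:
        return rects
    start = 0
    upper = lines[0].top  # the first rectangle starts at the first line's upper edge
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        # A paragraph ends when the gap to the next line exceeds max_gap.
        if is_last or lines[i + 1].top - line.bottom > max_gap:
            block = lines[start:i + 1]
            lower = line.bottom  # lower edge of the current line region
            left = min(ln.left for ln in block) - pad
            right = max(ln.right for ln in block) + pad
            rects.append((left, upper, right, lower))
            upper = lower  # the next rectangle starts at this lower edge
            start = i + 1
    return rects
```

For instance, with three lines in which only the third is separated by a large gap, the function returns two rectangles, and the second rectangle's upper edge coincides with the first rectangle's lower edge.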
In addition to the above format processing, the application also needs to specially identify other important information appearing in the order, for example, information such as 'FREIGHT PREPAID (freight prepayment)' and 'not displayed on the bill of lading'. When the order is identified as containing special customer requirements, special business processing is performed according to those requirements. Additionally, when a character with an explanatory meaning, such as ":", is identified in the order, the information before and after the character may be stored in association.
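As a hedged illustration of these two kinds of special identification, the following Python sketch splits a text fragment at its first explanatory colon and flags special-requirement phrases. The function names, the marker table, and the choice to ignore whitespace and letter case when matching are assumptions made for this example and are not specified by the present application.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical marker table: normalized phrase -> requirement label.
SPECIAL_MARKERS = {"FREIGHTPREPAID": "freight prepaid"}


def split_explanatory(text: str) -> Optional[Tuple[str, str]]:
    """Return (before, after) around the first ASCII or full-width colon."""
    parts = re.split(r"[:：]", text, maxsplit=1)
    if len(parts) != 2:
        return None
    return parts[0].strip(), parts[1].strip()


def find_special_requirements(text: str) -> List[str]:
    """Return labels of special-requirement phrases found in the text."""
    # Remove whitespace and upper-case so that spacing or case do not matter.
    normalized = re.sub(r"\s+", "", text).upper()
    return [label for key, label in SPECIAL_MARKERS.items() if key in normalized]
```

Storing the pair returned by `split_explanatory` keeps the information before and after the colon associated, as described above.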
Referring back to fig. 1, step 130, performing code comparison and data cleaning on the identified keywords according to the adaptation rules, and summarizing adaptation results;
optionally, when the order described in the present application is applied to port freight, it includes special vocabulary such as port names, and the same port name may be written in different ways; therefore, after a keyword such as a port name is identified, the keyword needs to be code-compared and converted into a unique coded identifier;
in addition, the cargo information in the order generally includes information such as box type, box quantity, and box weight, for example, 1 × 40RH 50kg, which represents one 40-foot refrigerated container weighing 50kg. Such information usually needs data cleaning according to an adaptation rule: the values are extracted using the regular expression ".*\d+X\d{2}'[a-Z]{2}" corresponding to the cargo information, giving a box quantity of 1, a box type of 40RH, and a box weight (unit kg) of 50;
for the adaptation result, the application can perform data distribution according to the actual application scene, for example, the summary result can be stored in a database, or the application can also interface an API (application programming interface) provided by a customer service system and directly input the adaptation result into the service system.
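The code comparison and data cleaning of step 130 can be sketched in Python as follows. This is only an illustration under stated assumptions: the alias table and the placeholder codes such as `PORT-001` are hypothetical, and the regular expression `CARGO_RE` is written for this example (it accepts "x", "X", "×" or "*" as the multiplication sign and an optional apostrophe) rather than being the expression used by the present application.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical alias table: different spellings map to one placeholder code.
PORT_ALIASES = {
    "QINGDAO": "PORT-001",
    "TSINGTAO": "PORT-001",
    "青岛": "PORT-001",
    "SHANGHAI": "PORT-002",
    "上海": "PORT-002",
}

# Illustrative pattern: quantity, multiplication sign, box type, weight in kg.
CARGO_RE = re.compile(
    r"(?P<count>\d+)\s*[xX×*]\s*(?P<type>\d{2}'?[A-Za-z]{2})"
    r"\s+(?P<weight>\d+(?:\.\d+)?)\s*kg",
    re.IGNORECASE,
)


def port_code(name: str) -> Optional[str]:
    """Convert a port name keyword into a unique coded identifier."""
    cleaned = re.sub(r"[\s,.\-]+", "", name.upper())
    cleaned = re.sub(r"(PORT|港)$", "", cleaned)  # drop a trailing generic word
    return PORT_ALIASES.get(cleaned)


def parse_cargo(text: str) -> Optional[Dict[str, Union[int, float, str]]]:
    """Extract box quantity, box type and box weight from cargo text."""
    match = CARGO_RE.search(text)
    if match is None:
        return None
    return {
        "box_count": int(match.group("count")),
        "box_type": match.group("type").replace("'", "").upper(),
        "box_weight_kg": float(match.group("weight")),
    }
```

Under these assumptions, `port_code("Qingdao Port")` and `port_code("青岛港")` yield the same identifier, and `parse_cargo("1 × 40RH 50kg")` yields a box quantity of 1, a box type of 40RH, and a box weight of 50.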
Example two
The second embodiment of the present application provides a warehouse logistics order identification system, as shown in fig. 6, including a warehouse logistics order identification subsystem 610, an email extraction order subsystem 620, and an adaptation result management subsystem 630;
the warehouse logistics order identification subsystem 610 executes the warehouse logistics order identification method in the first embodiment;
the mail extraction order subsystem 620 is used for automatically receiving mails, automatically extracting an order attachment to be identified from the mails, and inputting the order to be identified into the warehouse logistics order identification subsystem for identification processing;
the adaptation result management subsystem 630 is used for storing the adaptation result in a database, or interfacing an API interface provided by the customer service system, and directly inputting the adaptation result into the service system.
Specifically, the warehouse logistics order identification subsystem 610 includes:
an order format classification model training module 611, configured to use a large number of warehouse logistics orders in advance to train an order format classification model;
an order format identification processing module 612, configured to, when a warehouse logistics order to be identified is obtained, input the order into the order format classification model and confirm the format type of the order to be identified:
if the format type of the order to be identified is determined to be a fixed format, performing fixed area frame selection on the order to be identified, and identifying keywords from the framed fixed area;
if the type of the format of the to-be-identified order is determined to be a non-fixed format comprising the key characteristic value, dynamically performing area framing according to the relative position of the key characteristic value on the to-be-identified order, and identifying keywords from a framed dynamic area;
if the format type of the order to be identified is determined to be a non-fixed format which does not include the key characteristic value, carrying out frame selection on the character block area according to a character block processing principle, and identifying the key words from the framed character block area;
the adaptation result management module 613 is configured to perform coding comparison and data cleaning on the identified keywords according to the adaptation rules, and summarize adaptation results.
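The three-way branching performed by module 612 can be pictured as a simple dispatch. The Python sketch below assumes that the classifier and the three frame selection routines are supplied as callables; the names `FormatType`, `recognize`, and the representation of an order as a string (for example a file path) are illustrative assumptions, not part of the present application.

```python
from enum import Enum
from typing import Callable, Dict, List


class FormatType(Enum):
    FIXED = "fixed"
    KEYED = "non_fixed_with_key_feature_values"
    UNKEYED = "non_fixed_without_key_feature_values"


Extractor = Callable[[str], List[str]]


def recognize(
    order: str,
    classify: Callable[[str], FormatType],
    extractors: Dict[FormatType, Extractor],
) -> List[str]:
    """Classify the order, then run the extractor registered for its format."""
    format_type = classify(order)
    return extractors[format_type](order)
```

Keeping the extractors in a table keyed by format type means that a further format type could be supported by registering one more entry, without changing the dispatch itself.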
Further, the mail extraction order subsystem 620 identifies orders from mails as follows: it automatically receives mails, automatically identifies mails containing wording such as 'order book' and 'order', searches those mails for attachments and extracts the orders from them, and then sends the extracted orders to the warehouse logistics order identification subsystem 610 for training and identification;
correspondingly, the warehouse logistics order identification subsystem 610 is further configured to, when an order in a mail is identified and determined to be a valid order, re-input the order from the mail extraction order subsystem into the convolutional neural network (i.e., the order format classification model training module 611) to train the order format classification model.
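A minimal sketch of the attachment extraction step, using the `email` package of the Python standard library, is given below. It assumes that the mail has already been received and parsed into an `EmailMessage`, and that the keywords are looked for in the subject line; both points, as well as the function name, are assumptions for illustration, since the present application does not specify where in the mail the wording is searched.

```python
from email.message import EmailMessage
from typing import List, Tuple

# Keywords taken from the description; matching on the subject is an assumption.
SUBJECT_KEYWORDS = ("order book", "order")


def extract_order_attachments(msg: EmailMessage) -> List[Tuple[str, bytes]]:
    """Return (filename, content) pairs for attachments of order-related mails."""
    subject = (msg["Subject"] or "").lower()
    if not any(keyword in subject for keyword in SUBJECT_KEYWORDS):
        return []
    found: List[Tuple[str, bytes]] = []
    for part in msg.iter_attachments():
        name = part.get_filename() or "unnamed"
        # decode=True undoes the transfer encoding (for example base64).
        found.append((name, part.get_payload(decode=True)))
    return found
```

Each returned pair could then be passed to the warehouse logistics order identification subsystem 610 for identification.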
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application. Although the identification method and system of the order are disclosed in the present application, other logistics documents with different formats can be identified by the identification method of the present application, and it is obvious that various changes and modifications can be made to the present application by those skilled in the art without departing from the spirit and scope of the present application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A warehouse logistics order identification method is characterized by comprising the following steps:
using a large number of warehouse logistics orders in advance to train an order format classification model;
when a warehouse logistics order to be identified is obtained, inputting the warehouse logistics order to be identified into the order format classification model, and confirming the format type of the order to be identified:
if the format type of the order to be identified is determined to be a fixed format, performing fixed area frame selection on the order to be identified, and identifying keywords from the framed fixed area;
if the type of the format of the to-be-identified order is determined to be a non-fixed format comprising the key characteristic value, dynamically performing area framing according to the relative position of the key characteristic value on the to-be-identified order, and identifying keywords from a framed dynamic area;
if the format type of the order to be identified is determined to be a non-fixed format which does not include the key characteristic value, carrying out frame selection on the character block area according to a character block processing principle, and identifying the key words from the framed character block area;
and carrying out coding comparison and data cleaning on the identified keywords according to the adaptation rules, and summarizing adaptation results.
2. The warehouse logistics order recognition method as claimed in claim 1, wherein using a large number of warehouse logistics orders in advance to train the order format classification model specifically comprises: inputting a large number of pre-stored warehouse logistics orders, and/or order attachments identified in mails through a mailbox, into a convolutional neural network for training, to obtain an order format classification model capable of identifying various format types.
3. The warehouse logistics order recognition method as claimed in claim 1 or 2, wherein the training of the order format classification model specifically comprises the following substeps:
inputting orders of various format types from a large number of clients into a convolutional neural network as input vectors for preprocessing;
extracting local layout features from the orders of various format types, and summarizing them to obtain multi-dimensional local layout features;
performing dimension reduction processing on the multi-dimensional local layout features, and further extracting the layout features of each type of order;
and classifying the layout features of each type of order to obtain an order format classification model for identifying various format types.
4. The warehouse logistics order identification method as claimed in claim 1, wherein if the type of the format of the order to be identified is determined to be a fixed format, a preset fixed area corresponding to the order is adopted for frame selection, and keywords are identified from the framed fixed area.
5. The method for identifying the warehouse logistics order as claimed in claim 1, wherein if the type of the associated format of the order to be identified is determined to be a non-fixed format including the key characteristic value, the following operations are performed:
step S1, acquiring a first key characteristic value in the order as a current key characteristic value;
step S2, identifying the next characteristic value of the current key characteristic value in the order;
step S3, taking the area between the current key characteristic value and the next characteristic value as the frame selection area of the current key characteristic value;
and step S4, recognizing the keywords from the frame selection area, taking the next characteristic value of the current key characteristic value as the current key characteristic value, and returning to execute step S2.
6. The warehouse logistics order identification method as claimed in claim 5, wherein the step S3 comprises the following sub-steps:
determining the upper edge and the lower edge of the rectangular frame selection area according to the current key characteristic value and the next characteristic value in the downward direction;
the position of the current key characteristic value is deviated to the left by a preset displacement and is used as the left edge of the rectangular frame selection area;
and the position of the next characteristic value to the right of the current key characteristic value, shifted leftward by the appointed displacement, is used as the right edge of the rectangular frame selection area.
7. The warehouse logistics order identification method of claim 1, wherein if the format type of the order to be identified is determined to be a non-fixed format that does not include a key feature value, acquiring a current line area from the order, identifying four adjacent directions of the current line area, and determining a rectangular edge position of a key paragraph area, specifically comprises:
upper edge: identifying from the head line area, wherein the upper edge of the rectangle selected by the key paragraph frame is the upper edge of the head line area; after a certain key paragraph is identified, the upper edge of the next key paragraph frame selection is the lower edge of the previous key paragraph frame selection;
lower edge: acquiring a next line of the current line region in a downward direction, determining the distance between the next line and the current line, and if the distance exceeds a preset distance, determining the lower edge of a rectangle framed and selected by a key paragraph as the lower edge of the current line region; if the distance is within the preset distance, taking the next line of the current line area as the current line area, and continuously obtaining downwards until the line spacing exceeds the preset distance to determine the lower edge of the rectangle framed by the key paragraph;
left edge: if the framed rectangular area does not exist in the areas of the upper edge and the lower edge of the rectangle, the position of the leftmost character in the area is deviated to the left for appointed displacement to be used as the left edge of the rectangular framing area, and if the framed rectangular area exists, the right edge of the rectangular area selected in the previous frame is used as the left edge of the rectangular framing area;
right edge: detecting the character spacing in each line between the upper edge and the lower edge of the rectangular frame selection area; if there are lines whose character spacing exceeds the preset width, taking the line with the smallest excess among them, and using the position of the character before the over-wide gap in that line, shifted to the right, as the right edge of the rectangular frame selection area; if no character exists on the right side, the position of the rightmost character, shifted rightward by the appointed displacement, is used directly as the right edge of the rectangular frame selection area.
8. The warehouse logistics order identification method as claimed in claim 1, wherein the identified keywords are subjected to code comparison and data cleaning according to an adaptation rule, and the method specifically comprises the following substeps:
carrying out code comparison on the identified specific key words, and converting the specific key words into unique code identifiers;
and carrying out data cleaning on the identified keywords which accord with the regular expression, and extracting numerical values from the keywords.
9. A warehouse logistics order identification system is characterized by comprising a warehouse logistics order identification subsystem, an email extraction order subsystem and an adaptation result management subsystem;
the warehouse logistics order identification subsystem executes the warehouse logistics order identification method according to any one of claims 1-8;
the mail extraction order subsystem is used for automatically receiving mails, automatically extracting an order attachment to be identified from the mails and inputting the order to be identified into the warehouse logistics order identification subsystem for identification processing;
the adaptation result management subsystem is used for storing the adaptation result into a database or interfacing an API (application programming interface) provided by a customer service system and directly inputting the adaptation result into the service system.
10. The warehouse logistics order identification system of claim 9, wherein the warehouse logistics order identification subsystem is further configured to re-input the order from the mail extraction order subsystem into the convolutional neural network to train the order format classification model.
CN201911269946.7A 2019-12-11 2019-12-11 Warehouse logistics order identification method and system Active CN111047261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911269946.7A CN111047261B (en) 2019-12-11 2019-12-11 Warehouse logistics order identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911269946.7A CN111047261B (en) 2019-12-11 2019-12-11 Warehouse logistics order identification method and system

Publications (2)

Publication Number Publication Date
CN111047261A true CN111047261A (en) 2020-04-21
CN111047261B CN111047261B (en) 2023-06-16

Family

ID=70235798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911269946.7A Active CN111047261B (en) 2019-12-11 2019-12-11 Warehouse logistics order identification method and system

Country Status (1)

Country Link
CN (1) CN111047261B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680679A (en) * 2020-06-03 2020-09-18 重庆数道科技有限公司 Automatic document identification method based on OCR

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069452A (en) * 2015-08-07 2015-11-18 武汉理工大学 Straight line removing method based on local structure analysis
CN105631393A (en) * 2014-11-06 2016-06-01 阿里巴巴集团控股有限公司 Information recognition method and device
CA3017430A1 (en) * 2015-04-16 2016-10-20 Docauthority Ltd. Structural document classification
CN107633239A (en) * 2017-10-18 2018-01-26 江苏鸿信系统集成有限公司 Bill classification and bill field extracting method based on deep learning and OCR
CN107766809A (en) * 2017-10-09 2018-03-06 平安科技(深圳)有限公司 Electronic installation, billing information recognition methods and computer-readable recording medium
CN109840519A (en) * 2019-01-25 2019-06-04 青岛盈智科技有限公司 A kind of adaptive intelligent form recognition input device and its application method
CN110008944A (en) * 2019-02-20 2019-07-12 平安科技(深圳)有限公司 OCR recognition methods and device, storage medium based on template matching
CN110427853A (en) * 2019-07-24 2019-11-08 北京一诺前景财税科技有限公司 A kind of method of smart tickets information extraction processing

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631393A (en) * 2014-11-06 2016-06-01 阿里巴巴集团控股有限公司 Information recognition method and device
CA3017430A1 (en) * 2015-04-16 2016-10-20 Docauthority Ltd. Structural document classification
CN105069452A (en) * 2015-08-07 2015-11-18 武汉理工大学 Straight line removing method based on local structure analysis
CN107766809A (en) * 2017-10-09 2018-03-06 平安科技(深圳)有限公司 Electronic installation, billing information recognition methods and computer-readable recording medium
WO2019071662A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Electronic device, bill information identification method, and computer readable storage medium
CN107633239A (en) * 2017-10-18 2018-01-26 江苏鸿信系统集成有限公司 Bill classification and bill field extracting method based on deep learning and OCR
CN109840519A (en) * 2019-01-25 2019-06-04 青岛盈智科技有限公司 A kind of adaptive intelligent form recognition input device and its application method
CN110008944A (en) * 2019-02-20 2019-07-12 平安科技(深圳)有限公司 OCR recognition methods and device, storage medium based on template matching
CN110427853A (en) * 2019-07-24 2019-11-08 北京一诺前景财税科技有限公司 A kind of method of smart tickets information extraction processing

Also Published As

Publication number Publication date
CN111047261B (en) 2023-06-16

Similar Documents

Publication Publication Date Title
CN109887153B (en) Finance and tax processing method and system
US9552516B2 (en) Document information extraction using geometric models
US11348353B2 (en) Document spatial layout feature extraction to simplify template classification
EP2671190B1 (en) System for data extraction and processing
US20060282442A1 (en) Method of learning associations between documents and data sets
CN111191435B (en) Method and device for generating report form by dynamic template for customs report form
CN112418812A (en) Distributed full-link automatic intelligent clearance system, method and storage medium
KR101942468B1 (en) Structured data and unstructured data extraction system and method
CN112131348B (en) Method for preventing repeated declaration of project based on similarity of text and image
WO2022111247A1 (en) Report analysis method and apparatus
US20230205800A1 (en) System and method for detection and auto-validation of key data in any non-handwritten document
CN111047261A (en) Warehouse logistics order identification method and system
CN113841156A (en) Control method and device based on image recognition
US20210004578A1 (en) Standardized form recognition method, associated computer program product, processing and learning systems
CN107563689A (en) Use bar code management system and method
CN111428725A (en) Data structuring processing method and device and electronic equipment
CN111414917A (en) Identification method of low-pixel-density text
CN111178464A (en) Application of OCR recognition based on neural network in logistics industry express bill
TWM575887U (en) Intelligent accounting system
CN113553393A (en) Processing method and processing device for combining RPA and AI customs information
JP3872923B2 (en) Information processing mail sorting system
JP2004171316A (en) Ocr device, document retrieval system and document retrieval program
CN114169301A (en) Electronic surface list convergence number-taking method, device, equipment and storage medium
CN110956022A (en) Document processing method and system
CN112862409A (en) Picking bill verification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant