CN112417881A - Logistics information identification method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN112417881A
Authority
CN
China
Prior art keywords
user
address information
matched
delivery
address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011494422.0A
Other languages
Chinese (zh)
Inventor
郁博文
任仪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Manyun Logistics Information Co ltd
Original Assignee
Jiangsu Manyun Logistics Information Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Manyun Logistics Information Co ltd
Priority to CN202011494422.0A
Publication of CN112417881A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • G06Q10/0838 Historical data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a logistics information identification method, a logistics information identification device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a text segment to be recognized; inputting the text segment into a trained logistics information recognition model; acquiring a plurality of address information and goods names output by the logistics information identification model; the plurality of address information are identified as a delivery location and a receiving location based on preset rules. The invention automatically identifies the delivery place, the receiving place and the goods name through a text segment, thereby reducing the operation time of a user and improving the release efficiency of the goods source.

Description

Logistics information identification method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a logistics information identification method and device, electronic equipment and a storage medium.
Background
With the development of the internet and information technology, online freight platforms have become increasingly popular among shippers and drivers. A shipper publishes goods source information through the freight platform, and a driver browses and accepts orders, completing the matching process that precedes cargo transportation.
When publishing a goods source, the shipper is required to fill in the delivery location, the receiving location and the name of the goods to be transported in detail. Filling in these fields manually is tedious for the user. To save the user's time, the current solution is to obtain a text segment input by the user and directly recognize the text segment as address information.
In other words, the user needs to provide two text segments, one for the delivery location and one for the receiving location, so that the corresponding address information can be identified separately. In addition, the user must enter the name of the goods separately. This still costs the user a considerable amount of time.
Therefore, the technical problem to be solved by those skilled in the art is how to automatically identify the delivery location, the receiving location and the goods name from a single text segment, thereby reducing the user's operation time and improving the efficiency of publishing goods sources.
Disclosure of Invention
In order to overcome the defects of the related art, the invention provides a logistics information identification method, a logistics information identification device, an electronic device and a storage medium, which, at least to a certain extent, automatically identify a delivery place, a receiving place and a goods name from a single text segment, so that the operation time of a user is reduced and the efficiency of publishing a goods source is improved.
According to an aspect of the present invention, there is provided a logistics information identification method, including:
acquiring a text segment to be recognized;
inputting the text segment into a trained logistics information recognition model;
acquiring a plurality of address information and goods names output by the logistics information identification model;
the plurality of address information are identified as a delivery location and a receiving location based on preset rules.
In some embodiments of the invention, the logistics information recognition model sequentially comprises an ALBERT layer, a BiLSTM layer and a CRF layer. The ALBERT layer performs word segmentation on the text segment to be recognized at least based on an address data dictionary and a goods name dictionary to obtain character features and word segmentation features, and the character features and the word segmentation features are input into the BiLSTM layer after feature embedding.
In some embodiments of the present invention, the logistics information identification model is constructed by the following steps:
pre-training the ALBERT layer through an SOP task;
and connecting the BiLSTM layer and the CRF layer in series to form the logistics information identification model.
In some embodiments of the present invention, the identifying the plurality of address information as the delivery location and the receiving location based on the preset rule comprises:
acquiring the registered address of the user who inputs the text segment to be recognized, the user's historical delivery places and the number of occurrences of each, and the user's historical receiving places and the number of occurrences of each;
and identifying the plurality of address information as a delivery place and a receiving place according to the matching relation between the plurality of address information and the acquired registered address of the user, the historical delivery places of the user and the historical receiving places of the user.
In some embodiments of the present invention, the identifying the plurality of address information as the delivery location and the receiving location according to the matching relationship between the plurality of address information and the acquired registered address of the user, the historical delivery places of the user, and the historical receiving places of the user includes:
for address information to be matched in the plurality of address information:
judging whether the address information to be matched is only matched with the registered address of the user;
if yes, identifying the address information to be matched as a delivery place;
judging whether the address information to be matched is only matched with the historical delivery places of the users;
if yes, identifying the address information to be matched as a delivery place;
judging whether the address information to be matched is only matched with the historical receiving places of the users;
if yes, the address information to be matched is identified as a receiving place.
In some embodiments of the present invention, the identifying the plurality of address information as the delivery location and the receiving location according to the matching relationship between the plurality of address information and the acquired registered address of the user, the historical delivery places of the user, and the historical receiving places of the user further includes:
for address information to be matched in the plurality of address information:
judging whether the address information to be matched is only matched with the historical delivery places of the user and the historical receiving places of the user;
if so, identifying the address information to be matched as a delivery place or a receiving place, whichever has the larger historical count for the user;
judging whether the address information to be matched is only matched with the registered address of the user and the historical delivery place of the user;
if yes, identifying the address information to be matched as a delivery place;
judging whether the address information to be matched is only matched with the registered address of the user and the historical receiving place of the user;
if so, identifying the address information to be matched as a receiving place when the number of the historical receiving places of the user is larger than a preset threshold value; otherwise, the address information to be matched is identified as a delivery place.
In some embodiments of the present invention, after the identifying the plurality of address information as the delivery location and the receiving location based on the preset rule, the method further includes:
the identified delivery location, receiving location, and goods name are automatically filled into the corresponding fields.
According to still another aspect of the present invention, there is also provided a logistics information identification apparatus, comprising:
the acquisition module is configured to acquire a text segment to be recognized;
an input module configured to input the text segment into a trained logistics information recognition model;
the output module is configured to obtain a plurality of address information and goods names output by the logistics information identification model;
an identification module configured to identify the plurality of address information as a delivery location and a receipt location based on preset rules.
According to still another aspect of the present invention, there is also provided an electronic apparatus, including: a processor; a storage medium having stored thereon a computer program which, when executed by the processor, performs the steps as described above.
According to yet another aspect of the present invention, there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps as described above.
Compared with the prior art, the invention has the advantages that:
according to the method, the text segment to be recognized is recognized through the logistics information recognition model to obtain the plurality of address information and the goods names, and the plurality of address information are further recognized as the delivery place and the receiving place through the preset rules, so that the efficient recognition of the addresses and the goods names is realized, the operation time of a user is shortened, and the delivery efficiency of the goods source is improved.
Drawings
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
Fig. 1 shows a flowchart of a logistics information identification method according to an embodiment of the invention.
Fig. 2 shows a schematic diagram of a logistics information identification model according to an embodiment of the invention.
Fig. 3 shows a flowchart for constructing a logistics information identification model according to an embodiment of the invention.
Fig. 4 shows a flowchart for identifying the plurality of address information as a delivery location and a receipt location based on preset rules according to an embodiment of the present invention.
Fig. 5 to 7 show application scenarios of logistics information identification according to an embodiment of the invention.
Fig. 8 is a block diagram illustrating a logistics information identification apparatus according to an embodiment of the present invention.
Fig. 9 schematically illustrates a computer-readable storage medium in an exemplary embodiment of the invention.
Fig. 10 schematically illustrates an electronic device in an exemplary embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a flowchart of a logistics information identification method according to an embodiment of the invention. The logistics information identification method provided by the invention can be applied to a freight platform, a logistics platform or any other platform that involves identifying delivery places, receiving places and goods names, and the invention is not limited in this respect. The logistics information identification method comprises the following steps:
Step S110: acquiring a text segment to be recognized.
Specifically, the text segment to be recognized may be a text segment directly copied by the user, a text segment directly input by the user, or a text segment obtained by image-text recognition (obtained by recognition from an image).
Step S120: inputting the text segment into a trained logistics information recognition model.
The trained logistics information identification model will be described in conjunction with fig. 2 and fig. 3, and will not be described in detail here.
Step S130: acquiring a plurality of address information and goods names output by the logistics information identification model.
Step S140: the plurality of address information are identified as a delivery location and a receiving location based on preset rules.
Specifically, the description of the preset rule will be expanded with reference to fig. 4, and will not be repeated herein.
According to the logistics information identification method provided by the invention, the logistics information identification model recognizes a plurality of address information and a goods name from the text segment to be recognized, and the plurality of address information are then identified as the delivery place and the receiving place through the preset rules. Addresses and goods names are thus identified efficiently, the operation time of a user is reduced, and the efficiency of publishing goods sources is improved.
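The flow of steps S110 to S140 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the names `LogisticsInfo`, `identify_logistics_info`, `stub_model` and `order_rule` are hypothetical and do not come from the patent, the model is replaced by a stub that returns fixed values, and the rule shown is a simple order-of-appearance placeholder rather than the preset rules of fig. 4.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class LogisticsInfo:
    delivery_place: Optional[str]
    receiving_place: Optional[str]
    goods_name: Optional[str]


# Hypothetical interfaces: a model maps text -> (addresses, goods name);
# a rule function maps addresses -> (delivery place, receiving place).
Model = Callable[[str], Tuple[List[str], Optional[str]]]
Rules = Callable[[List[str]], Tuple[Optional[str], Optional[str]]]


def identify_logistics_info(text: str, model: Model, rules: Rules) -> LogisticsInfo:
    # S110: `text` is the text segment to be recognized (pasted, typed or OCR output).
    # S120 + S130: run the recognition model and collect addresses and goods name.
    addresses, goods_name = model(text)
    # S140: assign delivery / receiving roles to the recognized addresses.
    delivery, receiving = rules(addresses)
    return LogisticsInfo(delivery, receiving, goods_name)


def stub_model(text: str) -> Tuple[List[str], Optional[str]]:
    # Stand-in for the trained model; returns fixed values for illustration only.
    return ["Shanghai", "Beijing"], "steel"


def order_rule(addresses: List[str]) -> Tuple[Optional[str], Optional[str]]:
    # Placeholder rule: first address is the delivery place, second the receiving place.
    delivery = addresses[0] if addresses else None
    receiving = addresses[1] if len(addresses) > 1 else None
    return delivery, receiving


info = identify_logistics_info("Shanghai to Beijing, steel", stub_model, order_rule)
```

Keeping the model and the rules behind separate callables mirrors the split between steps S120/S130 and step S140, so either part could be swapped without touching the other.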
Referring now to fig. 2 and 3, fig. 2 is a schematic diagram illustrating a logistics information identification model according to an embodiment of the present invention; fig. 3 shows a flowchart for constructing a logistics information identification model according to an embodiment of the invention.
The logistics information identification model used by the invention sequentially comprises an ALBERT layer 102, a BiLSTM layer 104 and a CRF layer 105.
Specifically, the logistics information identification model is realized by named entity recognition. Named Entity Recognition (NER) is an information extraction method in Natural Language Processing (NLP) that locates entities in unstructured or semi-structured text. These entities may be general, such as persons, or domain-specific, such as biomedical terms. NER plays a very important role in enabling machines to understand text.
In general, two models are commonly used for named entity recognition tasks: BiLSTM with CRF (model one), and BERT with CRF (model two). Here, BiLSTM is a bidirectional Long Short-Term Memory model; CRF is a Conditional Random Field; BERT (Bidirectional Encoder Representations from Transformers) is a model published by Google and built on the Transformer, a transduction model that relies entirely on self-attention to compute representations of its input and output, without using sequence-aligned recurrent neural networks or convolution.
Compared with model one, the combination of BERT, BiLSTM and CRF (model three) uses BERT to initialize the embeddings, which is generally more reasonable than random initialization. Word2vec has also been proposed for initializing the embeddings, but the biggest disadvantage of a context-free static model such as Word2vec is that it handles ambiguous words poorly: in the Word2vec model a word has only one vector regardless of context, whereas in BERT the same word can have different vectors in different contexts.
Compared with model two, using BERT alone without BiLSTM has a problem: BERT uses the Transformer and is based on self-attention, which weakens positional information; even with position embeddings, it captures sequence-order dependencies less well than an LSTM. For a sequence labeling task, position information is critical, and if the CRF is attached directly after BERT, the model's ability to learn such dependencies is reduced. Model three is therefore better than model two.
Although model three has advantages over model one and model two, the BERT model has a major bottleneck: it is too large, and even BERT-base has a very large number of parameters. To solve the problem of the excessive parameter count, the embodiment of the invention adopts ALBERT, BiLSTM and CRF in place of model three. ALBERT is an improvement over the BERT model that mainly uses embedding factorization and cross-layer parameter sharing to reduce the number of parameters. Further, ALBERT may also use a new pre-training task, sentence-order prediction (SOP), which separates the inter-sentence order prediction task from the topic prediction task. Therefore, by adopting ALBERT (a lightweight improvement on BERT), the invention reduces the number of parameters compared with BERT and, at the same time, adjusts BERT's pre-training method, so that the overall performance of the original model is improved.
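The effect of the two parameter-reduction techniques can be illustrated with simple arithmetic. The sketch below assumes a vocabulary size of 30000, a hidden size of 768 and a factorized embedding size of 128; these numbers are illustrative assumptions and are not stated in the patent, and the function names are hypothetical.

```python
def embedding_params_unfactorized(vocab_size: int, hidden_size: int) -> int:
    # One V x H embedding matrix, as in a BERT-style model.
    return vocab_size * hidden_size


def embedding_params_factorized(vocab_size: int, embedding_size: int, hidden_size: int) -> int:
    # A V x E embedding matrix followed by an E x H projection.
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(per_layer: int, num_layers: int, shared: bool) -> int:
    # With cross-layer sharing, one set of layer weights is reused by every layer.
    return per_layer if shared else per_layer * num_layers


# Assumed sizes, for illustration only.
V, H, E = 30000, 768, 128
full = embedding_params_unfactorized(V, H)
factorized = embedding_params_factorized(V, E, H)
print(full, factorized)
```

Under these assumed sizes the embedding table shrinks from 23,040,000 to 3,938,304 parameters, and sharing weights across layers divides the encoder's parameter count by the number of layers.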
Specifically, the logistics information identification model is constructed by the steps shown in fig. 3: step S106: pre-training the ALBERT layer through an SOP task; and step S107: connecting the BiLSTM layer and the CRF layer in series to form the logistics information identification model.
When the model is trained, online goods source information may first be collected, and the delivery place, the receiving place and the goods name may be manually labeled using special symbols (such as @, #, $). The annotations are then normalized and converted into the standard data format of a named entity recognition model, in which each character corresponds to a label. As shown at reference numeral 101 in fig. 2, the input of the ALBERT layer is a sequence of single characters that starts with the CLS start symbol and ends with the SEP end symbol. The processed goods source information may then be divided into a training set, a validation set, a test set, and the like. Meanwhile, the ALBERT layer provided by the invention may also use segmentation information. For example, one label is used to represent a single-character word, and different labels are used to mark the beginning, middle and end of a multi-character word. Further, when word segmentation is performed at the ALBERT layer, at least an address data dictionary and a goods name dictionary (or a general dictionary) may be used to segment the text segment to be recognized and obtain character features and word segmentation features. Specifically, the address data dictionary is generated by collecting province, city, district, county, town and village data; the goods name dictionary is generated from common goods name data of the mobile terminal. Further, the address data dictionary and the goods name dictionary can be added to the user-defined dictionary of the jieba word segmentation module, which can greatly improve the accuracy of the word segmentation features for place names and goods names.
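The conversion from symbol-marked annotations to per-character labels can be sketched as follows. The patent does not specify the annotation layout or the label names, so this sketch assumes that each entity is wrapped in a pair of identical markers (`@...@` for the delivery place, `#...#` for the receiving place, `$...$` for the goods name) and uses hypothetical BIO labels `SEND`, `RECV` and `GOODS`.

```python
from typing import List, Optional, Tuple

# Assumed marker-to-entity mapping; the label names are hypothetical.
MARKERS = {"@": "SEND", "#": "RECV", "$": "GOODS"}


def annotation_to_bio(annotated: str) -> Tuple[List[str], List[str]]:
    """Strip the markers and return (characters, BIO labels), one label per character."""
    chars: List[str] = []
    tags: List[str] = []
    current: Optional[str] = None  # entity type currently open, if any
    at_start = False               # True until the first character of the entity is seen
    for ch in annotated:
        if ch in MARKERS:
            if current is None:
                current, at_start = MARKERS[ch], True   # opening marker
            else:
                current = None                          # closing marker (assumed to match)
            continue
        chars.append(ch)
        if current is None:
            tags.append("O")
        elif at_start:
            tags.append("B-" + current)
            at_start = False
        else:
            tags.append("I-" + current)
    return chars, tags


# "@上海@到#北京#运$钢材$": Shanghai (delivery) to Beijing (receiving), shipping steel.
chars, tags = annotation_to_bio("@上海@到#北京#运$钢材$")
```

Each character of the cleaned text ends up with exactly one label, which is the "each character corresponds to a label" format described above.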
In particular, the word segmentation feature is used between the output of the ALBERT layer 102 and the input of the BiLSTM layer 104 shown in fig. 2 (i.e., at 103). Conventionally, the character embedding output by the ALBERT layer 102 would be used directly as the input of the BiLSTM layer 104; the invention additionally adds a word segmentation embedding, so that the character embedding and the word segmentation embedding are combined, thereby improving the accuracy of address recognition.
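How per-character word segmentation features might be derived from a dictionary can be sketched as below. The patent uses the jieba module with custom dictionaries; to stay self-contained, this sketch swaps in a simple forward maximum matching segmenter, which is a different and much cruder technique. The B/M/E/S label set and the function names are assumptions for illustration.

```python
from typing import List, Set


def segment_fmm(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: a simple stand-in for a real segmenter such as jieba."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always succeeds.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def segmentation_tags(words: List[str]) -> List[str]:
    """One tag per character: S = single-character word, B/M/E = begin/middle/end."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


# Tiny assumed dictionary: two place names and one goods name.
dictionary = {"上海市", "北京市", "钢材"}
words = segment_fmm("上海市到北京市钢材", dictionary)
seg_tags = segmentation_tags(words)
```

The resulting tag sequence has the same length as the character sequence, so each tag can be embedded and combined with the character embedding at the same position before the BiLSTM layer.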
Therefore, for character embedding, the method uses the ALBERT model in place of the random initialization or the Word2vec part of the original network, and then trains the downstream task while fine-tuning ALBERT. Compared with BERT, ALBERT greatly improves parameter efficiency through matrix factorization and parameter sharing. In addition, ALBERT improves the pre-training strategy by replacing the original NSP (Next Sentence Prediction) task with SOP (Sentence Order Prediction). SOP and NSP share the same goal: to determine whether the second of two input sentences follows the first. However, a negative example in SOP is formed by directly swapping the order of two consecutive sentences, which is more difficult than NSP and allows the model to learn more of the semantic relations between sentences. After the model is constructed, it can be trained on the samples, and online fixed-point testing can be carried out. Further, the model can be maintained by iterative training on manually corrected online bad cases.
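The construction of SOP training pairs described above can be sketched in a few lines. The function name and the label convention (1 for original order, 0 for swapped order) are assumptions for illustration.

```python
from typing import List, Tuple


def make_sop_examples(sentences: List[str]) -> List[Tuple[str, str, int]]:
    """Build (first, second, label) triples from consecutive sentences of one document."""
    examples: List[Tuple[str, str, int]] = []
    for first, second in zip(sentences, sentences[1:]):
        examples.append((first, second, 1))  # positive: original order
        examples.append((second, first, 0))  # negative: the same two sentences, swapped
    return examples


sop_examples = make_sop_examples(["A", "B", "C"])
```

Because the negative pair reuses the same two sentences, topic cues alone cannot separate the classes, which is why the task pushes the model toward learning sentence order.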
Referring now to fig. 4, fig. 4 illustrates a flowchart of identifying the plurality of address information as a delivery location and a receipt location based on preset rules, according to an embodiment of the present invention. Fig. 4 shows the following steps in total:
Step S141: acquiring the registered address of the user who inputs the text segment to be recognized, the user's historical delivery places and the number of occurrences of each, and the user's historical receiving places and the number of occurrences of each.
Specifically, the registered address of the user may be an address obtained from a map interface when the user registers, or may be an address manually filled in by the user, which is not limited in the present invention. The user's historical delivery places are the delivery places in goods source information historically published by the user (and/or in the user's historical orders). The user's historical receiving places are the receiving places in goods source information historically published by the user (and/or in the user's historical orders). The number of historical delivery places and the number of historical receiving places are the numbers of times each place appears in the goods source information historically published by the user (and/or in the user's historical orders).
Step S142: and identifying the address information as a delivery place and a receiving place according to the matching relation between the address information and the acquired registered address of the user, the historical delivery place of the user and the historical receiving place of the user.
Specifically, step S142 may be implemented by:
For address information to be matched among the plurality of address information, it is judged whether the address information to be matched matches only the registered address of the user. If so, the address information to be matched is identified as a delivery place. Thus, when neither the user's historical delivery places nor the user's historical receiving places are matched, the address registered by the user is taken as the default delivery place.
For address information to be matched among the plurality of address information, it is judged whether the address information to be matched matches only a historical delivery place of the user. If so, the address information to be matched is identified as a delivery place. Thus, address information that matches only a historical delivery place of the user can be directly identified as a delivery place.
For address information to be matched among the plurality of address information, it is judged whether the address information to be matched matches only a historical receiving place of the user. If so, the address information to be matched is identified as a receiving place. Thus, address information that matches only a historical receiving place of the user can be directly identified as a receiving place.
For address information to be matched among the plurality of address information, it is judged whether the address information to be matched matches only a historical delivery place of the user and a historical receiving place of the user. If so, the address information to be matched is identified as a delivery place or a receiving place according to which of the two historical counts is larger. Thus, when both a historical delivery place and a historical receiving place are matched, the role that has occurred more often in the user's history is used.
For address information to be matched among the plurality of address information, it is judged whether the address information to be matched matches only the registered address of the user and a historical delivery place of the user. If so, the address information to be matched is identified as a delivery place. In other words, the registered address and the historical delivery place agree, so the address information can be directly identified as a delivery place.
For address information to be matched among the plurality of address information, it is judged whether the address information to be matched matches only the registered address of the user and a historical receiving place of the user. If so, the address information to be matched is identified as a receiving place when the number of times it appears as a historical receiving place of the user is larger than a preset threshold; otherwise, it is identified as a delivery place. Because the user's registered address serves only as an auxiliary reference (although it is more often used as a delivery place), the address information is identified as a receiving place when the number of times it has been used as a receiving place exceeds the preset threshold (which may be set as desired, e.g., 2 to 10, although the invention is not so limited). Otherwise, it can be directly identified as a delivery place.
If the address information to be matched simultaneously matches the registered address of the user, a historical delivery place of the user and a historical receiving place of the user, this indicates that the delivery place and the receiving place are at the same position. The level of detail of the address can then be examined further: if the address is only at city level (the delivery place and the receiving place are in the same city), the address information may be used as both the delivery place and the receiving place; if the delivery place and the receiving place share the same detailed address, the user may be prompted to check whether the input is erroneous.
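The matching rules above can be collected into one decision function. This is a minimal sketch under several assumptions that the patent does not fix: matching is done by exact string equality, a tie between the two historical counts resolves to the delivery place, the default threshold is 3 (within the 2 to 10 range given as an example), and the names `UserProfile` and `classify_address` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

DELIVERY, RECEIVING, BOTH, UNKNOWN = "delivery", "receiving", "both", "unknown"


@dataclass
class UserProfile:
    registered_address: str
    delivery_counts: Dict[str, int] = field(default_factory=dict)   # place -> times used as delivery place
    receiving_counts: Dict[str, int] = field(default_factory=dict)  # place -> times used as receiving place


def classify_address(addr: str, user: UserProfile, threshold: int = 3) -> str:
    is_registered = addr == user.registered_address  # exact match is an assumption
    sent = user.delivery_counts.get(addr, 0)
    received = user.receiving_counts.get(addr, 0)
    in_delivery, in_receiving = sent > 0, received > 0

    if is_registered and in_delivery and in_receiving:
        return BOTH  # same place for both roles; caller checks address detail or prompts the user
    if in_delivery and in_receiving:
        # Role with the larger historical count; a tie resolves to delivery (assumption).
        return DELIVERY if sent >= received else RECEIVING
    if is_registered and in_receiving:
        return RECEIVING if received > threshold else DELIVERY
    if is_registered or in_delivery:
        # Registered only, historical delivery only, or registered + historical delivery.
        return DELIVERY
    if in_receiving:
        return RECEIVING
    return UNKNOWN  # nothing matched; fall back to other rules


user = UserProfile("Nanjing", {"Suzhou": 2, "Wuxi": 1}, {"Suzhou": 6, "Hangzhou": 4})
frequent_receiver = UserProfile("Nanjing", {}, {"Nanjing": 5})
```

Ordering the checks from the most specific combination to the least specific means each "only matched with" condition in the text is expressed by falling through the earlier branches.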
Further, in another embodiment of the present invention, if the address information matches none of the above, the determination may be made by other rules, for example according to the order of appearance (by experience, the delivery place appears first and the receiving place second). As another example, the delivery place and the receiving place may be determined based on a connecting word (for example, a word meaning "arrive") between two pieces of address information.
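A minimal sketch of this fallback is given below, assuming exactly two recognized addresses that both occur verbatim in the text segment. The function name assign_by_position and the English connector "to" are hypothetical placeholders chosen for illustration; the connecting words actually used are not specified here.

```python
def assign_by_position(text, addresses, connectors=("to",)):
    """Fallback for two addresses that matched no stored record.

    The address appearing first is taken as the delivery place and the
    one appearing second as the receiving place. The result also reports
    whether a connecting word was found between them.
    """
    if len(addresses) != 2:
        raise ValueError("this sketch handles exactly two addresses")
    if any(a not in text for a in addresses):
        raise ValueError("every address must occur in the text segment")

    # order the two addresses by their position in the text
    first, second = sorted(addresses, key=text.find)
    between = text[text.find(first) + len(first):text.find(second)]
    linked = any(word in connectors for word in between.split())
    return {"delivery": first, "receiving": second, "by_connector": linked}


result = assign_by_position(
    "steel coils Nanjing to Shanghai 30 tons", ["Shanghai", "Nanjing"]
)
```

Both rules give the same assignment in this sketch; the connector flag only indicates how much support the text itself lends to the order-based guess.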
The present invention can also be implemented in many different ways, which are not described herein.
Referring now to fig. 5-7, application scenarios for logistics information identification in accordance with one embodiment of the present invention are shown. As shown in fig. 5-7, a shipping information input interface may be provided at the user end. The input interface can include a text segment input box, which supports both text segments pasted by the user and text segments typed directly by the user. The shipping information input interface also provides input boxes for fields such as the delivery place, the receiving place, the goods name and the weight; these input boxes support direct input by the user and can also be filled in automatically once the fields have been recognized from the text segment. Reference numeral 200 shows the shipping information input interface before the user has filled in any information; reference numeral 201 shows the interface after the user has pasted a text segment into the text segment input box; and reference numeral 202 shows the interface after the user has pasted the text segment and clicked the "recognition" control, which triggers model recognition so that fields such as the delivery place, the receiving place, the goods name and the goods weight are recognized from the text segment and automatically filled into the corresponding input boxes according to field type.
Fig. 5-7 are merely schematic illustrations of implementations of the present invention, which is not intended to be limiting.
Specifically, the invention introduces ALBERT, a powerful pre-trained open-source model, into the downstream natural language recognition task and then fine-tunes it on that task. Compared with a conventional model, this is equivalent to replacing the original embedding part with a BERT-type model, which performs very well in practical applications. Meanwhile, address name and goods dictionaries suited to the user's task are added, which improves the accuracy of word segmentation, hence the accuracy of the word segmentation features, and in turn the recognition performance of the model. In the invention, ALBERT is used in place of BERT. ALBERT provides two methods for improving parameter efficiency, namely factorization of the embedding matrix and cross-layer parameter sharing, which can significantly reduce the number of parameters without sacrificing model performance. The key improvement of the invention is the use of the company database to decide, for a text segment containing two addresses, which is the loading place and which is the unloading place.
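The effect of the two parameter-saving methods can be illustrated with simple arithmetic. The sketch below is not taken from the invention: the vocabulary size, hidden size, embedding size, layer count and per-layer parameter count are illustrative values assumed for the example, and the function names are hypothetical.

```python
def embedding_params(vocab_size, hidden_size, embed_size=None):
    """Parameter count of the token embedding.

    Without factorization the embedding is one V x H matrix. With
    factorization it becomes a V x E matrix followed by an E x H
    projection, where E is much smaller than H.
    """
    if embed_size is None:
        return vocab_size * hidden_size
    return vocab_size * embed_size + embed_size * hidden_size


def encoder_params(per_layer, num_layers, shared):
    """With cross-layer sharing, all layers reuse one set of weights."""
    return per_layer if shared else per_layer * num_layers


# Illustrative sizes (assumptions, not figures from this document)
V, H, E = 30000, 768, 128
full = embedding_params(V, H)          # 30000 * 768
factored = embedding_params(V, H, E)   # 30000 * 128 + 128 * 768

unshared = encoder_params(per_layer=7_000_000, num_layers=12, shared=False)
shared = encoder_params(per_layer=7_000_000, num_layers=12, shared=True)
```

With these assumed sizes the factorized embedding needs 3,938,304 parameters instead of 23,040,000, and sharing one layer's weights across 12 layers divides the encoder parameter count by 12.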
Thus, the ALBERT, BiLSTM and CRF models are employed to identify the places and the goods names. ALBERT is used in place of BERT because, as a lightweight BERT, it performs better than BERT in industrial applications for the same training time. The invention also uses a word segmentation embedding (segment embedding); adding the word segmentation feature dimension leaves more room for improving the accuracy of the model. When multiple addresses are identified, the address the user registered in the company database, the user's previous loading and unloading places, the numbers of loadings and unloadings, and the like are used to assist the judgment.
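One way word segmentation features could be derived from an address dictionary and a goods name dictionary is sketched below. The segmentation algorithm and the tagging scheme are not specified here, so forward maximum matching and B/M/E/S tags (begin, middle, end, single) are assumptions made for illustration, as are the function name segment_tags and the sample dictionary entries.

```python
def segment_tags(text, dictionary):
    """Tag every character with its position inside a segmented word.

    Uses forward maximum matching: at each position the longest
    dictionary word is taken; characters not covered by any dictionary
    word become single-character words.
    """
    max_len = max([len(word) for word in dictionary] + [1])
    tags = []
    i = 0
    while i < len(text):
        # try the longest candidate first and shrink down to one character
        for size in range(min(max_len, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in dictionary:
                break
        if size == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (size - 2) + ["E"])
        i += size
    return tags


# Hypothetical dictionary entries: two place names and one goods name
dictionary = {"南京", "上海", "钢材"}
text = "南京到上海钢材"
tags = segment_tags(text, dictionary)
features = list(zip(text, tags))  # one (character, tag) pair per position
```

Each tag could then be mapped to an embedding vector and combined with the character feature at the same position before being passed to the BiLSTM layer.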
The foregoing is merely an exemplary description of various embodiments of the invention and is not intended to be limiting thereof. The above-described embodiments may be implemented individually or in combination, and such variations are within the scope of the invention.
According to still another aspect of the present invention, there is also provided a logistics information identification apparatus, and fig. 8 shows a block diagram of the logistics information identification apparatus according to an embodiment of the present invention. The logistics information identification device 300 includes an acquisition module 310, an input module 320, an output module 330, and an identification module 340.
The acquisition module 310 is configured to acquire a text segment to be recognized;
the input module 320 is configured to input the text segment into a trained logistics information recognition model;
the output module 330 is configured to obtain a plurality of address information and a cargo name output by the logistics information identification model;
the identification module 340 is configured to identify the plurality of address information as a delivery location and a receipt location based on preset rules.
In the logistics information identification device provided by the invention, the text segment to be identified is used for identifying and obtaining the plurality of address information and the goods names through the logistics information identification model, and the plurality of address information are further identified as the delivery place and the receiving place through the preset rule, so that the efficient identification of the addresses and the goods names is realized, the operation time of a user is reduced, and the distribution efficiency of goods sources is improved.
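The cooperation of the four modules can be sketched as a small pipeline. This is only a structural illustration under stated assumptions: the class name LogisticsInfoIdentifier, its method names, and the stub model and rule used in the usage example are hypothetical, and the trained recognition model is replaced by a fixed function.

```python
from typing import Callable, Dict, List, Tuple

ModelOutput = Tuple[List[str], str]  # (address information, goods name)


class LogisticsInfoIdentifier:
    """Structural sketch of device 300 with modules 310 to 340."""

    def __init__(self,
                 model: Callable[[str], ModelOutput],
                 rule: Callable[[List[str]], Dict[str, str]]):
        self.model = model  # stands in for the trained recognition model
        self.rule = rule    # stands in for the preset rules

    def acquire(self, raw: str) -> str:
        # acquisition module 310: obtain the text segment to be recognized
        return raw.strip()

    def infer(self, text: str) -> ModelOutput:
        # input module 320 feeds the text to the model;
        # output module 330 collects the addresses and the goods name
        return self.model(text)

    def identify(self, addresses: List[str]) -> Dict[str, str]:
        # identification module 340: assign delivery and receiving places
        return self.rule(addresses)

    def process(self, raw: str) -> Dict[str, str]:
        addresses, goods_name = self.infer(self.acquire(raw))
        fields = dict(self.identify(addresses))
        fields["goods_name"] = goods_name
        return fields


def stub_model(text: str) -> ModelOutput:
    # placeholder for the trained model: returns fixed output
    return ["Nanjing", "Shanghai"], "steel"


def order_rule(addresses: List[str]) -> Dict[str, str]:
    # placeholder rule: first address is delivery, second is receiving
    return {"delivery": addresses[0], "receiving": addresses[1]}


fields = LogisticsInfoIdentifier(stub_model, order_rule).process(
    "  Nanjing to Shanghai, steel  "
)
```

Passing the model and the rule in as callables keeps the sketch independent of any particular model library and mirrors the statement that the modules may be split or combined.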
Fig. 8 is a schematic diagram of the logistics information identification apparatus 300 provided by the present invention, and the splitting, combining, and adding of modules are within the scope of the present invention without departing from the concept of the present invention. The logistics information identification apparatus 300 provided by the present invention can be implemented by software, hardware, firmware, plug-in and any combination thereof, which is not limited by the present invention.
In an exemplary embodiment of the present invention, there is also provided a computer-readable storage medium, on which a computer program is stored, which when executed by, for example, a processor, can implement the steps of the logistics information identification method described in any one of the above embodiments. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present invention described in the above-mentioned logistics information identification method section of this specification, when the program product is run on the terminal device.
Referring to fig. 9, a program product 700 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on the user device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing devices may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In an exemplary embodiment of the invention, there is also provided an electronic device that may include a processor and a memory for storing executable instructions of the processor. Wherein the processor is configured to execute the steps of the logistics information identification method of any one of the above embodiments via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module" or "system."
An electronic device 500 according to this embodiment of the invention is described below with reference to fig. 10. The electronic device 500 shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 10, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one memory unit 520, a bus 530 that couples various system components including the memory unit 520 and the processing unit 510, a display unit 540, and the like.
The memory unit 520 stores program code executable by the processing unit 510 to cause the processing unit 510 to perform the steps according to various exemplary embodiments of the present invention described in the logistics information identification method section above in this specification. For example, the processing unit 510 may perform the steps shown in fig. 1, 3, and 4.
The memory unit 520 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 5201 and/or a cache memory unit 5202, and may further include a read only memory unit (ROM) 5203.
The memory unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 530 may be one or more of any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 600 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 500, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 500 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 550. Also, the electronic device 500 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 560. The network adapter 560 may communicate with other modules of the electronic device 500 via the bus 530. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a network device, etc.) to execute the above-mentioned logistics information identification method according to the embodiment of the present invention.
Compared with the prior art, the invention has the advantages that:
according to the method, the text segment to be recognized is recognized through the logistics information recognition model to obtain the plurality of address information and the goods names, and the plurality of address information are further recognized as the delivery place and the receiving place through the preset rules, so that the efficient recognition of the addresses and the goods names is realized, the operation time of a user is shortened, and the delivery efficiency of the goods source is improved.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims (10)

1. A logistics information identification method is characterized by comprising the following steps:
acquiring a text segment to be recognized;
inputting the text segment into a trained logistics information recognition model;
acquiring a plurality of address information and goods names output by the logistics information identification model;
the plurality of address information are identified as a delivery location and a receiving location based on preset rules.
2. The logistics information identification method of claim 1, wherein the logistics information identification model comprises an ALBERT layer, a BiLSTM layer and a CRF layer in sequence, wherein the ALBERT layer performs word segmentation on the text segment to be identified at least based on an address data dictionary and a goods name dictionary to obtain character features and word segmentation features, and the character features and the word segmentation features are input into the BiLSTM layer after feature embedding.
3. The logistics information identification method of claim 2, wherein the logistics information identification model is constructed by the following steps:
pre-training the ALBERT layer through an SOP task;
and connecting the BiLSTM layer and the CRF layer in series to form the logistics information identification model.
4. The logistics information identification method of claim 1, wherein the identifying the plurality of address information as a delivery location and a receiving location based on preset rules comprises:
acquiring a registered address of the user who inputs the text segment to be identified, the historical delivery places of the user and the number thereof, and the historical receiving places of the user and the number thereof;
and identifying the address information as a delivery place and a receiving place according to the matching relation between the address information and the acquired registered address of the user, the historical delivery place of the user and the historical receiving place of the user.
5. The logistics information identification method of claim 4, wherein the identifying the plurality of address information as the delivery location and the receiving location according to the matching relationship between the plurality of address information and the acquired registered address of the user, the historical delivery location of the user, and the historical receiving location of the user comprises:
for address information to be matched in the plurality of address information:
judging whether the address information to be matched is only matched with the registered address of the user;
if yes, identifying the address information to be matched as a delivery place;
judging whether the address information to be matched is only matched with the historical delivery places of the users;
if yes, identifying the address information to be matched as a delivery place;
judging whether the address information to be matched is only matched with the historical receiving places of the users;
if yes, the address information to be matched is identified as a receiving place.
6. The method for identifying logistics information according to claim 5, wherein the identifying the plurality of address information as the delivery location and the receiving location according to the matching relationship between the plurality of address information and the acquired registered address of the user, the historical delivery location of the user, and the historical receiving location of the user further comprises:
for address information to be matched in the plurality of address information:
judging whether the address information to be matched is only matched with the historical delivery place of the user and the historical receiving place of the user;
if so, identifying the address information to be matched as a delivery place or a receiving place according to whichever of the number of the user's historical delivery places and the number of the user's historical receiving places is greater;
judging whether the address information to be matched is only matched with the registered address of the user and the historical delivery place of the user;
if yes, identifying the address information to be matched as a delivery place;
judging whether the address information to be matched is only matched with the registered address of the user and the historical receiving place of the user;
if so, identifying the address information to be matched as a receiving place when the number of the historical receiving places of the user is larger than a preset threshold value; otherwise, the address information to be matched is identified as a delivery place.
7. The logistics information identification method of any one of claims 1 to 6, wherein after identifying the plurality of address information as a delivery location and a receiving location based on a preset rule, further comprising:
the identified delivery location, receipt location, and shipment name are automatically filled into the corresponding fields.
8. A logistics information identification device, comprising:
the acquisition module is configured to acquire a text segment to be recognized;
an input module configured to input the text segment into a trained logistics information recognition model;
the output module is configured to obtain a plurality of address information and goods names output by the logistics information identification model;
an identification module configured to identify the plurality of address information as a delivery location and a receipt location based on preset rules.
9. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory on which a computer program is stored, the computer program being executed by the processor to perform the logistics information identification method of any one of claims 1 to 7.
10. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, executes the logistics information identification method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011494422.0A CN112417881A (en) 2020-12-17 2020-12-17 Logistics information identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112417881A true CN112417881A (en) 2021-02-26

Family

ID=74776144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011494422.0A Pending CN112417881A (en) 2020-12-17 2020-12-17 Logistics information identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112417881A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076752A (en) * 2021-03-26 2021-07-06 中国联合网络通信集团有限公司 Method and device for identifying address
CN113255352A (en) * 2021-05-12 2021-08-13 北京易华录信息技术股份有限公司 Street information determination method and device and computer equipment
CN113255342A (en) * 2021-06-11 2021-08-13 云南大学 Method and system for identifying product name of 5G mobile service
CN113589993A (en) * 2021-07-16 2021-11-02 青岛海尔科技有限公司 Receiving address generation method and device, electronic equipment and storage medium
CN113673247A (en) * 2021-05-13 2021-11-19 江苏曼荼罗软件股份有限公司 Entity identification method, device, medium and electronic equipment based on deep learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392630A (en) * 2017-07-31 2017-11-24 柴玉发 Condition of merchandise retroactive method, apparatus and system for ecommerce
CN108876270A (en) * 2018-09-19 2018-11-23 惠龙易通国际物流股份有限公司 Automatic source of goods auditing system and method
CN109389982A (en) * 2018-12-26 2019-02-26 江苏满运软件科技有限公司 Shipping Information audio recognition method, system, equipment and storage medium
CN111428933A (en) * 2020-03-30 2020-07-17 江苏满运软件科技有限公司 Logistics address recommendation method, system, equipment and storage medium
CN111695355A (en) * 2020-05-26 2020-09-22 平安银行股份有限公司 Address text recognition method, device, medium and electronic equipment
CN111709241A (en) * 2020-05-27 2020-09-25 西安交通大学 Named entity identification method oriented to network security field
CN111814058A (en) * 2020-08-20 2020-10-23 深圳市欢太科技有限公司 Pushing method and device based on user intention, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 210012 3rd floor, building a, Wanbo Science Park, 66 Huashen Avenue, Yuhuatai District, Nanjing City, Jiangsu Province

Applicant after: Jiangsu Yunmanman Information Technology Co.,Ltd.

Address before: 210012 4th floor, building 5, no.170-1, software Avenue, Yuhuatai District, Nanjing City, Jiangsu Province

Applicant before: Jiangsu manyun Logistics Information Co.,Ltd.