CN114169966B - Method and system for extracting unit data of goods by tensor - Google Patents

Publication number
CN114169966B
Authority
CN
China
Prior art keywords
goods
tensor
name
dimension
metadata
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111493743.3A
Other languages
Chinese (zh)
Other versions
CN114169966A (en)
Inventor
张勇
李森
黄思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan Port And Shipping Holding Co ltd
Original Assignee
Hainan Port And Shipping Holding Co ltd
Application filed by Hainan Port And Shipping Holding Co ltd
Priority to CN202111493743.3A
Publication of CN114169966A
Application granted
Publication of CN114169966B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635 Processing of requisition or of purchase orders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a system for extracting goods order metadata by tensor. The method for extracting goods order metadata comprises a short sentence extraction step and a metadata extraction step. The method stores the goods entity in a tensor, so that all attribute values of the goods entity can be stored; more attribute values therefore participate in the matching operation during metadata extraction, and the extracted metadata is more comprehensive and more accurate. The tensor is constructed through a deep neural network pre-training model and a named entity recognition model, so that the resulting goods tensor has more dimensions and stores more attribute values. The metadata extraction operation is performed through a natural language understanding technology, so that the extracted metadata is more comprehensive and more accurate.

Description

Method and system for extracting goods order metadata by tensor
Technical Field
The invention belongs to the technical field of information, and particularly relates to a method and a system for extracting goods order metadata by tensor.
Background
Metadata, also called intermediary data, is data that describes data; its content is descriptive information about data and information resources. In an enterprise, wherever data exists there is corresponding metadata. Only with complete and accurate metadata can data be better understood and its value fully mined.
In the big data era, data is an asset. Metadata describes and classifies information in a formatted way, which makes machine processing possible and helps enterprises manage their data assets and clarify the relationships among data. Metadata traditionally has two uses:
First, it helps the data platform understand its own situation. For example, it shows which data are available, how much data is stored, how to find the required data, and when the data were generated, which supports work such as operation and maintenance alerting.
Second, it helps the data platform formulate standards for data statistics. For example, by connecting the associations between upstream and downstream data, it shows how to unify data calibers and calculation indicators, what the relationships among data are, and which upstream and downstream data are associated with given data, which lays a foundation for data quality and visual maintenance.
A goods entity differs from other entities in that it has a very large number of attributes, including but not limited to goods name, size, storage method, transportation requirements, color, and place of origin. Existing goods order metadata extraction methods usually store the attribute values of a goods entity in an array, but an array cannot store all of the attribute values of the goods entity; this degrades goods order metadata extraction, and the extracted metadata is neither comprehensive nor accurate enough.
To facilitate understanding of the invention, the following explanations are made with respect to terms and related concepts:
composition of metadata: a piece of metadata consists of a metadata item and metadata content; for example, "author" and "date" are metadata items, and "Chinese publisher" and "12/7/2021" are the corresponding metadata content;
deep neural network pre-training model: a model that has been trained in advance and can be used directly, whose structure is already determined and whose parameters are already initialized; a common deep neural network pre-training model is the BERT model;
tensor: an array of multiple dimensions, each dimension storing values of the same data type; each tensor has a name, called the tensor name; each dimension has a name, called the dimension name;
named entity recognition model: a model that identifies entities or attributes in the text to be processed; a commonly used named entity recognition model is BiLSTM-CRF;
entity: from a data processing perspective, an objective thing in the real world is called an entity, which is any distinguishable and identifiable thing in the real world; the entity in the invention refers to goods, and the entity name is a goods name;
attribute: a characteristic of an entity; each entity has a plurality of attributes;
attribute value: the concrete representation of an attribute; an entity can be represented by the collection of its attribute values;
natural language understanding (NLP) technology: a technology in the field of artificial intelligence that enables computers to understand and use the natural language of human society, so as to achieve natural language communication between humans and computers; commonly used NLP algorithms include the maximum matching word segmentation algorithm, the shortest path word segmentation algorithm, the n-gram model-based word segmentation algorithm, and the word-based word segmentation algorithm;
similar short sentences: short sentences whose meaning is similar to that of an initial short sentence, found through natural language understanding technology; for example, the initial short sentences are: red; oval; freezing; the wax gourd is green and oval; the corresponding similar short sentences are: red and black; oval; 0-2 degrees; the wax gourd is light green and oval;
text field: an input area that accepts only a single line of characters;
cargo type: a category of goods obtained by classifying goods according to their specific names, for example: apple, orange, chocolate.
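The definition of a tensor given above (a named array whose named dimensions each store attribute values) can be pictured with a small data structure. The following is only a minimal sketch under stated assumptions: the class name `GoodsTensor` and its methods are hypothetical and do not come from the patent, and a plain mapping from dimension name to a list of values stands in for whatever storage the invention actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class GoodsTensor:
    """Hypothetical container: a tensor name plus named dimensions of values."""
    name: str
    dimensions: dict[str, list[str]] = field(default_factory=dict)

    def add_dimension(self, dim_name: str, values: list[str]) -> None:
        # Create the dimension if needed, then append values not stored yet.
        stored = self.dimensions.setdefault(dim_name, [])
        for value in values:
            if value not in stored:
                stored.append(value)

    def attribute_values(self) -> list[tuple[str, str]]:
        # Flatten the tensor into (dimension name, attribute value) pairs.
        return [(dim, v) for dim, vals in self.dimensions.items() for v in vals]


# The tensor name is the goods name; the dimension names are attributes.
chocolate = GoodsTensor(name="chocolate")
chocolate.add_dimension("color", ["black"])
chocolate.add_dimension("shape", ["oval"])
```

Unlike a fixed-length array, this structure can gain a new named dimension at any time, which is the property the later dimension adding step relies on.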
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a method and a system for extracting goods order metadata by tensor, so as to improve the comprehensiveness and accuracy of metadata extraction.
To achieve the above object, the invention provides a method for storing goods information by tensor, comprising the following steps:
(1) tensor generation step:
obtaining a tensor from a plurality of sample order texts of the same cargo type through a deep neural network pre-training model according to a preset dimension name set;
the sample order texts are called as a text set; the goods corresponding to the text set are called target goods; the dimension name set consists of dimension names, and the dimension names are attributes of the target goods;
the dimension names of the tensor correspond one-to-one to the dimension names in the dimension name set; each dimension stores the attribute values corresponding to its attribute, and these attribute values are derived from the text set;
(2) a dimension adding step:
in the text set, searching and obtaining the attribute which is not contained in the tensor through a named entity recognition model; and taking each attribute as a dimension name, and adding a corresponding dimension in the tensor to obtain the goods tensor.
Preferably, the tensor generation step comprises the sub-steps of:
(1-1) model input substep:
inputting the text set into the deep neural network pre-training model; each sample order text in the text set comprises a text field named as a goods name, and the text field is used for storing the goods name of the order text;
(1-2) tensor construction substep:
the deep neural network pre-training model acquires the content of the goods name from the text set and takes the content as a tensor name; according to each dimension name in the dimension name set, searching an attribute value corresponding to the dimension name in the text set; and constructing a tensor according to the tensor name, each dimension name in the dimension name set and the corresponding attribute value thereof.
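The two substeps above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: a hypothetical keyword table `KNOWN_VALUES` and the helper `find_values` stand in for the deep neural network pre-training model, and each sample order text is represented as a dictionary with a "goods name" field and a free-text "text" field.

```python
# Hypothetical vocabulary standing in for what a pre-trained model would find.
KNOWN_VALUES = {
    "color": ["black", "white"],
    "shape": ["oval", "square"],
    "storage location": ["box", "bag"],
}


def find_values(dim_name, text):
    # Stand-in for the model: return known values of this attribute in the text.
    return [v for v in KNOWN_VALUES.get(dim_name, []) if v in text.lower()]


def build_tensor(text_set, dimension_names):
    # The shared goods name of the text set becomes the tensor name.
    names = {order["goods name"] for order in text_set}
    if len(names) != 1:
        raise ValueError("all sample order texts must share one goods name")
    tensor = {"name": names.pop(), "dimensions": {d: [] for d in dimension_names}}
    # For each preset dimension name, collect its attribute values from the texts.
    for order in text_set:
        for dim in dimension_names:
            for value in find_values(dim, order["text"]):
                if value not in tensor["dimensions"][dim]:
                    tensor["dimensions"][dim].append(value)
    return tensor


text_set = [
    {"goods name": "chocolate", "text": "Black chocolate, oval, placed in a box."},
    {"goods name": "chocolate", "text": "White chocolate, square pieces, kept at -18 degrees."},
]
tensor = build_tensor(text_set, ["color", "shape", "storage location"])
```

Note that "-18 degrees" in the second text is not captured here, because temperature is not in the preset dimension name set; the dimension adding step exists to pick up such attributes.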
The invention provides a method for extracting goods order metadata based on the goods tensor, comprising the following steps:
(1) short sentence extraction:
according to a preset initial short sentence set, extracting from a single goods order text, through a natural language understanding technology, short sentences similar to those in the initial short sentence set, and adding the similar short sentences to the initial short sentence set to obtain a final short sentence set; a short sentence is a descriptive phrase or sentence about the goods corresponding to the single goods order text;
the goods corresponding to the single goods order text and the goods corresponding to the goods tensor are of the same goods type;
(2) metadata extraction:
extracting attribute values in the metadata content from the final short sentence set through a natural language understanding technology according to preset metadata items and the goods tensor; the metadata items comprise goods and attributes; the corresponding metadata content comprises the goods name and the attribute values.
Preferably, the process of extracting metadata content in the metadata extraction step is as follows: matching each attribute value in the goods tensor with each short sentence in the final short sentence set through a natural language understanding technology; if the short sentence contains the attribute value or the similar short sentence of the attribute value, the matching is successful, and the short sentence which is successfully matched is extracted as the attribute value in the metadata content; otherwise, the matching fails.
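The matching rule described above can be sketched as follows. This is a simplified illustration: plain substring containment and a hypothetical lookup table of similar short sentences replace the natural language understanding technology, and the function name `match_phrases` and the sample data are assumptions made for the example.

```python
def match_phrases(attribute_values, final_phrases, similar_phrases):
    """Return (dimension name, short sentence) pairs for successful matches.

    A short sentence matches an attribute value when it contains the value
    itself or one of the value's similar short sentences.
    """
    matches = []
    for dim, value in attribute_values:
        candidates = [value] + similar_phrases.get(value, [])
        for phrase in final_phrases:
            if any(candidate in phrase for candidate in candidates):
                matches.append((dim, phrase))
    return matches


# Hypothetical inputs: attribute values from a goods tensor, a final short
# sentence set, and an illustrative table of similar short sentences.
attribute_values = [("color", "black"), ("temperature", "-18 degrees")]
final_phrases = ["dark black", "oval", "frozen"]
similar_phrases = {"-18 degrees": ["frozen"]}

matches = match_phrases(attribute_values, final_phrases, similar_phrases)
```

Every attribute value in the tensor is tried against every short sentence, so a tensor with more dimensions gives more chances for a short sentence to match.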
The invention provides a system for storing goods information by tensor, comprising the following modules:
a tensor generation module:
configured to obtain a tensor from a plurality of sample order texts of the same cargo type through a deep neural network pre-training model according to a preset dimension name set;
the sample order texts are called a text set; the goods corresponding to the text set are called target goods; the dimension name set consists of dimension names, and the dimension names are attributes of the target goods;
the dimension names of the tensor correspond one-to-one to the dimension names in the dimension name set; each dimension stores the attribute values corresponding to its attribute, and these attribute values are derived from the text set;
a dimension adding module:
configured to search the text set, through a named entity recognition model, for attributes not contained in the tensor; and to take each such attribute as a dimension name and add a corresponding dimension to the tensor to obtain the goods tensor.
Preferably, the tensor generation module comprises the following sub-modules:
a model input submodule:
configured to input the text set into the deep neural network pre-training model; each sample order text in the text set comprises a text field named goods name, which stores the goods name of the order text;
tensor construction submodule:
configured to have the deep neural network pre-training model acquire the content of the goods name from the text set and take it as the tensor name; search the text set for the attribute value corresponding to each dimension name in the dimension name set; and construct a tensor from the tensor name, each dimension name in the dimension name set, and the corresponding attribute values.
The invention provides a system for extracting goods order metadata based on the goods tensor, comprising the following modules:
a short sentence extraction module:
configured to extract from a single goods order text, through a natural language understanding technology, short sentences similar to those in a preset initial short sentence set, and to add the similar short sentences to the initial short sentence set to obtain a final short sentence set; a short sentence is a descriptive phrase or sentence about the goods corresponding to the single goods order text;
the goods corresponding to the single goods order text and the goods corresponding to the goods tensor are of the same goods type;
a metadata extraction module:
configured to extract attribute values in the metadata content from the final short sentence set through a natural language understanding technology according to preset metadata items and the goods tensor; the metadata items comprise goods and attributes; the corresponding metadata content comprises the goods name and the attribute values.
Preferably, the operation of extracting metadata content in the metadata extraction module is: matching each attribute value in the goods tensor with each short sentence in the final short sentence set through a natural language understanding technology; if the short sentence contains the attribute value or the similar short sentence of the attribute value, the matching is successful, and the short sentence which is successfully matched is extracted as the attribute value in the metadata content; otherwise, the matching fails.
The invention provides a device for extracting goods order metadata by tensor, comprising a memory and a processor; the memory is configured to store a computer program; the processor, when executing the computer program, implements the method for extracting goods order metadata described above.
The invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method for extracting goods order metadata described above.
Compared with the prior art, the invention stores the goods entity in a tensor, so that all attribute values of the goods entity can be stored; more attribute values therefore participate in the matching operation during metadata extraction, and the extracted metadata is more comprehensive and more accurate;
the tensor is constructed through the deep neural network pre-training model and the named entity recognition model, so that the resulting goods tensor has more dimensions and stores more attribute values;
the metadata extraction operation is performed through a natural language understanding technology, so that the extracted metadata is more comprehensive and more accurate.
Drawings
Fig. 1 is a flowchart of a method for storing goods information by tensor according to an embodiment of the present invention;
Fig. 2 is a flowchart of the tensor generation step in the method for storing goods information by tensor according to an embodiment of the present invention;
Fig. 3 is a flowchart of a method for extracting goods order metadata by tensor according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment provides a method for storing goods information by tensor, which comprises the following steps:
(1) a tensor generation step, comprising the sub-steps of:
(1-1) model input substep:
inputting a plurality of sample order texts of the same cargo type into a deep neural network pre-training model; in this embodiment the cargo type is chocolate;
the plurality of sample order texts is called a text set; the goods corresponding to the text set are called target goods; each sample order text in the text set comprises a text field named goods name, which stores the goods name of the order text; in this embodiment the goods name is chocolate.
(1-2) tensor construction substep:
the deep neural network pre-training model obtains the content of the goods name from the text set and takes the content as a tensor name; searching an attribute value corresponding to the dimension name in the text set according to each dimension name in a preset dimension name set; and constructing a tensor according to the tensor name, each dimension name in the dimension name set and the corresponding attribute value.
The dimension name set consists of dimension names, and the dimension names are attributes of the target goods; each dimension name of the tensor corresponds to each dimension name in the dimension name set in a one-to-one mode; storing an attribute value corresponding to the attribute in each dimension, wherein the corresponding attribute value is derived from a text set;
wherein, the dimension names in the dimension name set are: color, shape, storage location.
(2) A dimension adding step:
in the text set, searching and obtaining attributes which are not contained in the tensor through a named entity recognition model; and taking each attribute as a dimension name, and adding a corresponding dimension in the tensor to obtain the goods tensor.
The searching process is as follows: the named entity recognition model finds the attribute value "-18 degrees" in the text set, determines that the attribute corresponding to this attribute value is temperature, and then takes temperature as a dimension name and adds a corresponding dimension to the tensor.
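The "-18 degrees" example can be sketched in a few lines. This is an illustration only: regular expressions stand in for the named entity recognition model (the patent names BiLSTM-CRF as a common choice), and the table `ATTRIBUTE_PATTERNS` and the function `add_dimensions` are hypothetical names introduced for the example.

```python
import re

# Hypothetical patterns standing in for a trained NER model: attribute -> regex.
ATTRIBUTE_PATTERNS = {
    "temperature": re.compile(r"-?\d+(?:\s*-\s*\d+)?\s*degrees"),
    "color": re.compile(r"\b(?:black|white)\b"),
}


def add_dimensions(tensor, text_set):
    # Only attributes that the tensor does not contain yet become dimensions.
    missing = [a for a in ATTRIBUTE_PATTERNS if a not in tensor["dimensions"]]
    for attribute in missing:
        values = []
        for text in text_set:
            for found in ATTRIBUTE_PATTERNS[attribute].findall(text):
                if found not in values:
                    values.append(found)
        if values:
            # The attribute becomes the dimension name; its values are stored.
            tensor["dimensions"][attribute] = values
    return tensor


tensor = {"name": "chocolate", "dimensions": {"color": ["black"]}}
text_set = ["Chocolate, black, stored at -18 degrees."]
goods_tensor = add_dimensions(tensor, text_set)
```

The existing color dimension is left untouched, and temperature is added as a new dimension holding "-18 degrees".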
In this embodiment, the tensor is constructed through the deep neural network pre-training model and the named entity recognition model, so that the resulting goods tensor has more dimensions and stores more attribute values.
The embodiment provides a method for extracting goods order metadata by tensor, comprising the following steps:
(1) Short sentence extraction:
extracting similar short sentences of the short sentences in the initial short sentence set from a single goods order text according to a preset initial short sentence set consisting of the short sentences through a natural language understanding technology, and adding the similar short sentences into the initial short sentence set to obtain a final short sentence set; the short sentence is a descriptive phrase or sentence of the goods corresponding to the single goods order text;
the goods corresponding to the single goods order text and the goods corresponding to the goods tensor are of the same goods type;
wherein the short sentences in the initial short sentence set are: black; oval; placing in a box; freezing;
the corresponding similar short sentences are: dark black; oval; is arranged in a box; 0-2 degrees.
(2) Metadata extraction:
extracting attribute values in the metadata content from the final short sentence set through a natural language understanding technology according to preset metadata items and the goods tensor; the metadata items comprise goods and attributes; the corresponding metadata content comprises the goods name and the attribute values.
The process of extracting the metadata content is as follows: matching each attribute value in the goods tensor with each short sentence in the final short sentence set through a natural language understanding technology; if the short sentence contains the attribute value or the similar short sentence of the attribute value, the matching is successful, and the short sentence which is successfully matched is extracted as the attribute value in the metadata content; otherwise, the matching fails.
Wherein the metadata items are (goods, color); the extracted metadata contents are as follows: (chocolate, black).
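The chocolate embodiment can be walked through end to end in a short sketch. The similarity table below is adapted from the short sentence pairs given in the embodiment, while the order text, the function names, and the use of substring checks in place of a natural language understanding model are assumptions made for illustration.

```python
# Similar short sentences adapted from the embodiment; a real system would
# obtain them from a natural language understanding model (assumption).
SIMILAR = {
    "black": ["dark black"],
    "placing in a box": ["is arranged in a box"],
    "freezing": ["0-2 degrees"],
}


def extract_final_phrases(initial_phrases, order_text):
    # Start from the initial set and add similar short sentences found in the text.
    final = list(initial_phrases)
    for phrase in initial_phrases:
        for similar in SIMILAR.get(phrase, []):
            if similar in order_text and similar not in final:
                final.append(similar)
    return final


def extract_metadata(goods_name, tensor_values, final_phrases):
    # A short sentence matches when it contains one of the tensor's values.
    matched = [p for p in final_phrases if any(v in p for v in tensor_values)]
    return [(goods_name, p) for p in matched]


# Hypothetical single goods order text for the chocolate example.
order_text = "Chocolate, dark black, oval, is arranged in a box, kept at 0-2 degrees."
initial = ["black", "oval", "placing in a box", "freezing"]
final = extract_final_phrases(initial, order_text)

# Metadata items (goods, color): match the tensor's color values.
metadata = extract_metadata("chocolate", ["black"], final)
```

With these inputs the first extracted pair is (chocolate, black), as in the embodiment; the sketch also returns (chocolate, dark black) because that short sentence contains the attribute value as well.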
In this embodiment, the goods entity is stored in a tensor, so that all attribute values of the goods entity can be stored; more attribute values therefore participate in the matching operation during metadata extraction, and the extracted metadata is more comprehensive and more accurate.
In this embodiment, the metadata extraction operation is performed through a natural language understanding technology, so that the extracted metadata is more comprehensive and more accurate.
The embodiment provides a system for storing goods information by tensor, which comprises the following modules:
a tensor generation module:
configured to obtain a tensor from a plurality of sample order texts of the same cargo type through a deep neural network pre-training model according to a preset dimension name set;
a plurality of sample order texts are called as text sets; the goods corresponding to the text set are called target goods; the dimension name set consists of dimension names, and the dimension names are attributes of the target goods;
each dimension name of the tensor corresponds to each dimension name in the dimension name set in a one-to-one mode; storing an attribute value corresponding to the attribute in each dimension, wherein the corresponding attribute value is derived from a text set;
the tensor generation module includes the following sub-modules:
a model input submodule:
configured to input the text set into the deep neural network pre-training model; each sample order text in the text set comprises a text field named goods name, which stores the goods name of the order text;
tensor construction submodule:
configured to have the deep neural network pre-training model obtain the content of the goods name from the text set and take it as the tensor name; search the text set for the attribute value corresponding to each dimension name in the dimension name set; and construct a tensor from the tensor name, each dimension name in the dimension name set, and the corresponding attribute values.
A dimension adding module:
configured to search the text set, through a named entity recognition model, for attributes not contained in the tensor; and to take each such attribute as a dimension name and add a corresponding dimension to the tensor to obtain the goods tensor.
The embodiment provides a system for extracting metadata of goods orders by tensor, which comprises the following modules:
a short sentence extraction module:
configured to extract from a single goods order text, through a natural language understanding technology, short sentences similar to those in a preset initial short sentence set, and to add the similar short sentences to the initial short sentence set to obtain a final short sentence set; a short sentence is a descriptive phrase or sentence about the goods corresponding to the single goods order text;
the goods corresponding to the single goods order text and the goods corresponding to the goods tensor are of the same goods type;
a metadata extraction module:
configured to extract attribute values in the metadata content from the final short sentence set through a natural language understanding technology according to preset metadata items and the goods tensor; the metadata items comprise goods and attributes; the corresponding metadata content comprises the goods name and the attribute values;
wherein, the operation of extracting the metadata content is as follows: matching each attribute value in the goods tensor and each short sentence in the final short sentence set by a natural language understanding technology; if the short sentence contains the attribute value or the similar short sentence of the attribute value, the matching is successful, and the short sentence which is successfully matched is extracted as the attribute value in the metadata content; otherwise, the matching fails.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method for extracting metadata of an order for goods by tensor, comprising the steps of:
(1) short sentence extraction:
extracting similar short sentences of the short sentences in the initial short sentence set from a single goods order text according to a preset initial short sentence set consisting of short sentences through a natural language understanding technology, and adding the similar short sentences into the initial short sentence set to obtain a final short sentence set; the short sentence is a descriptive phrase or sentence of the goods corresponding to the single goods order text;
the goods corresponding to the single goods order text and the goods corresponding to the goods tensor are of the same goods type;
(2) metadata extraction:
extracting attribute values in metadata contents from the final short sentence set through a natural language understanding technology according to preset metadata items and the goods tensor; the metadata items comprise goods, attributes; the corresponding metadata content comprises a goods name and an attribute value;
the goods tensor is obtained by the following steps:
S1: a tensor generation step:
obtaining a tensor from a plurality of sample order texts of the same cargo type through a deep neural network pre-training model according to a preset dimension name set;
the sample order texts are called as a text set; the goods corresponding to the text set are called target goods; the dimension name set consists of dimension names, and the dimension names are attributes of the target goods;
each dimension name of the tensor corresponds to each dimension name in the dimension name set in a one-to-one mode; storing an attribute value corresponding to an attribute for each dimension, the corresponding attribute value being derived from the text set;
S2: a dimension adding step:
in the text set, searching and obtaining the attribute which is not contained in the tensor through a named entity recognition model; and taking each attribute as a dimension name, and adding a corresponding dimension in the tensor to obtain the goods tensor.
2. The method for extracting metadata of an order for goods by tensor according to claim 1, wherein the tensor generation step comprises the following substeps:
S1.1: model input substep:
inputting the text set into the deep neural network pre-training model; each sample order text in the text set comprises a text field named as a goods name, and the text field is used for storing the goods name of the order text;
s1.2: tensor construction substep:
the deep neural network pre-training model acquires the content of the goods name from the text set and takes the content as a tensor name; according to each dimension name in the dimension name set, searching an attribute value corresponding to the dimension name in the text set; and constructing a tensor according to the tensor name, each dimension name in the dimension name set and the corresponding attribute value thereof.
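A minimal sketch of the tensor construction substep, under stated assumptions: each sample order text is represented as a dictionary with a "goods name" field (from the claim) and a hypothetical "text" field, and a regular-expression lookup of "dimension name: value" stands in for the deep neural network pre-training model. The function name `build_tensor` is likewise illustrative.

```python
import re
from typing import Dict, List


def build_tensor(text_set: List[Dict[str, str]], dimension_names: List[str]) -> Dict[str, object]:
    """Build a named tensor with one dimension per preset dimension name."""
    # The content of the "goods name" field becomes the tensor name.
    tensor_name = text_set[0]["goods name"]
    # One dimension per preset dimension name, initially without values.
    dimensions: Dict[str, List[str]] = {name: [] for name in dimension_names}
    for sample in text_set:
        for name in dimension_names:
            # Stand-in lookup (assumption): "<dimension name>: <value>" in the text.
            match = re.search(rf"{re.escape(name)}\s*:\s*([^;,\n]+)", sample["text"])
            if match:
                value = match.group(1).strip()
                if value not in dimensions[name]:
                    dimensions[name].append(value)
    return {"name": tensor_name, "dimensions": dimensions}
```

A dimension whose name never occurs in the text set stays empty, so the one-to-one correspondence between tensor dimensions and the dimension name set is preserved.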
3. The method for extracting goods order metadata using a tensor according to claim 1, wherein the metadata content is extracted in the metadata extraction step as follows: matching each attribute value in the goods tensor against each short sentence in the final short sentence set through a natural language understanding technology; if a short sentence contains the attribute value or a short sentence similar to the attribute value, the matching succeeds, and the successfully matched short sentence is extracted as an attribute value in the metadata content; otherwise, the matching fails.
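The matching rule above can be illustrated with a short sketch. It assumes the goods tensor is available as a mapping from dimension name to attribute values, and it replaces the natural language understanding technology with two simple stand-ins: substring containment and the character-level similarity ratio of Python's `difflib.SequenceMatcher`. The function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Stand-in similarity test (assumption): character-level ratio, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def extract_attribute_values(
    dimensions: Dict[str, List[str]], phrases: List[str], threshold: float = 0.8
) -> Dict[str, List[str]]:
    """Return, per attribute, the short sentences that match one of its tensor values."""
    extracted: Dict[str, List[str]] = {}
    for attribute, values in dimensions.items():
        for phrase in phrases:
            for value in values:
                # Match succeeds if the phrase contains the value itself ...
                contains = value.lower() in phrase.lower()
                # ... or contains a word that is similar to the value.
                near = any(similar(value, word, threshold) for word in phrase.split())
                if contains or near:
                    extracted.setdefault(attribute, []).append(phrase)
                    break  # one matching value is enough for this phrase
    return extracted
```

Attributes for which no short sentence matches are simply absent from the result, which corresponds to the "matching fails" branch.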
4. A system for extracting goods order metadata using a tensor, comprising:
a short sentence extraction module:
configured to extract from a single goods order text, through a natural language understanding technology and according to a preset initial short sentence set consisting of short sentences, short sentences similar to those in the initial short sentence set, and to add the similar short sentences to the initial short sentence set to obtain a final short sentence set; each short sentence is a descriptive phrase or sentence about the goods corresponding to the single goods order text;
the goods corresponding to the single goods order text and the goods corresponding to the goods tensor are of the same goods type;
a metadata extraction module:
configured to extract the attribute values of the metadata content from the final short sentence set through a natural language understanding technology according to preset metadata items and the goods tensor; the metadata items comprise goods and attributes; the corresponding metadata content comprises a goods name and attribute values;
the goods tensor is acquired by a tensor generation module and a dimension adding module;
the tensor generation module is configured to obtain a tensor from a plurality of sample order texts of the same goods type through a deep neural network pre-training model according to a preset dimension name set; the sample order texts are referred to as a text set; the goods corresponding to the text set are referred to as target goods; the dimension name set consists of dimension names, and the dimension names are attributes of the target goods; the dimension names of the tensor correspond one-to-one to the dimension names in the dimension name set; each dimension stores the attribute value corresponding to its attribute, the attribute value being derived from the text set;
the dimension adding module is configured to search the text set, through a named entity recognition model, for attributes not contained in the tensor, and to take each such attribute as a dimension name and add a corresponding dimension to the tensor to obtain the goods tensor.
5. The system for extracting goods order metadata using a tensor according to claim 4, wherein the tensor generation module comprises the following submodules:
a model input submodule:
configured to input the text set into the deep neural network pre-training model; each sample order text in the text set comprises a text field named goods name, which stores the goods name of the order text;
a tensor construction submodule:
configured so that the deep neural network pre-training model acquires the content of the goods name field from the text set and uses it as the tensor name; searches the text set for the attribute value corresponding to each dimension name in the dimension name set; and constructs a tensor from the tensor name, the dimension names in the dimension name set and their corresponding attribute values.
6. The system for extracting goods order metadata using a tensor according to claim 4, wherein the metadata extraction module extracts the metadata content as follows: matching each attribute value in the goods tensor against each short sentence in the final short sentence set through a natural language understanding technology; if a short sentence contains the attribute value or a short sentence similar to the attribute value, the matching succeeds, and the successfully matched short sentence is extracted as an attribute value in the metadata content; otherwise, the matching fails.
7. An apparatus for extracting goods order metadata using a tensor, comprising a memory and a processor; the memory is configured to store a computer program; the processor is configured to implement, when executing the computer program, the system for extracting goods order metadata using a tensor according to any one of claims 4 to 6.
8. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the system for extracting goods order metadata using a tensor according to any one of claims 4 to 6.
CN202111493743.3A 2021-12-08 2021-12-08 Method and system for extracting unit data of goods by tensor Active CN114169966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111493743.3A CN114169966B (en) 2021-12-08 2021-12-08 Method and system for extracting unit data of goods by tensor

Publications (2)

Publication Number Publication Date
CN114169966A CN114169966A (en) 2022-03-11
CN114169966B CN114169966B (en) 2022-08-05

Family

ID=80484468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111493743.3A Active CN114169966B (en) 2021-12-08 2021-12-08 Method and system for extracting unit data of goods by tensor

Country Status (1)

Country Link
CN (1) CN114169966B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8301514B1 (en) * 2010-09-14 2012-10-30 Amazon Technologies, Inc. System, method, and computer readable medium for providing recommendations based on purchase phrases
CN107978373A (en) * 2017-11-23 2018-05-01 吉林大学 A kind of semi-supervised biomedical event extraction method based on common training
CN110032648A (en) * 2019-03-19 2019-07-19 微医云(杭州)控股有限公司 A kind of case history structuring analytic method based on medical domain entity
CN110609998A (en) * 2019-08-07 2019-12-24 中通服建设有限公司 Data extraction method of electronic document information, electronic equipment and storage medium
US10565498B1 (en) * 2017-02-28 2020-02-18 Amazon Technologies, Inc. Deep neural network-based relationship analysis with multi-feature token model
US10607042B1 (en) * 2019-02-12 2020-03-31 Live Objects, Inc. Dynamically trained models of named entity recognition over unstructured data
CN112749562A (en) * 2020-12-31 2021-05-04 合肥工业大学 Named entity identification method, device, storage medium and electronic equipment
CN113076718A (en) * 2021-04-09 2021-07-06 苏州爱语认知智能科技有限公司 Commodity attribute extraction method and system
CN113724055A (en) * 2021-09-14 2021-11-30 京东科技信息技术有限公司 Commodity attribute mining method and device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101042515B1 (en) * 2008-12-11 2011-06-17 주식회사 네오패드 Method for searching information based on user's intention and method for providing information
US11514096B2 (en) * 2015-09-01 2022-11-29 Panjiva, Inc. Natural language processing for entity resolution
US10380259B2 (en) * 2017-05-22 2019-08-13 International Business Machines Corporation Deep embedding for natural language content based on semantic dependencies
US10910105B2 (en) * 2017-05-31 2021-02-02 International Business Machines Corporation Monitoring the use of language of a patient for identifying potential speech and related neurological disorders
JP7063080B2 (en) * 2018-04-20 2022-05-09 富士通株式会社 Machine learning programs, machine learning methods and machine learning equipment
CN110222200A (en) * 2019-06-20 2019-09-10 京东方科技集团股份有限公司 Method and apparatus for entity fusion
CN110490443A (en) * 2019-08-11 2019-11-22 安徽神海港航数据服务有限公司 The monitoring of shipping dynamic transport power and concocting method
CN111260437B (en) * 2020-01-14 2023-07-11 北京邮电大学 Product recommendation method based on commodity-aspect-level emotion mining and fuzzy decision
WO2021150676A1 (en) * 2020-01-21 2021-07-29 Ancestry.Com Operations Inc. Joint extraction of named entities and relations from text using machine learning models
KR20210142891A (en) * 2020-05-19 2021-11-26 삼성에스디에스 주식회사 Method and apparatus for customizing natural language processing model
US11436851B2 (en) * 2020-05-22 2022-09-06 Bill.Com, Llc Text recognition for a neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Sentence Detection and Extraction in machine printed imaged document using matching technique; Shalini Puri et al.; 2015 2nd International Conference on Recent Advances in Engineering & Computational Sciences (RAECS); 20151222; p. 6 *
Transformer-Based Neural Network for Answer Selection in Question Answering; Taihua Shao; IEEE Access; 20190221; Vol. 7; pp. 26146-26156 *
Research on a similar patent search method incorporating functional information; Wang Ningning; China Master's Theses Full-text Database, Information Science and Technology; 20200715; pp. I138-1549 *
Fine-grained product review analysis for deep learning networks; Kang Yue et al.; Computer Engineering and Applications; 20210114; Vol. 57, No. 11; pp. 140-147 *

Also Published As

Publication number Publication date
CN114169966A (en) 2022-03-11

Similar Documents

Publication Publication Date Title
CN110502621B (en) Question answering method, question answering device, computer equipment and storage medium
CN108959431B (en) Automatic label generation method, system, computer readable storage medium and equipment
US9779085B2 (en) Multilingual embeddings for natural language processing
CN111897970A (en) Text comparison method, device and equipment based on knowledge graph and storage medium
CN109992645A (en) A kind of data supervision system and method based on text data
CN106778878B (en) Character relation classification method and device
Kashmira et al. Generating entity relationship diagram from requirement specification based on nlp
CN113971210B (en) Data dictionary generation method and device, electronic equipment and storage medium
CN113448843A (en) Defect analysis-based image recognition software test data enhancement method and device
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN115757819A (en) Method and device for acquiring information of quoting legal articles in referee document
CN117807204A (en) Question-answering diagnosis method, device, equipment and medium for engineering machinery fault problems
CN111814476A (en) Method and device for extracting entity relationship
CN117291192B (en) Government affair text semantic understanding analysis method and system
CN112445862B (en) Internet of things equipment data set construction method and device, electronic equipment and storage medium
CN117745482A (en) Method, device, equipment and medium for determining contract clause
CN117971698A (en) Test case generation method and device, electronic equipment and storage medium
CN109684473A (en) A kind of automatic bulletin generation method and system
CN111382243A (en) Text category matching method, text category matching device and terminal
CN114169966B (en) Method and system for extracting unit data of goods by tensor
WO2023016163A1 (en) Method for training text recognition model, method for recognizing text, and apparatus
CN116975275A (en) Multilingual text classification model training method and device and computer equipment
CN111160756A (en) Scenic spot assessment method and model based on secondary artificial intelligence algorithm
CN110765872A (en) Online mathematical education resource classification method based on visual features
CN113688233A (en) Text understanding method for semantic search of knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant