CN114547360A - Automatic marking method and device for material and storage medium - Google Patents


Info

Publication number
CN114547360A
CN114547360A (application CN202210170536.2A)
Authority
CN
China
Prior art keywords: matching; description information; information; dictionary; successful
Prior art date
Legal status: Pending
Application number
CN202210170536.2A
Other languages
Chinese (zh)
Inventor
王喆 (Wang Zhe)
范凌 (Fan Ling)
Current Assignee
Tezign Shanghai Information Technology Co Ltd
Original Assignee
Tezign Shanghai Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tezign Shanghai Information Technology Co Ltd filed Critical Tezign Shanghai Information Technology Co Ltd
Priority: CN202210170536.2A
Publication: CN114547360A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a device, and a storage medium for automatically marking materials. The method comprises the following steps: constructing a dictionary library and a gallery for matching; acquiring a material provided by a user; reading first description information of the material, matching it against the dictionary library, and judging whether the match is successful; and, if so, acquiring the label information corresponding to the first description information and assigning it to the material. If that match is unsuccessful, text information in the material is read by optical character recognition and matched against the dictionary library; if this match succeeds, the label information corresponding to the text information is acquired and assigned to the material. If this match is also unsuccessful, gallery matching is performed to obtain second description information of the material, which is then matched against the dictionary library; if this match succeeds, the label information corresponding to the second description information is acquired and assigned to the material.

Description

Automatic marking method and device for material and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for automatically marking materials and a storage medium.
Background
With the advent of the media era, marketing content for fast-moving consumer goods has grown rapidly, and managing distributed materials has become especially important, whether the material is used for product-content operations or for re-creating content. To optimize the user experience of managing materials and improve the efficiency of using them, material marking tools have emerged. Marking methods in the prior art mainly comprise purely manual marking, machine classification marking by the contents of different folder directories, and machine-trained classification marking. Traditional purely manual marking is time-consuming and labor-intensive, so it is poorly suited to large-batch marking tasks. Marking by folder management involves too many rules and folder names in post-processing to achieve full coverage. In machine-trained labeling, meeting online-precision requirements demands an excessive amount of labeled training data, and when the category granularity is too fine the model is difficult to converge and to run inference with. Therefore, existing marking methods cannot meet the requirement of automatically marking materials in the fast-moving consumer goods marketing field.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to overcome the inability of prior-art marking methods to meet the requirement of automatically marking materials in the fast-moving consumer goods marketing field.
In order to solve the technical problem, the invention provides an automatic marking method for materials, comprising the following steps: constructing a dictionary library and a gallery for matching; acquiring a material provided by a user; reading first description information of the material, matching the first description information against the dictionary library, and judging whether the match is successful, where the first description information comprises the file name and attributes of the material and/or the folder name of the material; if the dictionary-library match of the first description information succeeds, acquiring the label information corresponding to the first description information and assigning it to the material; if that match is unsuccessful, reading the text information in the material by optical character recognition, matching the text information against the dictionary library, and judging whether the match is successful; if the dictionary-library match of the text information succeeds, acquiring the label information corresponding to the text information and assigning it to the material; if that match is unsuccessful, performing gallery matching on the material to obtain second description information of the material, where the second description information is the text matched in the gallery, matching the second description information against the dictionary library, and judging whether the match is successful; and if the dictionary-library match of the second description information succeeds, acquiring the label information corresponding to the second description information and assigning it to the material.
Optionally, after matching the second description information against the dictionary library and judging whether the match is successful, the method further includes: if the dictionary-library match of the second description information is unsuccessful, setting the label information of the material to null.
Optionally, the method for constructing the dictionary library for matching includes: acquiring third description information of a plurality of commodities and creating label information based on the third description information, where the third description information comprises the names and/or attributes of the commodities; determining the label information of each of the commodities; and extracting trigger words from the third description information and configuring each trigger word to its corresponding label information, thereby obtaining the dictionary library.
Optionally, the method for constructing the gallery for matching includes: acquiring fourth description information and commodity images of a plurality of commodities, where the fourth description information comprises the names and/or attributes of the commodities and contains the trigger words; extracting feature vectors of the commodity images; and storing the index number of each commodity together with its fourth description information in a MongoDB database, and the index number together with the feature vector of its commodity image in a Milvus database, thereby obtaining the gallery.
Optionally, the method for dictionary-library matching includes: matching the first description information, the text information, or the second description information against the trigger words in the dictionary library through an Aho-Corasick (AC) automaton algorithm to obtain the label information corresponding to the matched trigger word.
Optionally, the method for gallery matching includes: extracting the feature vector of the material, retrieving in the Milvus database the commodity-image feature vector closest to the material's feature vector, obtaining the index number of the commodity corresponding to the retrieved image feature vector, and looking up the corresponding fourth description information in the MongoDB database by that index number; the retrieved fourth description information is the second description information matched to the material.
Optionally, the method for extracting the feature vector of a commodity image or of a material includes: extracting the feature vector with an image pre-training model.
In order to solve the above technical problem, the present invention further provides an automatic marking device for materials, comprising: a memory; and a processor coupled to the memory, the processor being configured to: construct a dictionary library and a gallery for matching; acquire a material provided by a user; read first description information of the material, match the first description information against the dictionary library, and judge whether the match is successful, where the first description information comprises the file name and attributes of the material and/or the folder name of the material; if the dictionary-library match of the first description information succeeds, acquire the label information corresponding to the first description information and assign it to the material; if that match is unsuccessful, read the text information in the material by optical character recognition, match the text information against the dictionary library, and judge whether the match is successful; if the dictionary-library match of the text information succeeds, acquire the label information corresponding to the text information and assign it to the material; if that match is unsuccessful, perform gallery matching on the material to obtain second description information of the material, where the second description information is the text matched in the gallery, match the second description information against the dictionary library, and judge whether the match is successful; and if the dictionary-library match of the second description information succeeds, acquire the label information corresponding to the second description information and assign it to the material.
In order to solve the above technical problem, the present invention further provides a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a machine, implements the steps of the method as described above.
The technical scheme of the invention has the following advantages:
1. The invention provides an automatic marking method for materials, comprising: constructing a dictionary library and a gallery for matching; acquiring a material provided by a user; reading first description information of the material (the file name and attributes of the material and/or its folder name), matching it against the dictionary library, and, if the match succeeds, acquiring the corresponding label information and assigning it to the material; if that match is unsuccessful, reading the text information in the material by optical character recognition, matching it against the dictionary library, and, if the match succeeds, acquiring the corresponding label information and assigning it to the material; and if that match is unsuccessful, performing gallery matching on the material to obtain second description information (the text matched in the gallery), matching it against the dictionary library, and, if the match succeeds, acquiring the corresponding label information and assigning it to the material.
This serially combined automatic marking method reduces the workload and pressure of marking and improves marking efficiency. Specifically, constructing a dictionary library for matching meets the need for automatic marking while reducing the workload of purely manual marking; reading the text information in the material by optical character recognition and matching it against the dictionary library reduces the folder information that must be processed and the number of folder names, relieving the burden of writing rules for folder-name-based marking; and constructing a gallery for matching allows material images to be matched without running a classification task, so no machine-marking training over massive numbers of images is required. The automatic marking method can therefore meet the requirement of automatically marking materials in the fast-moving consumer goods marketing field.
2. Further, the method for constructing the gallery for matching comprises: acquiring fourth description information and commodity images of a plurality of commodities, where the fourth description information comprises the names and/or attributes of the commodities and contains the trigger words; extracting feature vectors of the commodity images; and storing the index number of each commodity with its fourth description information in a MongoDB database and with the feature vector of its commodity image in a Milvus database, thereby obtaining the gallery. The method for gallery matching comprises: extracting the feature vector of the material, retrieving in the Milvus database the commodity-image feature vector closest to the material's feature vector, obtaining the index number of the corresponding commodity, and looking up the corresponding fourth description information in the MongoDB database by that index number; the retrieved fourth description information is the second description information matched to the material.
By acquiring the fourth description information and commodity images of the commodities and ensuring that the fourth description information of every commodity in the gallery contains a corresponding trigger word from the dictionary library, gallery matching serves as the fallback step of the whole marking method. This avoids labeling massive training data for classification tasks and better meets the requirement of automatically marking materials in the fast-moving consumer goods marketing field.
3. Further, the method for extracting the feature vector of a commodity image or material comprises: extracting the feature vector with an image pre-training model. Training the image pre-training model on an open-source commodity dataset improves the accuracy of image feature-vector extraction and the model's ability to recognize the same commodity under different shooting conditions (such as main and auxiliary images of the commodity, different arrangements of elements, and different shooting angles), thereby better meeting the automatic marking requirement of fast-moving consumer goods marketing materials.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of an automatic marking method for a material according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an image pre-training model training process according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an inference process of an image pre-training model according to an embodiment of the present invention;
fig. 4 is a block diagram of a module of an automatic marking device for materials according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
An embodiment of the present invention provides an automatic marking method for a material, and referring to fig. 1, fig. 1 is a flowchart of an automatic marking method for a material according to an embodiment of the present invention, it should be understood that the method may further include additional blocks not shown and/or may omit the shown blocks, and the scope of the present invention is not limited in this respect. The method comprises the following steps:
at step 101, a dictionary repository and a gallery for matching are constructed.
Specifically, the dictionary library and the gallery for matching are constructed based on data provided by the user. Both are offline modules: they are constructed manually and can be updated periodically by supplementing the user data.
In some embodiments, the method of constructing the dictionary library for matching includes: acquiring third description information of a plurality of commodities and creating label information based on it, where the label information comprises a plurality of primary labels and, under each primary label, a plurality of secondary labels, and the third description information comprises the names and/or attributes of the commodities; determining the label information of each commodity; and extracting trigger words from the third description information and configuring each trigger word to its corresponding label information, thereby obtaining the dictionary library.
Specifically, a large amount of unlabeled data is obtained from the user, containing information such as the names and/or attributes of a large number of commodities, i.e., the third description information. Through communication with the user, the metadata fields the user wants marked are determined and created as label information, divided into primary labels and secondary labels classified under them. For example, "creative home" may be determined as a primary label, with "floor mat class", "public class", "fitness class", "festival gift", "pillow class", "travel article", and "fragrance class" as secondary labels under it; still lower-level labels may be set under a secondary label. Then, according to each commodity's third description information, its label information is determined and a trigger word is extracted from that description, a word that is distinctive and representative. For example, for a commodity whose name indicates creative goods and whose attribute is "plush doll house fun holiday blind bag", the determined label information is the primary label "creative home" and the secondary label "doll class", and the extracted trigger word is "plush doll"; the trigger word is then configured to that label information. The result is a dictionary library containing a large number of trigger words and the label information corresponding to each trigger word.
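The dictionary-library construction described above can be sketched as a mapping from trigger words to their label information. The field names and sample products below are illustrative placeholders, not data from the patent.

```python
# Minimal sketch of dictionary-library construction. Each product record
# carries a trigger word extracted from its description plus the primary and
# secondary labels assigned to it; all names here are hypothetical.

def build_dictionary(products):
    """Build a trigger-word -> label-info mapping (the 'dictionary library')."""
    dictionary = {}
    for p in products:
        dictionary[p["trigger"]] = {
            "primary": p["primary_label"],
            "secondary": p["secondary_label"],
        }
    return dictionary

products = [
    {"trigger": "plush doll", "primary_label": "creative home",
     "secondary_label": "doll class"},
    {"trigger": "floor mat", "primary_label": "creative home",
     "secondary_label": "floor mat class"},
]
dictionary = build_dictionary(products)
```

In a real deployment the trigger words would be curated manually and the table refreshed as user data is supplemented, matching the offline-module design above.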
In some embodiments, the method of constructing the gallery for matching comprises: acquiring fourth description information and commodity images of a plurality of commodities, where the fourth description information comprises the names and/or attributes of the commodities and contains the trigger words; extracting feature vectors of the commodity images; and storing the index number of each commodity with its fourth description information in a MongoDB database, and with the feature vector of its commodity image in a Milvus database, thereby obtaining the gallery.
Specifically, the fourth description information is substantially consistent with the third description information and therefore contains the trigger words. In some embodiments, the feature vector of a commodity image is extracted with an image pre-training model.
Referring to fig. 2, in the training process of the image pre-training model, an open-source dataset from the e-commerce field is obtained. The dataset contains commodity images of different categories as well as publicity pictures of the same commodity in different dimensions; for example, for the same facial cleansing device, a merchant may place several display pictures on the display pages of Taobao and Jingdong (JD.com). After data enhancement, the images are fed into multiple convolutional network layers, and a linear layer performs dimension conversion to obtain image feature vectors. The feature vectors are then evaluated in a loss module: across different commodities, a cross-entropy loss yields loss1; across different display images of the same commodity, a triplet loss yields loss2. The final loss is the sum of loss1 and loss2, and the optimized image pre-training model is obtained through back-propagation and loss optimization.
Cross-entropy loss calculation formula:

loss1 = -Σ_i Σ_{c=1}^{M} y_ic · log(p_ic)

where M denotes the number of categories; y_ic is an indicator function that takes 1 if the true class of sample i is c and 0 otherwise; and p_ic is the predicted probability that sample i belongs to class c.
Triplet loss calculation formula: loss2 = max(d(a, p) - d(a, n) + margin, 0)
Here a is the anchor; p (positive) is a sample of the same class as a; n (negative) is a sample of a different class from a; and margin is a constant greater than 0. d(a, p) denotes the distance between a and p, and d(a, n) the distance between a and n. The optimization goal is to pull a and p closer together while pushing a and n farther apart.
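The two losses above can be computed as follows in a minimal sketch; the numbers are made-up examples, and in practice these run over model outputs inside a deep-learning framework rather than plain Python lists.

```python
import math

def cross_entropy(y_true, p_pred):
    """loss1 for one sample: -sum_c y_ic * log(p_ic) over M classes."""
    return -sum(math.log(p) for y, p in zip(y_true, p_pred) if y == 1)

def triplet_loss(d_ap, d_an, margin=0.2):
    """loss2 = max(d(a,p) - d(a,n) + margin, 0)."""
    return max(d_ap - d_an + margin, 0.0)

# One-hot label [0,1,0]: the sample's true class is class 1.
loss1 = cross_entropy([0, 1, 0], [0.1, 0.7, 0.2])
# Anchor-positive distance already smaller than anchor-negative by > margin,
# so the triplet term contributes nothing for this example.
loss2 = triplet_loss(d_ap=0.3, d_an=0.9)
total = loss1 + loss2   # the superimposed final loss
```

The superposition mirrors the training description: the cross-entropy term separates different commodities while the triplet term pulls together display images of the same commodity.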
Referring to fig. 3, in the inference process of the image pre-training model, the optimized model is used with only its feature-vector extraction module to extract feature vectors of all commodity images in the internal gallery; the gallery is thus constructed by running model inference over massive commodity-image data.
At step 102, user-provided material is obtained.
Specifically, the material is a picture. It may carry a file name and attribute information, and several materials may be provided together in one folder.
At step 103, first description information of the material is read, wherein the first description information comprises a file name, an attribute and/or a folder name of the material.
At step 104, dictionary-library matching is performed on the first description information and it is judged whether the match is successful. If so, step 105 is executed: the label information corresponding to the first description information is acquired and assigned to the material. Specifically, assigning the label information to the material means returning the label information output by the dictionary library to the requester as the result, completing the marking of the material, and displaying the marking result on a front-end page.
In some embodiments, dictionary-library matching is performed by matching the first description information against the trigger words in the dictionary library with an Aho-Corasick (AC) automaton algorithm, obtaining the label information corresponding to the matched trigger word.
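The Aho-Corasick automaton finds every trigger word in a text in a single pass. A compact pure-Python sketch is given below; the trigger words are illustrative, and a production system would use an optimized library implementation.

```python
from collections import deque

class ACAutomaton:
    """Minimal Aho-Corasick automaton for multi-pattern trigger-word search."""

    def __init__(self, words):
        self.goto = [{}]   # per-node transitions: {char: next node}
        self.fail = [0]    # failure links (longest proper suffix node)
        self.out = [[]]    # trigger words that end at each node
        for w in words:
            self._insert(w)
        self._build_fail_links()

    def _insert(self, word):
        node = 0
        for ch in word:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].append(word)

    def _build_fail_links(self):
        queue = deque(self.goto[0].values())   # depth-1 nodes fail to root
        while queue:
            node = queue.popleft()
            for ch, nxt in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] += self.out[self.fail[nxt]]  # inherit matches
                queue.append(nxt)

    def match(self, text):
        """Return every trigger word occurring in `text`, in scan order."""
        node, hits = 0, []
        for ch in text:
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            hits.extend(self.out[node])
        return hits

ac = ACAutomaton(["plush doll", "doll", "floor mat"])
hits = ac.match("plush doll house blind bag")
```

Each matched word can then be looked up in the dictionary library to retrieve its label information; because overlapping triggers ("doll" inside "plush doll") are all reported, a tie-breaking rule such as longest-match-first can be applied.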
If the dictionary-library matching of the first description information is unsuccessful, step 106 is executed: the text information in the material is read by optical character recognition.
Specifically, if the dictionary-library matching outputs no result, the match has failed, possibly because the material has no valid file name, attribute, or folder-name information; that is, the information may be garbled or may carry nothing about the material itself. The next step is then executed.
In step 107, dictionary-library matching is performed on the text information and it is judged whether the match is successful. As in the foregoing embodiments, if the match succeeds, step 108 is executed: the label information corresponding to the text information is acquired and assigned to the material. The dictionary-library matching method is as described above and is not repeated here.
If the dictionary-library matching of the text information is unsuccessful, step 109 is executed: gallery matching is performed on the material to obtain its second description information, the text matched in the gallery. The text match may fail because, for instance, the user uploaded a pure picture containing no text at all.
In some embodiments, gallery matching comprises: extracting the feature vector of the material, retrieving in the Milvus database the commodity-image feature vector closest to it, obtaining the index number of the commodity corresponding to the retrieved feature vector, and looking up the corresponding fourth description information in the MongoDB database by that index number; the retrieved fourth description information is the second description information matched to the material.
Specifically, the optimized image pre-training model, using only its feature-extraction module, extracts the feature vector of the uploaded material; the closest vector is then retrieved in the Milvus database, the index number of the corresponding commodity is returned, and the fourth description information for that index number, i.e., the name and/or attribute information of the matched commodity, is looked up in the MongoDB database.
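The two-store lookup can be sketched in memory as follows: `vector_store` stands in for the Milvus collection (index number to feature vector) and `doc_store` for the MongoDB collection (index number to fourth description information). The toy 3-dimensional vectors and record contents are illustrative; real image features are high-dimensional.

```python
import math

vector_store = {
    101: [0.9, 0.1, 0.0],   # feature vector of a plush-doll product image
    102: [0.0, 0.8, 0.6],   # feature vector of a floor-mat product image
}
doc_store = {
    101: {"name": "plush doll blind bag", "attribute": "creative home"},
    102: {"name": "cartoon floor mat", "attribute": "creative home"},
}

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gallery_match(material_vec):
    """Return the description of the gallery image nearest the material."""
    idx = min(vector_store, key=lambda i: l2(vector_store[i], material_vec))
    return doc_store[idx]   # this text is the second description information

second_desc = gallery_match([0.85, 0.2, 0.1])
```

Milvus performs the nearest-neighbor step with an approximate index rather than the exhaustive scan above, but the contract is the same: a feature vector in, the index number of the closest commodity out, followed by a keyed document lookup.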
By training the image pre-training model on an open-source commodity data set, the accuracy of image feature vector extraction is improved, as is the model's ability to recognize the same commodity under different shooting conditions (such as main and auxiliary images of the commodity, different arrangements and combinations of elements, and different shooting angles), so that the automatic marking requirements of marketing materials for fast-moving consumer goods can be better met.
In step 110, dictionary base matching is performed on the second description information and whether the matching is successful is determined. If the dictionary base matching of the second description information is successful, step 111 is executed: the tag information corresponding to the second description information is obtained and assigned to the material. The dictionary base matching method is described in the foregoing embodiments and is not repeated here.
By acquiring the fourth description information and the commodity images of the commodities, and by ensuring that the fourth description information of every commodity in the gallery contains the corresponding trigger words in the dictionary library, gallery matching serves as the fallback step of the whole marking method. This avoids labeling massive amounts of training data for a classification task, and better meets the automatic marking requirements of marketing materials for fast-moving consumer goods.
If the dictionary base matching of the second description information is unsuccessful, step 112 is executed to leave the tag information of the material empty; the material is then sent for manual review.
This serially combined automatic marking method reduces the workload and pressure of marking and improves marking efficiency. Specifically, constructing a dictionary library for matching meets the requirement of automatic marking and reduces the workload of purely manual marking; reading the text in the material by optical character recognition and performing dictionary base matching on that text reduces the folder information that must be processed, reduces the number of folder names, and relieves the pressure of writing folder-name marking rules; and constructing a gallery for matching allows material images to be matched without running a classification task, so no marking model needs to be trained on a large number of images. The automatic marking method can therefore meet the requirement of automatic marking of materials in the marketing field of fast-moving consumer goods.
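The serial fallback just summarized — file name/attributes, then OCR text, then gallery matching, then manual review — can be sketched as follows. This is a simplified illustration: `match_dictionary` is a naive substring stand-in for the dictionary base matching of the embodiments (the real method uses an AC automaton), and the `ocr` and `gallery_match` callables are hypothetical hooks for the OCR and gallery steps.

```python
def match_dictionary(text, dictionary):
    """Return the tag info of the first trigger word found in text, else None."""
    if not text:
        return None
    for trigger, tag in dictionary.items():
        if trigger in text:
            return tag
    return None

def auto_tag(material, dictionary, ocr, gallery_match):
    # Step 1: match file name / attributes / folder name (first description info).
    tag = match_dictionary(material.get("first_description"), dictionary)
    if tag:
        return tag
    # Step 2: fall back to text read from the image by OCR.
    tag = match_dictionary(ocr(material), dictionary)
    if tag:
        return tag
    # Step 3: fall back to gallery matching (second description info).
    tag = match_dictionary(gallery_match(material), dictionary)
    # Step 4: None means the tag stays empty and the material goes to manual review.
    return tag
```

Each stage only runs when the previous one fails, which matches the patent's design goal: the cheap metadata match handles most materials, and the expensive image retrieval is reserved for materials with no usable text at all.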
Fig. 4 is a block diagram of an automatic marking device for materials according to an embodiment of the present invention. The device includes:
a memory 201; and a processor 202 coupled to the memory 201, the processor 202 configured to: construct a dictionary library and a gallery for matching; acquire a material provided by a user; read first description information of the material, perform dictionary base matching on the first description information, and determine whether the matching is successful, and if the dictionary base matching of the first description information is successful, obtain label information corresponding to the first description information and assign the label information to the material, where the first description information includes a file name and an attribute of the material and/or a folder name of the material; if the dictionary base matching of the first description information is unsuccessful, read character information in the material by an optical character recognition method, perform dictionary base matching on the character information, and determine whether the matching is successful, and if the dictionary base matching of the character information is successful, obtain the label information corresponding to the character information and assign the label information to the material; and if the dictionary base matching of the character information is unsuccessful, perform gallery matching on the material to obtain second description information of the material, where the second description information is the text matched in the gallery, perform dictionary base matching on the second description information, and determine whether the matching is successful, and if the dictionary base matching of the second description information is successful, obtain the label information corresponding to the second description information and assign the label information to the material.
In some embodiments, the processor 202 is further configured to: if the dictionary base matching of the second description information is unsuccessful, set the label information of the material to empty.
In some embodiments, the processor 202 is further configured to: acquire third description information of a plurality of commodities, and create label information based on the third description information, where the label information includes a plurality of primary labels and a plurality of secondary labels corresponding to the primary labels, and the third description information includes names of the commodities and/or attributes of the commodities; determine the label information of the plurality of commodities respectively; and extract trigger words from the third description information and configure the trigger words to the corresponding label information, respectively, so as to obtain the dictionary database.
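Dictionary-base construction under this scheme can be sketched as below. The field names (`primary_label`, `secondary_label`, `triggers`) and the commodity data are invented for illustration; in the embodiment the trigger words are extracted from the third description information (commodity names and/or attributes).

```python
def build_dictionary(commodities):
    """Map each trigger word to the (primary, secondary) label it should fire.
    Later, matching any trigger in a material's text yields that tag info."""
    dictionary = {}
    for item in commodities:
        tag = (item["primary_label"], item["secondary_label"])
        for trigger in item["triggers"]:
            dictionary[trigger] = tag
    return dictionary

# Illustrative commodity records (third description information, simplified).
commodities = [
    {"primary_label": "beverage", "secondary_label": "sparkling water",
     "triggers": ["sparkling", "soda water"]},
    {"primary_label": "snack", "secondary_label": "chips",
     "triggers": ["chips", "crisps"]},
]
```

A real deployment would also record which commodity each trigger came from and resolve conflicts when two commodities share a trigger word; the sketch keeps the last writer.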
In some embodiments, the processor 202 is further configured to: acquire fourth description information and commodity images of a plurality of commodities, where the fourth description information includes names of the commodities and/or attributes of the commodities and contains the trigger words; extract feature vectors of the commodity images; and store the index numbers of the commodities and the fourth description information into a mongodb database, and store the index numbers of the commodities and the feature vectors of the commodity images into a milvus database, so as to obtain the gallery.
In some embodiments, the processor 202 is further configured to: match the first description information, the character information or the second description information against the trigger words in the dictionary base by an AC (Aho-Corasick) automaton algorithm, to obtain the label information corresponding to the matched trigger words.
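The AC (Aho-Corasick) automaton named here finds all trigger words in a text in a single pass, regardless of how many patterns the dictionary holds. A minimal sketch of the algorithm follows; the trigger words in the usage are illustrative, and a production system would typically use an existing library rather than this hand-rolled version.

```python
from collections import deque

class ACAutomaton:
    """Minimal Aho-Corasick automaton for multi-pattern trigger-word matching."""

    def __init__(self, patterns):
        self.goto = [{}]   # transition table: state -> {char: next state}
        self.fail = [0]    # failure links back to the longest proper suffix state
        self.out = [[]]    # patterns that end at each state
        for p in patterns:
            self._insert(p)
        self._build_failure_links()

    def _insert(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # BFS from the root; a state's failure link is computed from its parent's.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit outputs so suffix patterns are also reported.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return every trigger word occurring anywhere in text, in one pass."""
        state, hits = 0, []
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            hits.extend(self.out[state])
        return hits
```

Mapping each returned trigger back to its label information is then a dictionary lookup, as in the dictionary-base construction above.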
In some embodiments, the processor 202 is further configured to: extract the feature vector of the material, retrieve in the milvus database the commodity image feature vector closest to the feature vector of the material, obtain the index number of the commodity corresponding to the retrieved image feature vector, and obtain the corresponding fourth description information from the mongodb database according to the index number of the commodity, the obtained fourth description information being the second description information matched to the material.
In some embodiments, the processor 202 is further configured to: extract the feature vectors of the commodity images or of the material by using an image pre-training model.
For the specific implementation method, reference is made to the foregoing method embodiments, which are not described herein again.
The present invention may be methods, apparatus, systems and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therein for carrying out aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It is noted that, unless expressly stated otherwise, all features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features. Where used, "further", "preferably", "still further" and "more preferably" briefly introduce the description of another embodiment based on the foregoing embodiment; the content following such a term, combined with the foregoing embodiment, constitutes the complete construction of that other embodiment. Several arrangements following "further", "preferably", "still further" or "more preferably" within the same embodiment may be combined in any combination to form yet another embodiment.
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are given by way of example only and are not limiting of the invention. The objects of the invention have been fully and effectively accomplished. The functional and structural principles of the present invention have been shown and described in the examples, and any variations or modifications of the embodiments of the present invention may be made without departing from the principles.

Claims (9)

1. An automatic marking method for materials is characterized by comprising the following steps:
constructing a dictionary library and a gallery for matching;
acquiring a material provided by a user;
reading first description information of the material, performing dictionary base matching on the first description information, and judging whether the matching is successful; if the dictionary base matching of the first description information is successful, acquiring label information corresponding to the first description information and assigning the label information to the material, wherein the first description information comprises a file name and an attribute of the material and/or a folder name of the material;
if the dictionary base matching of the first description information is unsuccessful, reading character information in the material by an optical character recognition method, performing dictionary base matching on the character information, and judging whether the matching is successful; if the dictionary base matching of the character information is successful, acquiring the label information corresponding to the character information and assigning the label information to the material;
and if the dictionary base matching of the character information is unsuccessful, performing gallery matching on the material to obtain second description information of the material, wherein the second description information is the text matched in the gallery; performing dictionary base matching on the second description information and judging whether the matching is successful; and if the dictionary base matching of the second description information is successful, acquiring the label information corresponding to the second description information and assigning the label information to the material.
2. The automatic marking method for materials as claimed in claim 1, wherein after matching the second description information with a dictionary database and determining whether the matching is successful, the method further comprises the steps of:
and if the dictionary base matching of the second description information is unsuccessful, setting the label information of the material to empty.
3. The automatic marking method for materials as claimed in claim 1, wherein the method for constructing a dictionary database for matching comprises:
acquiring third description information of a plurality of commodities, and creating the label information based on the third description information, wherein the third description information comprises names of the commodities and/or attributes of the commodities;
respectively determining the label information of a plurality of commodities;
and extracting trigger words from the third description information, and respectively configuring the trigger words to the corresponding label information so as to obtain the dictionary database.
4. The automatic marking method for materials as claimed in claim 3, wherein the method for constructing the gallery for matching comprises:
acquiring fourth description information and commodity images of a plurality of commodities, wherein the fourth description information comprises names of the commodities and/or attributes of the commodities, and the fourth description information comprises the trigger words;
extracting a feature vector of the commodity image;
and storing the index number of the commodity and the fourth description information into a mongodb database, and storing the index number of the commodity and the feature vector of the commodity image into a milvus database to obtain the gallery.
5. The automatic marking method for materials as claimed in claim 3, wherein the method for matching the dictionary database comprises: matching the first description information, the character information or the second description information against the trigger words in the dictionary base by an AC (Aho-Corasick) automaton algorithm, to obtain the label information corresponding to the matched trigger words.
6. The automatic marking method for materials as claimed in claim 4, wherein the method for matching the gallery comprises: extracting the feature vector of the material, retrieving in the milvus database the commodity image feature vector closest to the feature vector of the material, obtaining the index number of the commodity corresponding to the retrieved image feature vector, and obtaining the corresponding fourth description information from the mongodb database according to the index number of the commodity, the obtained fourth description information being the second description information matched to the material.
7. The automatic marking method for the material as claimed in claim 6, wherein the method for extracting the feature vector of the commodity image or the material comprises: extracting the feature vectors of the commodity images or of the material by using an image pre-training model.
8. An automatic marking device for materials, characterized by comprising:
a memory; and
a processor coupled to the memory, the processor configured to:
constructing a dictionary library and a gallery for matching;
acquiring a material provided by a user;
reading first description information of the material, performing dictionary base matching on the first description information, and judging whether the matching is successful; if the dictionary base matching of the first description information is successful, acquiring label information corresponding to the first description information and assigning the label information to the material, wherein the first description information comprises a file name and an attribute of the material and/or a folder name of the material;
if the dictionary base matching of the first description information is unsuccessful, reading character information in the material by an optical character recognition method, performing dictionary base matching on the character information, and judging whether the matching is successful; if the dictionary base matching of the character information is successful, acquiring the label information corresponding to the character information and assigning the label information to the material;
and if the dictionary base matching of the character information is unsuccessful, performing gallery matching on the material to obtain second description information of the material, wherein the second description information is the text matched in the gallery; performing dictionary base matching on the second description information and judging whether the matching is successful; and if the dictionary base matching of the second description information is successful, acquiring the label information corresponding to the second description information and assigning the label information to the material.
9. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a machine, performs the steps of the method of any of claims 1-7.
CN202210170536.2A 2022-02-24 2022-02-24 Automatic marking method and device for material and storage medium Pending CN114547360A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210170536.2A CN114547360A (en) 2022-02-24 2022-02-24 Automatic marking method and device for material and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210170536.2A CN114547360A (en) 2022-02-24 2022-02-24 Automatic marking method and device for material and storage medium

Publications (1)

Publication Number Publication Date
CN114547360A 2022-05-27

Family

ID=81677179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210170536.2A Pending CN114547360A (en) 2022-02-24 2022-02-24 Automatic marking method and device for material and storage medium

Country Status (1)

Country Link
CN (1) CN114547360A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination