CN117829954A - Commodity code matching method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN117829954A
Authority
CN
China
Prior art keywords
commodity
target
data set
training data
code matching
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311834006.4A
Other languages
Chinese (zh)
Inventor
徐聪
邓应强
张航嘉
徐俊
张玉魁
黄傲雪
罗威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a commodity code matching method, a commodity code matching device, electronic equipment and a storage medium, and relates to the technical field of big data, wherein the method comprises the following steps: firstly, obtaining target commodity information to be matched, and extracting characteristics of the target commodity information to be matched to obtain target commodity characteristics; then, training the commodity code matching model to be trained based on a target training data set corresponding to the target commodity characteristic to obtain a target commodity code matching model; and finally, matching the target commodity information to be matched by adopting a target commodity code matching model to obtain a commodity code matching result corresponding to the target commodity information to be matched. By the method, the model is trained based on the training data set with higher accuracy, so that accuracy of commodity code matching is improved, and workload of manual matching is reduced.

Description

Commodity code matching method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of big data, in particular to a commodity code matching method, a commodity code matching device, electronic equipment and a storage medium.
Background
Commodity coding means that, when a value-added tax invoice is issued, the commodity name on the face of the invoice is associated with a commodity code approved by the relevant authorities. Commodity codes help the relevant authorities compile, screen, analyze and compare data, thereby strengthening administration.
During tax handling, the commodity code is generally looked up by commodity name. More than 4300 commodity codes have been published so far, so manual lookup is time-consuming and labor-intensive, and its accuracy may be low.
Manual matching therefore cannot complete commodity code matching accurately and quickly. In the prior art, matching is typically performed with a machine learning method based on historical billing data. However, as commodities become increasingly diverse and many new commodities appear, commodity names that do not exist in the historical billing data are easily mismatched, and the matching accuracy is low when little historical billing data is available.
Therefore, how to accurately match a commodity name with a commodity code is a problem that currently needs to be solved.
Disclosure of Invention
The application provides a commodity code matching method, a commodity code matching device, electronic equipment and a storage medium, which are used for improving matching accuracy of commodity names and commodity codes.
In a first aspect, the present application provides a commodity code matching method, including:
acquiring target commodity information to be matched, and extracting characteristics of the target commodity information to be matched to obtain target commodity characteristics;
training the commodity code matching model to be trained based on a target training data set corresponding to the target commodity characteristic to obtain a target commodity code matching model; the target training data set meets the preset commodity code matching accuracy requirement;
and matching the target commodity information to be matched by adopting a target commodity code matching model to obtain a commodity code matching result corresponding to the target commodity information to be matched.
In an alternative embodiment, the target commodity code matching model is trained by:
acquiring historical commodity information;
preprocessing historical commodity information according to preset rules to obtain a target training data set;
extracting features of the target training data set by utilizing at least one learner in the commodity code matching model to obtain a reference feature vector corresponding to the target training data set; wherein, the structure of different learners is different;
determining a prediction matching result of the reference feature vector by adopting a plurality of serially connected Transformer layers and an FCN layer;
and adjusting parameters of the commodity code matching model according to the prediction matching result until a preset condition is met, so as to obtain the target commodity code matching model.
In an alternative embodiment, before preprocessing the historical commodity information according to a preset rule to obtain the target training data set, the method further includes:
acquiring matched commodity names and commodity codes;
the following operations are performed for the target commodity code:
determining the number of times of occurrence of the target commodity codes and the number of times of occurrence of different commodity names corresponding to the target commodity codes; wherein the target commodity code is any one of commodity codes;
determining a classification probability set corresponding to the target commodity code based on the ratio of the number of times of occurrence of the target commodity code in the total number of times of occurrence of all commodity codes and the ratio of the number of times of occurrence of each of different commodity names in the total number of times of occurrence of all commodity names; wherein the classification probability set includes a correct classification probability and a wrong classification probability.
and determining a training data set from the historical commodity information according to the classification probability set.
In an alternative embodiment, after determining the training data set from the matched commodity names and commodity codes according to the classification probability set, the method further comprises:
and carrying out synonym expansion on each commodity name in the training data set to obtain a target training data set.
In a second aspect, the present application provides a commodity code matching apparatus, including:
the acquisition module is used for acquiring target commodity information to be matched, and extracting characteristics of the target commodity information to be matched to obtain target commodity characteristics;
the training module is used for training the commodity code matching model to be trained based on a target training data set corresponding to the target commodity characteristic to obtain a target commodity code matching model; the target training data set meets the preset commodity code matching accuracy requirement;
and the matching module is used for matching the target commodity information to be matched by adopting the target commodity code matching model to obtain a commodity code matching result corresponding to the target commodity information to be matched.
In an alternative embodiment, the target commodity code matching model is obtained by training a training module through the following method:
acquiring historical commodity information;
preprocessing historical commodity information according to preset rules to obtain a target training data set;
extracting features of the target training data set by utilizing at least one learner in the commodity code matching model to obtain a reference feature vector corresponding to the target training data set; wherein, the structure of different learners is different;
determining a prediction matching result of the reference feature vector by adopting a plurality of serially connected Transformer layers and an FCN layer;
and adjusting parameters of the commodity code matching model according to the prediction matching result until a preset condition is met, so as to obtain the target commodity code matching model.
By the method, the commodity code matching model can be trained based on the characteristics of different dimensions, so that the matching accuracy of the target commodity code matching model is high.
In an optional implementation manner, before preprocessing the historical commodity information according to a preset rule to obtain a target training data set, the training module is further configured to:
acquiring matched commodity names and commodity codes;
the following operations are performed for the target commodity code:
determining the number of times of occurrence of the target commodity codes and the number of times of occurrence of different commodity names corresponding to the target commodity codes; wherein the target commodity code is any one of commodity codes;
determining a classification probability set corresponding to the target commodity code based on the ratio of the number of times of occurrence of the target commodity code in the total number of times of occurrence of all commodity codes and the ratio of the number of times of occurrence of each of different commodity names in the total number of times of occurrence of all commodity names; wherein the set of classification probabilities includes a correct classification probability and a wrong classification probability.
and determining a training data set from the historical commodity information according to the classification probability set.
By the method, the training data set with higher matching accuracy can be obtained, so that the training accuracy of the model is ensured.
In an alternative embodiment, after determining the training data set from the matched commodity names and commodity codes according to the classification probability set, the training module is further configured to:
and carrying out synonym expansion on each commodity name in the training data set to obtain a target training data set.
By the method, synonym expansion is carried out on commodity names, so that generalization capability of the target commodity coding matching model obtained through training can be enhanced, and different commodity names can be better identified.
In a third aspect, the present application provides an electronic device, including:
a memory for storing a computer program;
and the processor is used for realizing the steps of the commodity code matching method when executing the computer program stored in the memory.
In a fourth aspect, the present application provides a computer readable storage medium having a computer program stored therein, which when executed by a processor, implements the steps of a commodity code matching method as described above.
Through the technical scheme in the above-mentioned one or more embodiments of the present application, the embodiments of the present application have at least the following beneficial effects:
in the commodity code matching method provided by the embodiment of the application, first, target commodity information to be matched is obtained, and feature extraction is performed on the target commodity information to be matched to obtain target commodity features; then, training the commodity code matching model to be trained based on a target training data set corresponding to the target commodity characteristic to obtain a target commodity code matching model; the target training data set meets the preset commodity code matching accuracy requirement; and finally, matching the target commodity information to be matched by adopting a target commodity code matching model to obtain a commodity code matching result corresponding to the target commodity information to be matched.
In this way, the training data set for deep learning is determined according to preset rules, which reduces the workload of manual labeling while preserving accuracy. Commodity names can be accurately matched with commodity codes, the method can be applied to a variety of application scenarios, and the accuracy can improve further as the amount of data increases.
For the technical effects of the second to fourth aspects and of their possible implementations, refer to the technical effects described above for the first aspect and its possible implementations; they are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it will be apparent that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 2 is a schematic flow chart of implementation of a commodity code matching method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a commodity code matching model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a commodity code matching device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, which can be made by a person of ordinary skill in the art without any inventive effort, based on the embodiments described in the present application are intended to be within the scope of the technical solutions of the present application.
It should be noted that "a plurality of" is understood as "at least two" in the description of the present application. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A exists alone, A and B exist together, or B exists alone. "A is connected with B" may indicate both that A and B are directly connected and that A and B are connected through C. In addition, in the description of the present application, the words "first," "second," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance or order.
In the prior art, commodity coding matching is usually carried out according to a machine learning method based on historical billing data, however, with continuous development, commodities are increasingly enriched, and a plurality of new commodities are generated, so that the names of the commodities which do not exist in the historical billing data are easily subjected to wrong matching, and the matching accuracy is low under the condition that the historical billing data are less.
In view of this, an embodiment of the present application provides a commodity code matching method, including: firstly, obtaining target commodity information to be matched, and extracting characteristics of the target commodity information to be matched to obtain target commodity characteristics; then, training the commodity code matching model to be trained based on a target training data set corresponding to the target commodity characteristic to obtain a target commodity code matching model; the target training data set meets the preset commodity code matching accuracy requirement; and finally, matching the target commodity information to be matched by adopting a target commodity code matching model to obtain a commodity code matching result corresponding to the target commodity information to be matched. By the method, accuracy of commodity code matching is improved, workload of manual marking data is reduced, and the commodity code matching method is high in generalization capability and suitable for various application scenes.
Fig. 1 is a schematic diagram of an application scenario applicable to the embodiment of the present application. As shown in fig. 1, the scenario mainly includes a terminal 10 and a server 11. The information interaction between the terminal 10 and the server 11 may be performed through a communication network, where the communication manner adopted by the communication network may include: wireless communication and wired communication. The terminal device 10 in the embodiment of the present application may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like.
The server 11 in this embodiment of the present application may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server of a cloud computing service, which is not limited herein.
It should be noted that the following description of the preferred embodiments of the present application is given by way of illustration and explanation only, and is not intended to limit the present application, and the features of the embodiments of the present application and the embodiments thereof may be combined with each other without conflict.
Referring to fig. 2, a schematic implementation flow chart of a commodity code matching method provided in an embodiment of the present application is shown, where a specific implementation flow chart of the method is as follows:
S1: acquiring target commodity information to be matched, and extracting characteristics of the target commodity information to be matched to obtain target commodity characteristics;
S2: training the commodity code matching model to be trained based on the target training data set corresponding to the target commodity characteristic to obtain the target commodity code matching model;
S3: matching the target commodity information to be matched by adopting the target commodity code matching model to obtain a commodity code matching result corresponding to the target commodity information to be matched.
In the embodiment of the application, first, target commodity information to be matched is obtained; the target commodity information can represent the commodity name of a commodity for which an invoice needs to be issued. By way of example, the commodity name may be mobile phone, computer, clothing, etc. Then, feature extraction is performed on the target commodity information to be matched to obtain the target commodity features. Optionally, in an embodiment of the present application, the feature extraction of the target commodity information is performed by using a fully convolutional network.
And then training the commodity code matching model to be trained based on the training data set corresponding to the target commodity characteristic to obtain the target commodity code matching model.
In an alternative embodiment, the target commodity code matching model is trained by:
firstly, acquiring historical commodity information; the historical commodity information can be obtained from historical billing information by the server.
In an alternative embodiment, the following steps are performed before preprocessing the historical merchandise information according to a preset rule to obtain the target training data set:
firstly, acquiring matched commodity names and commodity codes; the matched commodity name and commodity code can be obtained according to historical billing information; then, the following operations are performed for the target commodity code:
The commodity code is selected as the rule statistics object. With the commodity codes as the grouping key, the number of occurrences of each commodity name under the target commodity code and the total number of occurrences of the target commodity code are counted. The target commodity code is any one of the commodity codes; a single commodity code is used as an example below.
Thus, the total number of occurrences of the target commodity code and the number of occurrences of each of the different commodity names corresponding to the target commodity code can be determined. It will be appreciated that the same commodity code may correspond to a plurality of different commodity names. For example, if the commodity corresponding to commodity code A is a mobile phone, and mobile phones include Xiaomi mobile phones, Huawei mobile phones, and the like, then the commodity names corresponding to commodity code A include Xiaomi mobile phone, Huawei mobile phone, and the like. The total number of occurrences of each of the different commodity names corresponding to commodity code A can then be counted.
Further, after the total number of occurrences of each of the different commodity names is counted, any commodity name whose total number of occurrences is only 1 is deleted. Then, based on the ratio of the number of occurrences of the target commodity code to the total number of occurrences of all commodity codes, and the ratio of the number of occurrences of each of the different commodity names to the total number of occurrences of all commodity names, a classification probability set corresponding to the target commodity code is determined.
For example, different statistical rules are set for different share ranges. First, the target commodity codes whose total number of occurrences falls in the top 2% are selected, and the share of each corresponding commodity name in the total number of occurrences of all commodity names is computed. A commodity name whose share exceeds 25% is stored with its target commodity code as a correct classification, and a commodity name whose share is below 5% is stored with its target commodity code as a wrong classification. This yields the classification probability set corresponding to the target commodity code.
Similarly, for target commodity codes whose total number of occurrences falls in the 2%-5% range, the share of each corresponding commodity name in the total number of occurrences of all commodity names is computed. A commodity name whose share exceeds 40% is stored with its target commodity code as a correct classification, and a commodity name whose share is below 5% is stored with its target commodity code as a wrong classification.
For target commodity codes whose total number of occurrences falls in the 50%-100% range, the share of each corresponding commodity name in the total number of occurrences of all commodity names is computed. A commodity name whose share is 100% is stored with its target commodity code as a correct classification, and a commodity name whose share is below 5% is stored with its target commodity code as a wrong classification.
It should be noted that, the corresponding correct classification probability and error classification probability may also be set according to the other occurrence frequency duty ratio of the target commodity code, which is not described in detail herein.
Similarly, the commodity name can be taken as the rule statistics object, with different condition rules set for different share ranges. For example, the target commodity names whose total number of occurrences falls in the top 2% are selected, and the share of each corresponding commodity code in the total number of occurrences of all commodity codes is computed. A commodity code whose share exceeds 25% is stored with its target commodity name as a correct classification, and a commodity code whose share is below 5% is stored with its target commodity name as a wrong classification.
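The counting rules above can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the function name `label_pairs`, the input layout, and the choice to compute each name's share within its own commodity code are assumptions, and the tiering by how frequent the code itself is has been left out for brevity. The 25% and 5% thresholds are the ones quoted for the most frequent codes.

```python
from collections import Counter, defaultdict

def label_pairs(records, correct_share=0.25, wrong_share=0.05):
    """Label (code, name) pairs as 'correct' or 'wrong' by occurrence share.

    records: iterable of (commodity_code, commodity_name) pairs taken from
    historical billing data (hypothetical layout).
    """
    names_by_code = defaultdict(Counter)
    for code, name in records:
        names_by_code[code][name] += 1

    labels = {}
    for code, name_counts in names_by_code.items():
        # Names that occur only once are deleted before shares are computed.
        kept = {n: c for n, c in name_counts.items() if c > 1}
        total = sum(kept.values())
        for name, count in kept.items():
            share = count / total  # assumption: share within this code
            if share > correct_share:
                labels[(code, name)] = "correct"
            elif share < wrong_share:
                labels[(code, name)] = "wrong"
            # Pairs between the two thresholds stay unlabeled.
    return labels

records = (
    [("A", "phone")] * 30
    + [("A", "handset")] * 68
    + [("A", "case")] * 2
    + [("A", "typo")]
)
labels = label_pairs(records)
```

Pairs whose share falls between the two thresholds receive no label in this sketch, since the text does not say how they are treated.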
After the classification probability set corresponding to the target commodity code is obtained, the original invoice data is retrieved from the historical commodity information, and the training data set is thereby determined.
In the training data set, a plurality of historical commodity information, such as commodity codes, commodity names, commodity unit prices, commodity units, commodity tax rates, and the like, which are strongly related to commodities are included.
In an alternative embodiment, after determining the training data set from the matched commodity names and commodity codes according to the classification probability set, synonym expansion is further required for each commodity name in the training data set to obtain a target training data set, and the obtained target training data set meets a preset matching accuracy requirement.
Specifically, for example, the synonyms of commodity names can be expanded by using the Google Distance technique, that is, the synonyms of a single commodity name are enriched according to an external corpus.
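The text does not define the distance it uses. Assuming it refers to the Normalized Google Distance, which scores how closely two terms are related from their occurrence counts in a corpus, the measure can be sketched as below. The function name, the count arguments, and the idea of treating a small distance as a synonym candidate are illustrative assumptions.

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google Distance between two terms.

    fx, fy: number of documents containing each term
    fxy:    number of documents containing both terms
    n:      total number of documents in the corpus
    A smaller value means the terms are more closely related.
    """
    if fxy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

same = normalized_google_distance(100, 100, 100, 10000)
partial = normalized_google_distance(100, 100, 10, 10000)
```

Terms that always co-occur get a distance of 0, and the distance grows as co-occurrence becomes rarer.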
Further, preprocessing the historical commodity information according to a preset rule to obtain a target training data set.
The commodity tax rate is multiplied by 100 so that its value lies between 0 and 100. For example, if the commodity tax rate corresponding to commodity A is 5%, i.e. 0.05, then 0.05 × 100 = 5.
Commodity units are encoded by category. The different unit categories are counted, and a one-dimensional array is created according to the categories, in which each element corresponds to a different unit: the element for the unit of the commodity is 1, and all other elements are 0. For example, n categories may be set, such as 5 unit categories, so that the units in the one-dimensional array may be "one", "table", "bin", "sheet" and "vehicle". If the unit corresponding to commodity A is "table", the code corresponding to the unit of commodity A is 01000; if the unit corresponding to commodity B is "bin", the code corresponding to the unit of commodity B is 00100.
The commodity unit price is multiplied by 100 so that the unit price becomes an integer. For example, if the unit price corresponding to commodity A is 17.15 yuan, then 17.15 × 100 = 1715.
In addition, the value of the commodity code is converted into a vector. Each commodity code can be converted into a vector by a common word-to-vector algorithm, for example a one-hot encoding algorithm.
The commodity name is converted into vector form by a method such as the word2vec distributed word vector representation or TF-IDF (term frequency-inverse document frequency).
By the method, the historical commodity information is preprocessed, and the target training data set is obtained.
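The numeric and categorical steps above could look roughly like this. It is a sketch under assumptions: the record layout, the field names, and the `preprocess` helper are hypothetical, and the vectorization of commodity names (word2vec or TF-IDF) is omitted. Rounding is used because 17.15 * 100 is not exactly 1715 in binary floating point.

```python
# Unit categories taken from the five-category example in the text.
UNIT_CATEGORIES = ["one", "table", "bin", "sheet", "vehicle"]

def one_hot(value, categories):
    """Return a list with 1 at the position of value and 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]

def preprocess(record, code_vocab):
    """Convert one historical record into model-ready values.

    record: dict with 'tax_rate', 'unit', 'unit_price', 'code' (hypothetical keys)
    code_vocab: ordered list of all known commodity codes
    """
    return {
        # 0.05 -> 5.0, so the tax rate lies between 0 and 100
        "tax_rate": round(record["tax_rate"] * 100, 6),
        # "table" -> [0, 1, 0, 0, 0]
        "unit": one_hot(record["unit"], UNIT_CATEGORIES),
        # 17.15 -> 1715; round() absorbs floating-point error
        "unit_price": round(record["unit_price"] * 100),
        # commodity code as a one-hot vector over the known codes
        "code": one_hot(record["code"], code_vocab),
    }

sample = {"tax_rate": 0.05, "unit": "table", "unit_price": 17.15, "code": "B"}
processed = preprocess(sample, ["A", "B", "C"])
```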
In the embodiment of the application, a hybrid machine learning model is adopted as a commodity code matching model. The learner may apply a variety of different machine learning algorithms, such as data feature extraction using a plurality of Boost-type models. And then, splicing the results output by all learners to obtain a reference feature vector.
For example, in the embodiment of the present application, 3 Boost models may be set, including: an XGBoost model, a LightBoost model, and a CatBoost model. Different learners emphasize different aspects of feature extraction, so a plurality of learners are used to extract features, and the extracted features are then spliced to obtain the reference feature vector.
The types and the number of learners applied in the commodity code matching model are determined according to the data types in the training data set, and can be dynamically adjusted in the training process, and part of learners are added or deleted, so that the application is not limited.
By the method, as different learners have different structures, the characteristics with different emphasis points can be extracted, and the accuracy of matching by applying the commodity code matching model is improved.
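The splicing step can be illustrated with a small sketch. The two learner functions below are hypothetical stand-ins that only show the data flow; in the described model each learner would be a trained boosting model, and `splice_features` is an assumed helper name.

```python
def splice_features(sample, learners):
    """Run every learner on the sample and concatenate their outputs."""
    reference = []
    for learner in learners:
        reference.extend(learner(sample))
    return reference

# Hypothetical stand-ins for trained learners; each returns a feature list.
def name_length_learner(sample):
    return [float(len(sample["name"]))]

def price_learner(sample):
    return [sample["unit_price"] / 100, sample["tax_rate"]]

sample = {"name": "phone", "unit_price": 1715, "tax_rate": 5}
reference_vector = splice_features(sample, [name_length_learner, price_learner])
```

Because the outputs are simply concatenated, learners can be added or removed without changing the splicing code; only the length of the reference feature vector changes.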
Further, the obtained reference feature vector is passed through a multi-layer Transformer structure for deep training, and is finally fed into a fully convolutional network (FCN) layer, which outputs the prediction matching result of the reference feature vector.
The number of Transformer layers may be set according to factors such as the size of the training data set, the diversity of the data, and the training duration requirement of the model, which are not limited herein.
In the embodiment of the application, the commodity code matching model can be evaluated and optimized through the verification data set, and parameters of the commodity code matching model are adjusted according to the prediction matching result until preset conditions are met, so that the target commodity code matching model is obtained. For example, the accuracy of the commodity code matching model may be determined from the verification data set, and when the accuracy is below a preset accuracy threshold, model parameters of the commodity code matching model may be adjusted. It should be noted that the accuracy threshold may be preset according to actual situations or experience, and the embodiments of the present application are not limited herein. And training the commodity code matching model by utilizing the target training data set to obtain a model which can be used for commodity code matching after training. With the increase of time, the number of data in the data set can be correspondingly increased, and the model can be updated by training again by using the incremental data, so that the matching is more accurate.
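The accuracy check against a preset threshold might be sketched as follows. The `predict` callable, the layout of the verification set, and the default threshold of 0.9 are assumptions made for illustration; the text leaves the threshold to be set from actual conditions or experience.

```python
def accuracy(predict, validation_set):
    """Fraction of validation samples whose predicted code equals the label."""
    correct = sum(1 for features, code in validation_set if predict(features) == code)
    return correct / len(validation_set)

def needs_adjustment(predict, validation_set, threshold=0.9):
    """True when accuracy is below the preset threshold (hypothetical default)."""
    return accuracy(predict, validation_set) < threshold

# Hypothetical lookup standing in for the trained matching model.
known = {"phone": "A", "shirt": "B"}

def predict(name):
    return known.get(name)

validation_set = [("phone", "A"), ("shirt", "B"), ("laptop", "A"), ("sock", "B")]
score = accuracy(predict, validation_set)
```

In a training loop, `needs_adjustment` returning True would trigger another round of parameter updates or retraining on incremental data.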
Referring to fig. 3, a schematic structural diagram of a commodity code matching model according to an embodiment of the present application is shown. As shown in fig. 3, the commodity code matching model may include 3 learners: XGBoost, LightBoost, and CatBoost. After commodity information to be matched is input into the commodity code matching model, the 3 learners each perform feature extraction on the input data, and the feature extraction results output by the learners are spliced to obtain a reference feature vector. Deep training is then performed through a plurality of Transformer layers to determine a prediction matching result of the reference feature vector. The commodity code matching model is then trained based on the comparison between the prediction matching result and the corresponding sample.
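The splicing step in fig. 3 can be illustrated with a small sketch. The three feature functions below are hypothetical stand-ins for the learners, chosen only so that the example runs without any boosting library; they are not the learners named in the application, and the Transformer and FCN stages are omitted.

```python
from typing import Callable, List, Sequence

Learner = Callable[[str], List[float]]


def length_features(name: str) -> List[float]:
    """Stand-in learner 1: character length of the commodity name."""
    return [float(len(name))]


def digit_features(name: str) -> List[float]:
    """Stand-in learner 2: number of digit characters in the name."""
    return [float(sum(ch.isdigit() for ch in name))]


def token_features(name: str) -> List[float]:
    """Stand-in learner 3: number of whitespace-separated tokens."""
    return [float(len(name.split()))]


def reference_feature_vector(name: str, learners: Sequence[Learner]) -> List[float]:
    """Splice (concatenate) the outputs of all learners, in order."""
    vector: List[float] = []
    for learner in learners:
        vector.extend(learner(name))
    return vector


learners = [length_features, digit_features, token_features]
vector = reference_feature_vector("steel pipe 20mm", learners)
```

Because the learners are passed as a sequence, one can be added or removed without changing the splicing code, which matches the statement that the set of learners may be adjusted.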
And finally, matching the target commodity information to be matched by adopting a trained commodity code matching model to obtain a commodity code matching result corresponding to the target commodity information to be matched. For example, a commodity name a is input, and a corresponding commodity code B is matched.
Based on the same inventive concept, the embodiment of the present application further includes a commodity code matching device, as shown in fig. 4, where the device includes: an acquisition module 401, a training module 402, and a matching module 403;
the obtaining module 401 is configured to obtain target commodity information to be matched, and perform feature extraction on the target commodity information to be matched to obtain target commodity features;
the training module 402 is configured to train the commodity code matching model to be trained based on a target training data set corresponding to the target commodity feature to obtain a target commodity code matching model; the target training data set meets the preset commodity code matching accuracy requirement;
and the matching module 403 is configured to match the target commodity information to be matched by using the target commodity code matching model, so as to obtain a commodity code matching result corresponding to the target commodity information to be matched.
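The division into acquisition, training, and matching modules might look like the following sketch. Everything here is an assumption made for illustration: the normalization in `extract_features`, the `LookupModel` class (a trivial table rather than a trained model), and the placeholder codes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


def extract_features(info: str) -> str:
    """Acquisition step (stand-in): normalize case and whitespace of the raw text."""
    return " ".join(info.lower().split())


@dataclass
class LookupModel:
    """Training step (stand-in): memorizes feature -> code pairs instead of learning."""
    table: Dict[str, str] = field(default_factory=dict)

    def fit(self, pairs: Iterable[Tuple[str, str]]) -> None:
        for name, code in pairs:
            self.table[extract_features(name)] = code

    def predict(self, features: str) -> Optional[str]:
        return self.table.get(features)


def match(model: LookupModel, info: str) -> Optional[str]:
    """Matching step: extract features, then ask the trained model for a code."""
    return model.predict(extract_features(info))


model = LookupModel()
model.fit([("Steel  Pipe", "CODE-001")])  # hypothetical training pair
result = match(model, "  steel pipe ")
```

Keeping feature extraction in one shared function ensures that training data and incoming queries are normalized the same way.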
In an alternative embodiment, the target commodity code matching model is trained by the training module 402 by:
acquiring historical commodity information;
preprocessing historical commodity information according to preset rules to obtain a target training data set;
extracting features of the target training data set by utilizing at least one learner in the commodity code matching model to obtain a reference feature vector corresponding to the target training data set; wherein, the structure of different learners is different;
determining a prediction matching result of the reference feature vector by adopting a plurality of serially connected Transformer layers and FCN layers;
and adjusting parameters of the commodity coding matching model according to the prediction matching result until a preset condition is met, so as to obtain the target commodity coding matching model.
In an alternative embodiment, before preprocessing the historical merchandise information according to a preset rule to obtain a target training data set, the training module 402 is further configured to:
acquiring matched commodity names and commodity codes;
the following operations are performed for the target commodity code:
determining the number of times of occurrence of the target commodity codes and the number of times of occurrence of different commodity names corresponding to the target commodity codes; wherein the target commodity code is any one of commodity codes;
determining a classification probability set corresponding to the target commodity code based on the ratio of the number of times of occurrence of the target commodity code in the total number of times of occurrence of all commodity codes and the ratio of the number of times of occurrence of each of different commodity names in the total number of times of occurrence of all commodity names; wherein the set of classification probabilities includes a correct classification probability and a wrong classification probability.
And determining a training data set from the historical commodity information according to the classification probability set.
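The counting behind the classification probability set can be sketched as below. The two ratios follow the description above; how they are combined into correct and wrong classification probabilities is not spelled out in this passage, so `classification_probabilities` uses an assumed rule: the correct probability of a name for the target code is the share of that name's occurrences that carry the target code. All names and codes are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (commodity name, commodity code)


def occurrence_ratios(pairs: List[Pair]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Ratio of each code and each name to the total number of occurrences."""
    total = len(pairs)
    code_counts = Counter(code for _, code in pairs)
    name_counts = Counter(name for name, _ in pairs)
    code_ratio = {code: n / total for code, n in code_counts.items()}
    name_ratio = {name: n / total for name, n in name_counts.items()}
    return code_ratio, name_ratio


def classification_probabilities(pairs: List[Pair],
                                 target_code: str) -> Dict[str, Tuple[float, float]]:
    """Assumed rule: per name, (correct, wrong) = (share with target_code, remainder)."""
    name_counts = Counter(name for name, _ in pairs)
    pair_counts = Counter(pairs)
    result: Dict[str, Tuple[float, float]] = {}
    for (name, code), n in pair_counts.items():
        if code == target_code:
            correct = n / name_counts[name]
            result[name] = (correct, 1.0 - correct)
    return result


# Hypothetical matched history: "apple" is coded "A" twice and "B" once.
history = [("apple", "A"), ("apple", "A"), ("apple", "B"), ("pear", "A")]
code_ratio, name_ratio = occurrence_ratios(history)
probs = classification_probabilities(history, "A")
```

Under this assumed rule, pairs whose correct probability is low could be filtered out before training, which is one way to obtain a training data set with higher labelling accuracy.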
In an alternative embodiment, after determining the training data set from the matched commodity names and commodity codes according to the classification probability set, the training module 402 is further configured to:
and carrying out synonym expansion on each commodity name in the training data set to obtain a target training data set.
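Synonym expansion can be sketched as adding, for each commodity name, extra samples that pair its synonyms with the same commodity code. The synonym dictionary and the sample data below are hypothetical; the application does not state where the synonyms come from.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (commodity name, commodity code)


def expand_with_synonyms(dataset: List[Pair],
                         synonyms: Dict[str, List[str]]) -> List[Pair]:
    """Return the data set plus one extra sample per synonym, without duplicates."""
    expanded: List[Pair] = []
    seen = set()
    for name, code in dataset:
        for variant in [name, *synonyms.get(name, [])]:
            if (variant, code) not in seen:
                seen.add((variant, code))
                expanded.append((variant, code))
    return expanded


# Hypothetical training data and synonym dictionary.
training = [("notebook", "C1"), ("desk", "F2")]
synonym_table = {"notebook": ["laptop", "notebook computer"]}
target_training = expand_with_synonyms(training, synonym_table)
```

The original samples are kept and appear before their synonym variants, so the expansion only adds data and never relabels it.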
Based on the same inventive concept, the embodiment of the present application further provides an electronic device that can implement the functions of the foregoing data processing method. Referring to fig. 5, the electronic device includes at least one processor 501 and a memory 502.
The embodiment of the present application does not limit the specific connection medium between the processor 501 and the memory 502; fig. 5 takes the connection of the processor 501 and the memory 502 through the bus 500 as an example. The bus 500 is shown as a bold line in fig. 5, and the manner of connection between other components is merely illustrative and not limiting. The bus 500 may be divided into an address bus, a data bus, a control bus, etc.; for ease of illustration it is represented by only one thick line in fig. 5, but this does not mean that there is only one bus or one type of bus. Alternatively, the processor 501 may be referred to as a controller, and the name is not limited.
In the embodiment of the present application, the memory 502 stores instructions executable by the at least one processor 501, and the at least one processor 501 may perform the data processing method described above by executing the instructions stored in the memory 502. The processor 501 may implement the functions of the various modules in the apparatus shown in fig. 4.
The processor 501 is the control center of the device and can use various interfaces and lines to connect the parts of the entire device. By running or executing the instructions stored in the memory 502 and invoking the data stored in the memory 502, the processor 501 performs the various functions of the device and processes data, thereby monitoring the device as a whole.
In one possible design, processor 501 may include one or more processing units, and processor 501 may integrate an application processor and a modem processor, where the application processor primarily processes operating systems, user interfaces, application programs, and the like, and the modem processor primarily processes wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 501. In some embodiments, processor 501 and memory 502 may be implemented on the same chip, or they may be implemented separately on separate chips in some embodiments.
The processor 501 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the data processing method disclosed in connection with the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 502, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 502 may include at least one type of storage medium, for example, flash memory, hard disk, multimedia card, card memory, random access memory (Random Access Memory, RAM), static random access memory (Static Random Access Memory, SRAM), programmable read-only memory (Programmable Read Only Memory, PROM), read-only memory (Read Only Memory, ROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), magnetic memory, magnetic disk, optical disk, and the like. The memory 502 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 502 in the present embodiment may also be circuitry or any other device capable of implementing a memory function for storing program instructions and/or data.
By programming the processor 501, the code corresponding to the data processing method described in the foregoing embodiment may be solidified into a chip, so that the chip can execute the steps of the data processing method of the embodiment shown in fig. 2 at the time of operation. How to design and program the processor 501 is a technique well known to those skilled in the art, and will not be described in detail herein.
Based on the same inventive concept, the embodiments of the present application also provide a storage medium storing computer instructions that, when run on a computer, cause the computer to perform the data processing method discussed above.
In some possible embodiments, aspects of the data processing method provided herein may also be implemented in the form of a program product comprising program code for causing a control apparatus to carry out the steps of the data processing method according to the various exemplary embodiments of the present application as described herein above when the program product is run on an apparatus.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. A commodity code matching method, the method comprising:
acquiring target commodity information to be matched, and extracting characteristics of the target commodity information to be matched to obtain target commodity characteristics;
training the commodity code matching model to be trained based on a target training data set corresponding to the target commodity characteristics to obtain a target commodity code matching model; the target training data set meets the preset commodity code matching accuracy requirement;
and matching the target commodity information to be matched by adopting the target commodity code matching model to obtain a commodity code matching result corresponding to the target commodity information to be matched.
2. The method of claim 1, wherein the target commodity code matching model is trained by:
acquiring historical commodity information;
preprocessing the historical commodity information according to a preset rule to obtain a target training data set;
extracting features of the target training data set by using at least one learner in the commodity coding matching model to obtain a reference feature vector corresponding to the target training data set; wherein, the structure of different learners is different;
determining a prediction matching result of the reference feature vector by adopting a plurality of serially connected Transformer layers and FCN layers;
and adjusting parameters of the commodity code matching model according to the prediction matching result until a preset condition is met, so as to obtain a target commodity code matching model.
3. The method of claim 2, further comprising, prior to said preprocessing said historical merchandise information according to a predetermined rule to obtain a target training data set:
acquiring matched commodity names and commodity codes;
the following operations are performed for the target commodity code:
determining the occurrence times of the target commodity codes and the occurrence times of different commodity names corresponding to the target commodity codes; wherein the target commodity code is any one of the commodity codes;
determining a classification probability set corresponding to the target commodity code based on the ratio of the times of occurrence of the target commodity code to the total times of occurrence of all commodity codes and the ratio of the times of occurrence of each of the different commodity names to the total times of occurrence of all commodity names; wherein the classification probability set comprises a correct classification probability and an error classification probability;
and determining a training data set from the historical commodity information according to the classification probability set.
4. The method of claim 3, further comprising, after said determining a training data set from the matched commodity names and commodity codes based on said set of classification probabilities:
and carrying out synonym expansion on each commodity name in the training data set to obtain the target training data set.
5. A commodity code matching apparatus, said apparatus comprising:
the acquisition module is used for acquiring target commodity information to be matched and extracting characteristics of the target commodity information to be matched to obtain target commodity characteristics;
the training module is used for training the commodity code matching model to be trained based on a target training data set corresponding to the target commodity characteristic to obtain a target commodity code matching model; the target training data set meets the preset commodity code matching accuracy requirement;
and the matching module is used for matching the target commodity information to be matched by adopting the target commodity code matching model to obtain a commodity code matching result corresponding to the target commodity information to be matched.
6. The apparatus of claim 5, wherein the target commodity code matching model is trained by the training module by:
acquiring historical commodity information;
preprocessing the historical commodity information according to a preset rule to obtain a target training data set;
extracting features of the target training data set by using at least one learner in the commodity coding matching model to obtain a reference feature vector corresponding to the target training data set; wherein, the structure of different learners is different;
determining a prediction matching result of the reference feature vector by adopting a plurality of serially connected Transformer layers and FCN layers;
and adjusting parameters of the commodity code matching model according to the prediction matching result until a preset condition is met, so as to obtain a target commodity code matching model.
7. The apparatus of claim 6, wherein prior to said preprocessing of said historical merchandise information according to a predetermined rule to obtain a target training data set, said training module is further configured to:
acquiring matched commodity names and commodity codes;
the following operations are performed for the target commodity code:
determining the occurrence times of the target commodity codes and the occurrence times of different commodity names corresponding to the target commodity codes; wherein the target commodity code is any one of the commodity codes;
determining a classification probability set corresponding to the target commodity code based on the ratio of the times of occurrence of the target commodity code to the total times of occurrence of all commodity codes and the ratio of the times of occurrence of each of the different commodity names to the total times of occurrence of all commodity names; wherein the classification probability set comprises a correct classification probability and an error classification probability;
and determining a training data set from the historical commodity information according to the classification probability set.
8. The apparatus of claim 7, wherein after said determining a training data set from the matched commodity names and commodity codes according to the set of classification probabilities, the training module is further to:
and carrying out synonym expansion on each commodity name in the training data set to obtain the target training data set.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method of any of claims 1-4 when executing a computer program stored on said memory.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-4.
CN202311834006.4A 2023-12-28 2023-12-28 Commodity code matching method and device, electronic equipment and storage medium Pending CN117829954A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311834006.4A CN117829954A (en) 2023-12-28 2023-12-28 Commodity code matching method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117829954A true CN117829954A (en) 2024-04-05

Family

ID=90523804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311834006.4A Pending CN117829954A (en) 2023-12-28 2023-12-28 Commodity code matching method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117829954A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination