CN112348604B - Invoice commodity code assignment method, system, device and readable storage medium - Google Patents
- Publication number
- CN112348604B (application number CN202011346801.5A)
- Authority
- CN
- China
- Prior art keywords
- word
- matching
- goods
- core
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/04—Billing or invoicing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/381—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using identifiers, e.g. barcodes, RFIDs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Development Economics (AREA)
- General Health & Medical Sciences (AREA)
- Finance (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Accounting & Taxation (AREA)
- Economics (AREA)
- Health & Medical Sciences (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The application discloses an invoice commodity code assignment method, system, device and computer readable storage medium. The method comprises: receiving a goods name; segmenting the goods name by using Jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result; matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results; calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies using the goods' commodity code, as recorded in the core word library for each matching result; and outputting the matching result with the highest confidence. By extracting core words with several algorithms combined, the application raises the matching hit rate and obtains a plurality of matching results; by then selecting the matching result with the highest confidence from among them, it ensures the accuracy of the final result.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, a system, an apparatus, and a computer readable storage medium for assigning invoice commodity codes.
Background
When an enterprise issues an invoice, goods and services are classified into 4000 categories according to the tax classification code table of the State Taxation Administration. Users unfamiliar with the table usually fill in the commodity code from experience and often get it wrong, and a wrong code can cause unnecessary losses. An algorithm is therefore needed that, through a series of calculations, assigns the goods name entered by the user to the most suitable commodity code.
Prior-art algorithms require the user to enter the goods name exactly, so that the corresponding commodity code can be found in a pre-built goods name library. Because invoicing habits differ between invoicers, some names are found in the library and others are not. For example, some enterprises write "Nongfu Spring mineral water", while others write "Nongfu Spring mineral water 500ml" or "1.5L Nongfu Spring mineral water"; a lookup in the goods name library may find the first form but not the other two.
Therefore, a more flexible and efficient invoice commodity code assignment method is needed.
Disclosure of Invention
Accordingly, the present application is directed to a method, system, apparatus and computer readable storage medium for assigning invoice commodity codes, which is more flexible and efficient. The specific scheme is as follows:
an invoice commodity code assignment method, comprising:
receiving a goods name;
segmenting the goods name by using Jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results; each matching result comprises a goods name, a commodity code, and the proportion of invoicing companies using that commodity code for the goods, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies using the goods' commodity code, as recorded in the core word library for each matching result;
outputting the matching result with the highest confidence;
the core word library is a database created in advance that comprises a plurality of goods names, the commodity codes of the goods, and the proportion of invoicing companies using each commodity code for the goods.
Optionally, the process of receiving the goods name includes:
receiving an original goods name;
and cleaning the original goods name by removing stop words with a preset stop-word library to obtain the goods name.
Optionally, the process of matching in the core word library by using the composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results includes:
matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
and matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result.
Optionally, the method further comprises:
receiving a commodity code abbreviation;
the process of matching in the core word library by using the composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results comprises:
matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result;
and matching in the core word library by using an abbreviation word algorithm, the commodity code abbreviation, and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an abbreviation word matching result.
The application also discloses an invoice commodity code assignment system, which comprises:
the goods name receiving module is used for receiving a goods name;
the Jieba word segmentation module is used for segmenting the goods name by using Jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
the core word extraction module is used for matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results; each matching result comprises a goods name, a commodity code, and the proportion of invoicing companies using that commodity code for the goods, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
the confidence calculation module is used for calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies using the goods' commodity code, as recorded in the core word library for each matching result;
the result output module is used for outputting the matching result with the highest confidence;
the core word library is a database created in advance that comprises a plurality of goods names, the commodity codes of the goods, and the proportion of invoicing companies using each commodity code for the goods.
Optionally, the goods name receiving module includes:
the original name receiving unit is used for receiving an original goods name;
and the original name cleaning unit is used for cleaning the original goods name by removing stop words with a preset stop-word library to obtain the goods name.
Optionally, the core word extraction module includes:
the ending word calculation unit is used for matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
and the unique word calculation unit is used for matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result.
Optionally, the system further comprises:
the code abbreviation receiving module is used for receiving a commodity code abbreviation;
the core word extraction module comprises:
the ending word calculation unit is used for matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
the unique word calculation unit is used for matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result;
the abbreviation word calculation unit is used for matching in the core word library by using an abbreviation word algorithm, the commodity code abbreviation, and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an abbreviation word matching result.
The application also discloses an invoice commodity code assignment device, which comprises:
a memory for storing a computer program;
and a processor for executing the computer program to implement the invoice commodity code assignment method described above.
The application also discloses a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the invoice commodity code assignment method described above.
The application discloses an invoice commodity code assignment method, which comprises the following steps: receiving a goods name; segmenting the goods name by using Jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result; matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results, where each matching result comprises a goods name, a commodity code, and the proportion of invoicing companies using that commodity code for the goods, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms; calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies using the goods' commodity code, as recorded in the core word library for each matching result; and outputting the matching result with the highest confidence. The core word library is a database created in advance that comprises a plurality of goods names, the commodity codes of the goods, and the proportion of invoicing companies using each commodity code for the goods.
By extracting core words with several algorithms combined, the application raises the matching hit rate and obtains a plurality of matching results; by then selecting the matching result with the highest confidence from among them, it ensures the accuracy of the final result.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an invoice commodity code assignment method disclosed in an embodiment of the present application;
FIG. 2 is a schematic flow chart of another invoice commodity code assignment method disclosed in an embodiment of the present application;
FIG. 3 is a schematic flow chart of a further invoice commodity code assignment method disclosed in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an invoice commodity code assignment system disclosed in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
An embodiment of the application discloses an invoice commodity code assignment method which, as shown in fig. 1, comprises the following steps:
S11: receiving a goods name;
S12: segmenting the goods name by using Jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result.
Specifically, after the goods name entered by the user is received, it is segmented by using Jieba word segmentation and the preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result, denoted for example cut_all_result and cut_result respectively. Suppose the goods name entered by the user is "Nongfu Spring purified water" and the core word library records three core words: "Nongfu Spring purified water", "Nongfu Spring" and "purified water". Segmenting the goods name then yields these three words. "Nongfu Spring purified water", which is identical to the goods name, is taken as the accurate-mode word segmentation result, while "Nongfu Spring" and "purified water", which are parts of the goods name, are taken as the full-mode word segmentation result, giving cut_all_result: ['Nongfu Spring', 'purified water'] and cut_result: ['Nongfu Spring purified water'].
Specifically, if Jieba word segmentation finds no segmentation result for the goods name in the core word library, the cause may be that the goods name entered by the user is wrong or that the core word library holds no information on that goods name. In that case both the full-mode and the accurate-mode word segmentation results are empty, and the subsequent matching process can be terminated.
It can be understood that if the core word library holds no information on the goods name, the information can be added later as the application requires.
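A minimal sketch of step S12 follows. It is a toy stand-in written in plain Python, not the Jieba library itself: the function `segment` and its substring rules are assumptions chosen only to reproduce the example above, where the accurate-mode result is the longest library word covering the name and the full-mode result lists the library words that are parts of the name.

```python
def segment(goods_name, core_words):
    """Toy stand-in for full-mode / accurate-mode segmentation (not Jieba)."""
    # Core words that occur somewhere in the goods name.
    hits = [w for w in core_words if w in goods_name]

    # "Full mode" (assumed rule): every library word that is a proper part
    # of the goods name, ordered by where it first appears.
    cut_all_result = sorted((w for w in hits if w != goods_name),
                            key=goods_name.find)

    # "Accurate mode" (assumed rule): non-overlapping longest matches,
    # scanning the name from left to right.
    cut_result = []
    pos = 0
    while pos < len(goods_name):
        candidates = [w for w in hits if goods_name.startswith(w, pos)]
        if candidates:
            word = max(candidates, key=len)
            cut_result.append(word)
            pos += len(word)
        else:
            pos += 1  # skip characters not covered by any core word
    return cut_all_result, cut_result


core_words = ["Nongfu Spring purified water", "Nongfu Spring", "purified water"]
cut_all_result, cut_result = segment("Nongfu Spring purified water", core_words)
```

When no core word occurs in the goods name, both lists come back empty, which corresponds to the case in which the matching process is terminated.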
S13: matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results.
Specifically, the composite core word extraction algorithm comprises a plurality of core word extraction algorithms, for example an ending word algorithm, a unique word algorithm and an abbreviation word algorithm. Each core word extraction algorithm matches core words in the core word library on the basis of the full-mode and accurate-mode word segmentation results obtained in the preceding word segmentation step and outputs its own matching result. An algorithm may output several matching results or only one. Depending on how accurate the goods name entered by the user is, some algorithms may produce no valid matching result, that is, an empty result; this does not affect the other algorithms or the matching result finally output.
It can be understood that if every core word extraction algorithm outputs an empty result, the final matching result is empty. The cause may be that the goods name entered by the user is wrong or that the core word library holds no information on the goods name; in the latter case the information can be added later as the application requires.
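The independence of the individual algorithms can be sketched as follows. The function `run_composite` and the three lambda algorithms are hypothetical placeholders, not the patent's algorithms; the sketch only shows that an empty output from one algorithm is dropped without affecting the others.

```python
def run_composite(algorithms, cut_all_result, cut_result):
    """Run every extraction algorithm and keep only the non-empty outputs."""
    matching_results = {}
    for name, algorithm in algorithms.items():
        matches = algorithm(cut_all_result, cut_result)
        if matches:  # an empty output is ignored; other algorithms still run
            matching_results[name] = matches
    return matching_results


# Hypothetical placeholder algorithms, for illustration only.
algorithms = {
    "ending_word": lambda cut_all, cut: cut_all[-1:],
    "unique_word": lambda cut_all, cut: cut if len(cut) == 1 else [],
    "always_empty": lambda cut_all, cut: [],
}

results = run_composite(
    algorithms,
    ["Nongfu Spring", "purified water"],
    ["Nongfu Spring purified water"],
)
```

If every algorithm returns an empty output, `run_composite` returns an empty dictionary, which corresponds to the empty final matching result.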
S14: calculating the confidence of each matching result by using the preset weighting ratio and the proportion of invoicing companies using the goods' commodity code, as recorded in the core word library for each matching result.
Specifically, because a composite core word extraction algorithm is used, a plurality of matching results are obtained. To output a single, accurate commodity code for the goods name, a weighting ratio is preset for each core word extraction algorithm within the composite algorithm, so that every matching result carries the weighting ratio of the algorithm that produced it.
Finally, the confidence of each matching result is calculated from the proportion of invoicing companies in that matching result and the preset weighting ratio. For example, suppose the matching results of three algorithms are {'purified water-1030307040000000000': 90}, {'Nongfu Spring purified water-1030307040000000000': 60} and {'purified water-1030307040000000000': 90}, where the text part, such as "purified water", is the goods name, the numeric part, such as "1030307040000000000", is the commodity code, and the value, such as "90", is the proportion of invoicing companies using that commodity code for the goods. Applying the weighting ratios gives: {'purified water-1030307040000000000': 90} × 0.2 + {'Nongfu Spring purified water-1030307040000000000': 60} × 0.3 + {'purified water-1030307040000000000': 90} × 0.5 = {'purified water-1030307040000000000': 63, 'Nongfu Spring purified water-1030307040000000000': 18}, where "0.2", "0.3" and "0.5" are the weighting ratios of the three algorithms. In this example the goods name "purified water" with commodity code "1030307040000000000" has a confidence of 63 (90 × 0.2 + 90 × 0.5), and the goods name "Nongfu Spring purified water" with the same commodity code has a confidence of 18 (60 × 0.3).
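The weighted sum in this example can be reproduced with a short sketch. The function name `confidence` and the pairing of each weighting ratio with its algorithm's output are assumptions made for illustration; the figures are those of the example.

```python
from collections import defaultdict


def confidence(weighted_results):
    """Sum weight * proportion for every 'goods name-commodity code' key."""
    totals = defaultdict(float)
    for weight, matches in weighted_results:
        for key, proportion in matches.items():
            totals[key] += weight * proportion
    return dict(totals)


CODE = "1030307040000000000"
weighted_results = [
    (0.2, {f"purified water-{CODE}": 90}),
    (0.3, {f"Nongfu Spring purified water-{CODE}": 60}),
    (0.5, {f"purified water-{CODE}": 90}),
]

scores = confidence(weighted_results)
# The key with the highest confidence is the matching result to output.
best = max(scores, key=scores.get)
```

Here `scores` holds about 63 for "purified water" and about 18 for "Nongfu Spring purified water" (up to floating-point rounding), so `best` is the "purified water" entry.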
It should be noted that the correspondence between goods names and commodity codes is built into the core word library in advance, so once a match is found in the core word library, the corresponding commodity code and the proportion of invoicing companies using it can be read from the library; see the core word library shown in Table 1.
TABLE 1
S15: outputting the matching result with the highest confidence.
Specifically, once the confidences have been calculated, the matching result with the highest confidence is output, which gives the commodity code corresponding to the goods name originally entered by the user.
The embodiment of the application therefore extracts core words with several algorithms combined and matches on them, which raises the matching hit rate and yields a plurality of matching results; it then selects the matching result with the highest confidence from among them, which ensures the accuracy of the final result.
Specifically, when the core word library is created, the goods names entered into it are cleaned and stop words are removed. This keeps the goods names accurate, reduces interfering information, and improves the efficiency of the subsequent core word extraction algorithms. In addition, goods names whose number of invoicing companies falls below a certain threshold can be removed to reduce the number of core words and so speed up later extraction. For example, entries with fewer than 5 invoicing companies can be removed; "hydrogen peroxide" in Table 1 has only 1 invoicing company and would be removed under this rule. Alternatively, goods whose proportion of invoicing companies using the commodity code exceeds 0.1% can be retained. Under this rule "hydrogen peroxide" in Table 1, although it has few invoicing companies, still meets the requirement because the total number of invoicing companies is also low, and it can remain in the core word library.
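The two pruning rules can be sketched as below. Table 1 is not reproduced here, so the entries, the placeholder codes `CODE_A` and `CODE_B`, the field names, and the choice to combine the two rules with "or" are all assumptions for illustration; only the thresholds 5 and 0.1% come from the text.

```python
def prune_core_words(entries, min_companies=5, min_proportion=0.001):
    """Keep an entry if it passes the count rule or the proportion rule."""
    kept = []
    for entry in entries:
        enough_companies = entry["companies"] >= min_companies
        enough_share = entry["proportion"] > min_proportion  # more than 0.1%
        if enough_companies or enough_share:
            kept.append(entry)
    return kept


# Made-up entries; proportions are fractions of invoicing companies.
entries = [
    {"goods_name": "purified water", "code": "CODE_A",
     "companies": 900, "proportion": 0.90},
    {"goods_name": "hydrogen peroxide", "code": "CODE_B",
     "companies": 1, "proportion": 0.25},
    {"goods_name": "rarely used name", "code": "CODE_A",
     "companies": 1, "proportion": 0.0005},
]

kept_names = [e["goods_name"] for e in prune_core_words(entries)]
```

With these made-up figures, "hydrogen peroxide" fails the count rule but passes the proportion rule and is kept, while "rarely used name" fails both and is dropped.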
An embodiment of the application discloses a specific invoice commodity code assignment method that further describes and refines the technical solution of the previous embodiment. See fig. 2 for details:
S21: receiving an original goods name;
S22: cleaning the original goods name by removing stop words with a preset stop-word library to obtain the goods name.
Specifically, because the original goods name entered by the user may be imprecise, it can be cleaned: stop words are removed from it by means of a preset stop-word library and a corresponding cleaning algorithm, which yields the goods name.
For example, if the original goods name is "special-price Nongfu Spring purified water 500ml", the goods name after cleaning is "Nongfu Spring purified water". The two stop words "special-price" and "500ml" are removed, which improves the precision of the subsequent word segmentation and the efficiency of the subsequent matching.
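A minimal sketch of the cleaning step, assuming the stop-word library is a plain set of strings and that removal is simple substring deletion; the function name `clean_goods_name` is hypothetical, and the whitespace normalisation is only needed because the example is rendered in English.

```python
def clean_goods_name(original_name, stop_words):
    """Remove every stop word from the original goods name."""
    cleaned = original_name
    # Remove longer stop words first so a short one cannot split a longer one.
    for word in sorted(stop_words, key=len, reverse=True):
        cleaned = cleaned.replace(word, "")
    # Collapse the whitespace left behind by the removals.
    return " ".join(cleaned.split())


stop_words = {"special-price", "500ml"}
goods_name = clean_goods_name(
    "special-price Nongfu Spring purified water 500ml", stop_words
)
```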
S23: segmenting the goods name by using Jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
S24: matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result.
Specifically, the composite core word extraction algorithm may include an ending word algorithm. The ending word algorithm first judges whether the goods name ends with a word in the full-mode word segmentation result; if so, that word is output as the ending word matching result. For example, if the goods name is "Nongfu Spring purified water" and the full-mode word segmentation result is "Nongfu Spring" and "purified water", then "purified water" is the ending word. If not, the algorithm judges whether the accurate-mode word segmentation result contains more than one word; if it does, the algorithm further judges whether the last word of the accurate-mode word segmentation result ends with a word in the full-mode word segmentation result, and if so outputs that word as the ending word matching result. In all other cases the ending word matching result is empty.
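The decision sequence of the ending word algorithm can be sketched as follows. The function name `ending_word` is hypothetical, and picking the longest candidate when several full-mode words qualify is an assumption the text does not specify.

```python
def ending_word(goods_name, cut_all_result, cut_result):
    """Return the ending word, or None when no ending word is found."""
    # Step 1: does the goods name end with a full-mode word?
    endings = [w for w in cut_all_result if goods_name.endswith(w)]
    if endings:
        return max(endings, key=len)  # assumed tie-break: longest word

    # Step 2: only when the accurate-mode result has more than one word,
    # check whether its last word ends with a full-mode word.
    if len(cut_result) > 1:
        last = cut_result[-1]
        endings = [w for w in cut_all_result if last.endswith(w)]
        if endings:
            return max(endings, key=len)

    return None


cut_all = ["Nongfu Spring", "purified water"]
first = ending_word("Nongfu Spring purified water", cut_all,
                    ["Nongfu Spring purified water"])
# A name with trailing text that matches no core word falls through to step 2.
second = ending_word("Nongfu Spring purified water xyz", cut_all,
                     ["Nongfu Spring", "purified water"])
```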
S25: matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result.
Specifically, the unique word algorithm judges whether the word in the accurate-mode word segmentation result is unique; if so, that word is taken as the unique word matching result, and if not, the output is empty.
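A minimal sketch of the unique word check, under the assumption that "unique" means the accurate-mode result contains exactly one word; the function name `unique_word` is hypothetical.

```python
def unique_word(cut_result):
    """Return the single accurate-mode word, or None if there is not exactly one."""
    # Assumed reading of "unique": the accurate-mode result holds one word only.
    if len(cut_result) == 1:
        return cut_result[0]
    return None


single = unique_word(["Nongfu Spring purified water"])
several = unique_word(["Nongfu Spring", "purified water"])
```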
S26: calculating the confidence of each matching result by using the preset weighting ratio and the proportion of invoicing companies using the goods' commodity code, as recorded in the core word library for each matching result;
S27: outputting the matching result with the highest confidence.
Further, an embodiment of the application also discloses an invoice commodity code assignment method which, as shown in fig. 3, comprises the following steps:
S31: receiving an original goods name and a commodity code abbreviation;
S32: cleaning the original goods name by removing stop words with a preset stop-word library to obtain the goods name;
S33: segmenting the goods name by using Jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
S34: matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
S35: matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result;
S36: matching in the core word library by using an abbreviation word algorithm, the commodity code abbreviation, and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an abbreviation word matching result.
Specifically, the user may also enter a commodity code abbreviation. For example, in the input "soft drink, Nongfu Spring purified water 500ml", "soft drink" is the commodity code abbreviation and "Nongfu Spring purified water 500ml" is the original goods name.
Specifically, the abbreviation word algorithm first judges whether the accurate-mode word segmentation result is empty; if it is, the output is empty. Otherwise it judges whether the commodity code abbreviation exists in the preset core word library; if it does not, the output is empty. If it does, the core word sub-library corresponding to the commodity code abbreviation is located, and the full-mode word segmentation result is matched against the core words in that sub-library together with their proportions of invoicing companies. If there is a match, the core word with the largest proportion is selected as the output; otherwise the output is empty.
For example, if the commodity code abbreviation "soft drink" is found in the core word library, the full-mode word segmentation result is matched against the goods names under that commodity code, and the match with the largest number of invoicing companies is selected.
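The abbreviation word algorithm can be sketched as below. The function name `abbreviation_word`, the layout of `sub_libraries` (abbreviation mapped to core words and their proportions) and the figures 90 and 40 are assumptions made for illustration.

```python
def abbreviation_word(abbreviation, cut_all_result, cut_result, sub_libraries):
    """Return the best core word under the abbreviation, or None."""
    # An empty accurate-mode result means there is nothing to match.
    if not cut_result:
        return None
    # Locate the core word sub-library for the commodity code abbreviation.
    sub_library = sub_libraries.get(abbreviation)
    if not sub_library:
        return None
    # Match full-mode words against the sub-library.
    candidates = {w: sub_library[w] for w in cut_all_result if w in sub_library}
    if not candidates:
        return None
    # Select the core word with the largest invoicing-company proportion.
    return max(candidates, key=candidates.get)


# Hypothetical sub-library: abbreviation -> {core word: proportion}.
sub_libraries = {"soft drink": {"purified water": 90, "Nongfu Spring": 40}}

match = abbreviation_word(
    "soft drink",
    ["Nongfu Spring", "purified water"],
    ["Nongfu Spring purified water"],
    sub_libraries,
)
```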
S37: calculating the confidence of each matching result by using the preset weighting ratio and the proportion of invoicing companies using the goods' commodity code, as recorded in the core word library for each matching result;
S38: outputting the matching result with the highest confidence.
Correspondingly, an embodiment of the application also discloses an invoice commodity code assignment system which, as shown in fig. 4, comprises:
a goods name receiving module 11 for receiving a goods name;
the Jieba word segmentation module 12 is used for segmenting the goods name by using Jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
the core word extraction module 13 is configured to match in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results; each matching result comprises a goods name, a commodity code, and the proportion of invoicing companies using that commodity code for the goods, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
the confidence calculation module 14 is configured to calculate the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies using the goods' commodity code, as recorded in the core word library for each matching result;
the result output module 15 is configured to output the matching result with the highest confidence;
the core word library is a database created in advance that comprises a plurality of goods names, the commodity codes of the goods, and the proportion of invoicing companies using each commodity code for the goods.
Therefore, the embodiment of the application extracts core words with a composite of several algorithms and matches them against the core word library, which improves the matching hit rate and yields a plurality of matching results. The matching result with the highest confidence is then selected from among them, which helps ensure the accuracy of the final result.
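The core word library described above can be pictured with a short sketch. It assumes a hypothetical record layout of (goods name, commodity code, invoicing company id), assumes goods names are already cleaned, and assumes the proportion is taken over all invoicing companies in the input; only the 0.1% cut-off comes from the application itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (goods name, commodity code, invoicing company id)
InvoiceRecord = Tuple[str, str, str]
LibraryKey = Tuple[str, str]


def build_core_word_library(
    records: Iterable[InvoiceRecord],
    min_proportion: float = 0.001,  # 0.1% cut-off stated for the core word library
) -> Dict[LibraryKey, float]:
    """Map (goods name, commodity code) to its invoicing-company proportion."""
    companies_by_entry: Dict[LibraryKey, Set[str]] = defaultdict(set)
    all_companies: Set[str] = set()
    for goods_name, commodity_code, company_id in records:
        companies_by_entry[(goods_name, commodity_code)].add(company_id)
        all_companies.add(company_id)
    if not all_companies:
        return {}
    total = len(all_companies)
    # Keep only entries whose proportion exceeds the cut-off.
    return {
        entry: len(companies) / total
        for entry, companies in companies_by_entry.items()
        if len(companies) / total > min_proportion
    }


# Placeholder records for illustration only.
records = [
    ("purified water", "CODE-A", "c1"),
    ("purified water", "CODE-A", "c2"),
    ("purified water", "CODE-A", "c3"),
    ("cola", "CODE-A", "c4"),
]
library = build_core_word_library(records, min_proportion=0.3)
print(library)
```

The demo raises the cut-off to 0.3 only so that the filtering is visible on four records; with the default of 0.001 both entries would be kept.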
Specifically, the goods name receiving module 11 may include an original name receiving unit and an original name cleaning unit; wherein,
the original name receiving unit is used for receiving the original goods name;
and the original name cleaning unit is used for cleaning the original goods name by removing useless words with a preset useless word bank to obtain the goods name.
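A minimal sketch of the cleaning unit follows, assuming the useless word bank is a plain set of strings and that removal is done by substring replacement. The function name `clean_goods_name` and the sample entries are hypothetical.

```python
from typing import Iterable


def clean_goods_name(original_name: str, useless_words: Iterable[str]) -> str:
    """Remove every useless word from the original goods name."""
    cleaned = original_name
    # Remove longer entries first so a short entry cannot split a longer one.
    for word in sorted(useless_words, key=len, reverse=True):
        cleaned = cleaned.replace(word, " ")
    # Collapse the whitespace left behind by the removals.
    return " ".join(cleaned.split())


# Hypothetical useless word bank, for illustration only.
useless_word_bank = {"500ml", "new", "special offer"}
goods_name = clean_goods_name(
    "new farmer mountain spring purified water 500ml", useless_word_bank
)
print(goods_name)
```

Substring replacement suits unsegmented Chinese text; for space-delimited text a token-level filter would avoid removing a bank entry that happens to occur inside a longer word.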
Specifically, the core word extraction module 13 may include an ending word calculation unit and a unique word calculation unit; wherein,
the ending word calculation unit is used for matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
and the unique word calculation unit is used for matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result.
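The two units above can be illustrated with a rough sketch. The matching rules used here are assumptions made only for this example: the ending word rule returns the core word closest to the end of the segmentation result, and the unique word rule returns a match only when exactly one distinct core word occurs. The algorithms as defined in the application may differ.

```python
from typing import List, Optional, Set


def ending_word_match(words: List[str], core_words: Set[str]) -> Optional[str]:
    """Assumed rule: return the core word closest to the end of the result."""
    for word in reversed(words):
        if word in core_words:
            return word
    return None


def unique_word_match(accurate_words: List[str], core_words: Set[str]) -> Optional[str]:
    """Assumed rule: match only when exactly one distinct core word occurs."""
    hits = {word for word in accurate_words if word in core_words}
    return next(iter(hits)) if len(hits) == 1 else None


# Placeholder data for illustration only.
core_words = {"water", "purified water", "cola"}
full_mode = ["farmer", "mountain spring", "purified water", "water"]
accurate_mode = ["farmer", "mountain spring", "purified water"]

ending = ending_word_match(full_mode, core_words)
unique = unique_word_match(accurate_mode, core_words)
print(ending, unique)
```

Because the two rules can disagree, as they do on this sample, each result is kept as a separate candidate and the choice between them is left to the confidence calculation.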
Specifically, the system further comprises a code abbreviation receiving module; wherein,
the code abbreviation receiving module is used for receiving a commodity code abbreviation;
the core word extraction module 13 may include an ending word calculation unit, a unique word calculation unit, and an abbreviation word calculation unit; wherein,
the ending word calculation unit is used for matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
the unique word calculation unit is used for matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result;
the abbreviation word calculation unit is used for matching in the core word library by using an abbreviation word algorithm, the commodity code abbreviation, and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an abbreviation word matching result.
In addition, the embodiment of the application also discloses an invoice commodity code assignment device, which comprises:
a memory for storing a computer program;
a processor for executing the computer program to implement the invoice commodity code assignment method described above.
In addition, the embodiment of the application also discloses a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the invoice commodity code assignment method described above.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The application has been described in detail above. Specific examples are used herein to explain its principles and embodiments, and the description of the embodiments is intended only to aid understanding. Meanwhile, those skilled in the art may, in accordance with the ideas of the application, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the application.
Claims (6)
1. An invoice commodity code assignment method, which is characterized by comprising the following steps:
receiving a goods name;
performing word segmentation on the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
matching the full-mode word segmentation result and the accurate-mode word segmentation result in the core word library by using a composite core word extraction algorithm to obtain a plurality of matching results; each matching result comprises a goods name, a commodity code and the invoicing-company quantity proportion of that goods name and commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
calculating the confidence of each matching result by using a preset weighting ratio and the invoicing-company quantity proportion recorded in the core word library for each matching result;
outputting the matching result with the highest confidence;
the core word library is a database created in advance that comprises a plurality of goods names, the commodity code of each goods name and the corresponding invoicing-company quantity proportion; when the core word library is created, the goods names entered into the core word library are cleaned to remove stop words, goods names whose number of invoicing companies is below a certain threshold are removed to reduce the number of core words, and goods names whose invoicing-company quantity proportion is more than 0.1% are selected;
the process of receiving a goods name includes:
receiving an original goods name;
cleaning the original goods name by removing useless words with a preset useless word bank to obtain the goods name;
the process of matching the full-mode word segmentation result and the accurate-mode word segmentation result in the core word library by using a composite core word extraction algorithm to obtain a plurality of matching results comprises the following steps:
matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
and matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result.
2. The invoice commodity code assignment method according to claim 1, further comprising:
receiving a commodity code abbreviation;
the process of matching the full-mode word segmentation result and the accurate-mode word segmentation result in the core word library by using a composite core word extraction algorithm to obtain a plurality of matching results comprises the following steps:
matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result;
and matching in the core word library by using an abbreviation word algorithm, the commodity code abbreviation, and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an abbreviation word matching result.
3. An invoice commodity code assignment system, comprising:
the goods name receiving module is used for receiving a goods name;
the jieba word segmentation module is used for segmenting the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
the core word extraction module is used for matching the full-mode word segmentation result and the accurate-mode word segmentation result in the core word library by using a composite core word extraction algorithm to obtain a plurality of matching results; each matching result comprises a goods name, a commodity code and the invoicing-company quantity proportion of that goods name and commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
the confidence calculating module is used for calculating the confidence of each matching result by using a preset weighting ratio and the invoicing-company quantity proportion recorded in the core word library for each matching result;
the result output module is used for outputting the matching result with the highest confidence;
the core word library is a database created in advance that comprises a plurality of goods names, the commodity code of each goods name and the corresponding invoicing-company quantity proportion; when the core word library is created, the goods names entered into the core word library are cleaned to remove stop words, goods names whose number of invoicing companies is below a certain threshold are removed to reduce the number of core words, and goods names whose invoicing-company quantity proportion is more than 0.1% are selected;
the goods name receiving module includes:
the original name receiving unit, used for receiving the original goods name;
the original name cleaning unit, used for cleaning the original goods name by removing useless words with a preset useless word bank to obtain the goods name;
the core word extraction module comprises:
the ending word calculation unit, used for matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
and the unique word calculation unit, used for matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result.
4. The invoice commodity code assignment system as claimed in claim 3, further comprising:
the code abbreviation receiving module is used for receiving a commodity code abbreviation;
the core word extraction module comprises:
the ending word calculation unit, used for matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
the unique word calculation unit, used for matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result;
the abbreviation word calculation unit, used for matching in the core word library by using an abbreviation word algorithm, the commodity code abbreviation, and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an abbreviation word matching result.
5. An invoice commodity code assignment device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the invoice commodity code assignment method according to claim 1 or 2.
6. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium and, when executed by a processor, implements the invoice commodity code assignment method according to claim 1 or 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011346801.5A CN112348604B (en) | 2020-11-26 | 2020-11-26 | Invoice commodity code assignment method, system, device and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112348604A (en) | 2021-02-09 |
CN112348604B (en) | 2023-11-17 |
Family
ID=74365936
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011346801.5A Active CN112348604B (en) | 2020-11-26 | 2020-11-26 | Invoice commodity code assignment method, system, device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112348604B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101276360A (en) * | 2007-03-30 | 2008-10-01 | 建准电机工业股份有限公司 | Reliability verification method of patent retrieval data |
CN106844651A (en) * | 2017-01-20 | 2017-06-13 | 上海傲硕信息科技有限公司 | Instruction results compare screening plant |
CN107871144A (en) * | 2017-11-24 | 2018-04-03 | 税友软件集团股份有限公司 | Invoice trade name sorting technique, system, equipment and computer-readable recording medium |
CN108241677A (en) * | 2016-12-26 | 2018-07-03 | 航天信息股份有限公司 | A kind of method and system for the tax revenue sorting code number for obtaining commodity |
CN109213866A (en) * | 2018-09-19 | 2019-01-15 | 浙江诺诺网络科技有限公司 | A kind of tax commodity code classification method and system based on deep learning |
CN109918480A (en) * | 2019-03-01 | 2019-06-21 | 陈包容 | A method of address is extracted from text |
CN110347801A (en) * | 2019-07-17 | 2019-10-18 | 安徽航天信息有限公司 | A kind of commodity classification codes match method and system |
CN110597995A (en) * | 2019-09-20 | 2019-12-20 | 税友软件集团股份有限公司 | Commodity name classification method, commodity name classification device, commodity name classification equipment and readable storage medium |
CN110688851A (en) * | 2019-09-26 | 2020-01-14 | 税友软件集团股份有限公司 | Method, device and medium for extracting key information of address text |
CN110852815A (en) * | 2018-07-25 | 2020-02-28 | 阿里巴巴集团控股有限公司 | Data processing method, device and machine readable medium |
CN111368539A (en) * | 2020-03-02 | 2020-07-03 | 贵州电网有限责任公司 | Hotspot analysis modeling method |
CN111832318A (en) * | 2020-07-16 | 2020-10-27 | 平安科技(深圳)有限公司 | Single sentence natural language processing method and device, computer equipment and readable storage medium |
CN111985211A (en) * | 2020-09-01 | 2020-11-24 | 中国民航科学技术研究院 | Ontology concept obtaining method and device in civil aviation safety field and storage medium |
CN113191146A (en) * | 2021-05-26 | 2021-07-30 | 平安国际智慧城市科技股份有限公司 | Appeal data distribution method and device, computer equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
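Research on factors influencing the helpfulness of online product reviews: a text-semantics perspective; Chen Jiangtao; Zhang Jinlong; Zhang Yajun; Library and Information Service (10); 121-125 * |
Research on spam SMS text clustering based on a deep feature semantic learning model; Zhang Yu; Chen Junqing; Modern Computer (Professional Edition) (07); 17-21 * |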
Also Published As
Publication number | Publication date |
---|---|
CN112348604A (en) | 2021-02-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||