CN112348604B - Invoice commodity code assignment method, system, device and readable storage medium - Google Patents

Info

Publication number
CN112348604B
Authority
CN
China
Prior art keywords
word
matching
goods
core
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011346801.5A
Other languages
Chinese (zh)
Other versions
CN112348604A (en)
Inventor
陈鹏飞
张镇潮
施建生
涂昶
钱力扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Servyou Software Group Co ltd
Original Assignee
Servyou Software Group Co ltd
Application filed by Servyou Software Group Co ltd
Priority to CN202011346801.5A
Publication of CN112348604A
Application granted
Publication of CN112348604B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/04 Billing or invoicing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/381 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using identifiers, e.g. barcodes, RFIDs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an invoice commodity code assignment method, system, device and computer readable storage medium. The method comprises the following steps: receiving a goods name; segmenting the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result; matching in the core word library by using a composite core word extraction algorithm together with the full-mode and accurate-mode word segmentation results to obtain a plurality of matching results; calculating the confidence of each matching result from a preset weighting ratio and the proportion of invoicing companies recorded in the core word library for the goods commodity code in that matching result; and outputting the matching result with the highest confidence. Because core words are extracted with several combined algorithms, the matching hit rate is improved and a plurality of matching results are obtained; the confidence is then used to select the best of them, which ensures the accuracy of the final result.

Description

Invoice commodity code assignment method, system, device and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, a system, an apparatus, and a computer readable storage medium for assigning invoice commodity codes.
Background
When an enterprise issues an invoice, goods and services are classified into 4000 categories according to the tax classification code table of the State Taxation Administration. Users who are unfamiliar with the table usually fill in the commodity code from experience and often get it wrong, and a wrong code can cause unnecessary losses. An algorithm is therefore needed that maps the goods name entered by the user to the most suitable commodity code through a series of calculations.
Prior-art algorithms require the user to enter the commodity name exactly, so that the corresponding commodity code can be looked up in a pre-built commodity name library. Because different invoicers have different invoicing habits, some names are found in the library and others are not. For example, some enterprises invoice "farmer spring mineral water", while others invoice "farmer spring mineral water 500ml" or "1.5L farmer spring mineral water"; a lookup in the commodity library may find the first name but not the latter two.
Therefore, a more flexible and efficient invoice commodity code assignment method is needed.
Disclosure of Invention
Accordingly, the present application aims to provide a more flexible and efficient invoice commodity code assignment method, system, device and computer readable storage medium. The specific scheme is as follows:
an invoice commodity code assignment method, comprising:
receiving a goods name;
segmenting the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results; each matching result comprises a goods name, a commodity code and the proportion of invoicing companies for that goods commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies for the goods commodity code recorded in the core word library for that matching result;
outputting the matching result with the highest confidence;
the core word library is a pre-created database comprising a plurality of goods names, the commodity codes of the goods and the proportion of invoicing companies for each goods commodity code.
Optionally, the process of receiving the cargo name includes:
receiving an original cargo name;
and cleaning the original cargo name, and removing the useless words by using a preset useless word bank to obtain the cargo name.
Optionally, the process of matching in the core word library by using the composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results includes:
matching in the core word library by using an end word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an end word matching result;
and matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result.
Optionally, the method further comprises:
receiving a commodity code abbreviation;
the process of matching in the core word library by using the composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results comprises the following steps:
matching in the core word library by using an end word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an end word matching result;
matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result;
and matching in the core word library by using an abbreviation word algorithm, the commodity code abbreviation, and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an abbreviation word matching result.
The application also discloses an invoice commodity code assignment system, which comprises:
the goods name receiving module is used for receiving the goods name;
the jieba word segmentation module is used for segmenting the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
the core word extraction module is used for matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results; each matching result comprises a goods name, a commodity code and the proportion of invoicing companies for that goods commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
the confidence calculation module is used for calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies for the goods commodity code recorded in the core word library for that matching result;
the result output module is used for outputting the matching result with the highest confidence;
the core word library is a pre-created database comprising a plurality of goods names, the commodity codes of the goods and the proportion of invoicing companies for each goods commodity code.
Optionally, the cargo name receiving module includes:
the original name receiving unit is used for receiving the original goods name;
and the original name cleaning unit is used for cleaning the original cargo name, and removing the useless words by utilizing a preset useless word bank to obtain the cargo name.
Optionally, the core word extraction module includes:
the end word calculation unit is used for matching in the core word library by using an end word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an end word matching result;
and the unique word calculation unit is used for matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result.
Optionally, the system further comprises:
the code abbreviation receiving module is used for receiving a commodity code abbreviation;
the core word extraction module comprises:
the end word calculation unit is used for matching in the core word library by using an end word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an end word matching result;
the unique word calculation unit is used for matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result;
the abbreviation word calculation unit is used for matching in the core word library by using an abbreviation word algorithm, the commodity code abbreviation, and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an abbreviation word matching result.
The application also discloses an invoice commodity code assignment device, which comprises:
a memory for storing a computer program;
and a processor for executing the computer program to implement the invoice commodity code assignment method described above.
The application also discloses a computer readable storage medium on which a computer program is stored, wherein the computer program implements the invoice commodity code assignment method described above when executed by a processor.
The application discloses an invoice commodity code assignment method, which comprises the following steps: receiving a goods name; segmenting the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result; matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results, wherein each matching result comprises a goods name, a commodity code and the proportion of invoicing companies for that goods commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms; calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies for the goods commodity code recorded in the core word library for that matching result; and outputting the matching result with the highest confidence. The core word library is a pre-created database comprising a plurality of goods names, the commodity codes of the goods and the proportion of invoicing companies for each goods commodity code.
According to the application, core words are extracted by utilizing a plurality of composite algorithms, the hit rate of matching is improved, a plurality of matching results are obtained, and finally, the matching result with the highest confidence is selected from the plurality of matching results by utilizing the confidence, so that the accuracy of the final result is ensured.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an invoice commodity code assignment method disclosed in an embodiment of the present application;
FIG. 2 is a schematic flow chart of another invoice commodity code assignment method disclosed in an embodiment of the present application;
FIG. 3 is a schematic flow chart of yet another invoice commodity code assignment method disclosed in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an invoice commodity code assignment system disclosed in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiment of the application discloses an invoice commodity code assignment method, which is shown in fig. 1 and comprises the following steps:
S11: receiving a goods name;
S12: segmenting the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result.
Specifically, after the goods name input by the user is received, it is segmented by using jieba word segmentation and the preset core word library, and the full-mode and accurate-mode word segmentation results are recorded as cut_all_result and cut_result respectively. For example, assume that the goods name input by the user is "farmer mountain spring purified water" and that the core word library records three core words: "farmer mountain spring purified water", "farmer mountain spring" and "purified water". Segmenting the goods name can then yield all three words. The word identical to the whole goods name, "farmer mountain spring purified water", is taken as the accurate-mode word segmentation result, and words that cover only part of the goods name, such as "farmer mountain spring" and "purified water", are taken as the full-mode word segmentation result, giving cut_all_result: ['farmer mountain spring', 'purified water', 'pure water'] and cut_result: ['farmer mountain spring purified water'].
Specifically, if jieba word segmentation cannot find any segmentation result for the goods name in the core word library, possibly because the goods name input by the user is wrong or because no information on the goods name is recorded in the core word library, both the full-mode and the accurate-mode word segmentation results are empty and the subsequent matching process can be terminated.
It can be understood that if the information of the related cargo name is not recorded in the core word stock, the information can be added later according to the actual application requirement.
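The two segmentation results described above can be illustrated with a small dictionary-based stand-in. This is only a sketch and not jieba itself: the function names `full_mode` and `accurate_mode` are hypothetical, the full mode is approximated by listing every core word that covers part of the name, and the accurate mode is approximated by forward maximum matching against the core word library.

```python
def full_mode(name, core_words):
    """Return every core word that covers only part of the name, ordered by position."""
    hits = []
    for word in core_words:
        pos = name.find(word)
        if word and pos != -1 and word != name:
            hits.append((pos, word))
    return [word for pos, word in sorted(hits)]


def accurate_mode(name, core_words):
    """Forward maximum matching: at each position take the longest core word."""
    by_length = sorted((w for w in core_words if w), key=len, reverse=True)
    result, i = [], 0
    while i < len(name):
        for word in by_length:
            if name.startswith(word, i):
                result.append(word)
                i += len(word)
                break
        else:
            i += 1  # no core word starts here; skip one character
    return result


# Hypothetical core word library holding the three core words of the example.
CORE_WORDS = {
    "farmer mountain spring purified water",
    "farmer mountain spring",
    "purified water",
}

name = "farmer mountain spring purified water"
cut_all_result = full_mode(name, CORE_WORDS)
cut_result = accurate_mode(name, CORE_WORDS)
```

If neither function finds a core word, both lists are empty, which corresponds to the case in which the subsequent matching process is terminated.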
S13: matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results.
Specifically, the composite core word extraction algorithm comprises a plurality of core word extraction algorithms, for example an end word algorithm, a unique word algorithm and an abbreviation word algorithm. Each core word extraction algorithm matches core words in the core word library on the basis of the full-mode and accurate-mode word segmentation results obtained in the preceding word segmentation step and outputs its own matching result. Each algorithm may output several matching results or only one. Depending on the accuracy of the goods name input by the user, some algorithms may output no valid matching result, that is, an empty result; this does not affect the other algorithms or the matching results that are finally output.
It can be understood that if all the core word extraction algorithms output a null, the final matching result is null, and the reason may be that the goods name input by the user is wrong or the information of the related goods name is not recorded in the core word stock. If the information of the related goods names is not recorded in the core word stock, the information can be added later according to the actual application requirements.
S14: calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies for the goods commodity code recorded in the core word library for that matching result.
Specifically, because a composite core word extraction algorithm is used, a plurality of matching results are obtained. To output a single, accurate commodity code for the goods name, a weighting ratio is preset for each core word extraction algorithm in the composite algorithm, so that every matching result carries the weighting ratio of the algorithm that produced it.
Finally, the confidence of each matching result is calculated from the proportion of invoicing companies for the goods commodity code in that result and the preset weighting ratio. For example, suppose the matching results of three algorithms are {'purified water-1030307040000000000': 90}, {'farmer mountain spring purified water-1030307040000000000': 60} and {'purified water-1030307040000000000': 90}, where the text part such as "purified water" is the goods name, the numeric part such as "1030307040000000000" is the commodity code, and the number such as "90" is the proportion of invoicing companies for the goods commodity code. Applying the weighting ratios gives: {'purified water-1030307040000000000': 90} × 0.2 + {'farmer mountain spring purified water-1030307040000000000': 60} × 0.3 + {'purified water-1030307040000000000': 90} × 0.5 = {'purified water-1030307040000000000': 63, 'farmer mountain spring purified water-1030307040000000000': 18}, where "0.2", "0.3" and "0.5" are the weighting ratios of the respective algorithms. In this example, the confidence of the goods name "purified water" with commodity code "1030307040000000000" is 63 (90 × 0.2 + 90 × 0.5), and the confidence of the goods name "farmer mountain spring purified water" with commodity code "1030307040000000000" is 18 (60 × 0.3).
It should be noted that the correspondence between goods names and commodity codes is built into the core word library in advance, so once a matching result is found in the core word library, the corresponding commodity code and its proportion of invoicing companies can be obtained; see the core word library shown in Table 1.
TABLE 1
S15: and outputting the matching result with the highest confidence.
Specifically, after the confidence coefficient is calculated, a matching result with the highest confidence coefficient can be output, and the commodity code corresponding to the commodity name initially input by the user can be obtained.
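The confidence calculation of S14 and the selection of S15 can be sketched as a weighted sum followed by a maximum. The function names `score_matches` and `best_match` and the data layout (a list of weight and result pairs) are assumptions made for illustration; the weights and proportions are the ones from the worked example in the text.

```python
from collections import defaultdict


def score_matches(weighted_results):
    """Sum weight * proportion for every 'goods name-commodity code' key."""
    confidence = defaultdict(float)
    for weight, result in weighted_results:
        for key, proportion in result.items():
            confidence[key] += weight * proportion
    return dict(confidence)


def best_match(confidence):
    """Return the (key, confidence) pair with the highest confidence, or None."""
    if not confidence:
        return None
    return max(confidence.items(), key=lambda item: item[1])


# One (weight, matching result) pair per core word extraction algorithm.
weighted_results = [
    (0.2, {"purified water-1030307040000000000": 90}),
    (0.3, {"farmer mountain spring purified water-1030307040000000000": 60}),
    (0.5, {"purified water-1030307040000000000": 90}),
]

confidence = score_matches(weighted_results)
winner = best_match(confidence)
```

An algorithm that returns an empty result contributes an empty dictionary and therefore adds nothing to any confidence.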
Therefore, the embodiment of the application extracts the core words by utilizing a plurality of composite algorithms, performs matching, improves the hit rate of the matching, obtains a plurality of matching results, finally selects the matching result with the highest confidence from the matching results by utilizing the confidence, and ensures the accuracy of the final result.
Specifically, when the core word library is created, the goods names entered into it are cleaned and stop words are removed, which ensures the accuracy of the goods names, reduces interfering information and improves the efficiency of the subsequent core word extraction algorithms. Goods names whose number of invoicing companies is below a certain threshold can also be removed, which reduces the number of core words and speeds up subsequent extraction. For example, data with fewer than 5 invoicing companies can be removed; "hydrogen peroxide" in Table 1 has only 1 invoicing company and could be removed on that basis. In addition, goods whose proportion of invoicing companies for the goods commodity code exceeds 0.1% can be retained; "hydrogen peroxide" in Table 1 has few invoicing companies, but because the total number of invoicing companies is also low its proportion meets the requirement, so it can still be kept in the core word library.
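One possible reading of the pruning rule is that an entry stays in the core word library if it has enough invoicing companies or if its proportion of invoicing companies is high enough. The sketch below follows that reading; the function name `prune_core_words`, the field names and the sample entries are hypothetical and are not taken from Table 1.

```python
def prune_core_words(entries, min_companies=5, min_proportion=0.1):
    """Keep entries with at least min_companies invoicing companies,
    or with a proportion (in percent) above min_proportion."""
    return [
        entry for entry in entries
        if entry["companies"] >= min_companies
        or entry["proportion"] > min_proportion
    ]


# Hypothetical entries; proportions are percentages.
entries = [
    {"name": "purified water", "companies": 900, "proportion": 90.0},
    {"name": "hydrogen peroxide", "companies": 1, "proportion": 50.0},
    {"name": "rare typo name", "companies": 2, "proportion": 0.05},
]

kept = [entry["name"] for entry in prune_core_words(entries)]
```

Both thresholds are parameters so that the trade-off between library size and coverage can be tuned.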
The embodiment of the application discloses a specific invoice commodity code assignment method, and compared with the previous embodiment, the technical scheme of the embodiment is further described and optimized. See fig. 2 for details:
S21: receiving an original goods name;
S22: cleaning the original goods name and removing useless words by using a preset useless word bank to obtain the goods name.
Specifically, because the original goods name input by the user may be inaccurate, it can be cleaned: useless words are removed from it by means of a preset useless word bank and a corresponding cleaning algorithm, which yields the goods name.
For example, if the original goods name is "special-price farmer mountain spring purified water 500ml", the goods name obtained after cleaning is "farmer mountain spring purified water". The two useless words "special-price" and "500ml" are removed, which improves the precision of the subsequent word segmentation and the efficiency of the subsequent matching.
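A cleaning step of this kind could look like the following sketch. The text does not specify the cleaning algorithm, so everything here is an assumption: the contents of `USELESS_WORDS`, the regular expression `SPEC_PATTERN` for quantity specifications, and the function name `clean_name` are all hypothetical.

```python
import re

# Hypothetical useless word bank.
USELESS_WORDS = {"special-price"}

# Hypothetical pattern for quantity specifications such as "500ml" or "1.5L".
SPEC_PATTERN = re.compile(r"\d+(?:\.\d+)?\s*(?:ml|l|kg|g)\b", re.IGNORECASE)


def clean_name(raw_name, useless_words=USELESS_WORDS):
    """Strip quantity specifications and useless words, then normalise spaces."""
    name = SPEC_PATTERN.sub(" ", raw_name)
    # Remove longer useless words first so that shorter ones cannot split them.
    for word in sorted(useless_words, key=len, reverse=True):
        name = name.replace(word, " ")
    return " ".join(name.split())


cleaned = clean_name("special-price farmer mountain spring purified water 500ml")
```

The same function also handles a specification at the start of the name, as in "1.5L farmer spring mineral water".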
S23: segmenting the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
S24: matching in the core word library by using an end word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an end word matching result.
Specifically, the composite core word extraction algorithm may include an end word algorithm. The end word algorithm first judges whether the goods name ends with a word in the full-mode word segmentation result; if so, that word is output as the end word matching result. For example, if the goods name is "farmer mountain spring purified water" and the full-mode word segmentation result is "farmer mountain spring" and "purified water", then "purified water" is the end word. If not, the algorithm judges whether the number of words in the accurate-mode word segmentation result is greater than 1; if so, it further judges whether the last word in the accurate-mode word segmentation result ends with a word in the full-mode word segmentation result and, if it does, outputs that word as the end word matching result. Otherwise, the end word matching result is empty.
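The end word decision procedure can be sketched as follows. The function name `end_word` is hypothetical, and the tie-break of preferring the longest suffix when several full-mode words qualify is an assumption that the text does not state.

```python
def end_word(name, cut_all_result, cut_result):
    """Return the full-mode word that ends the goods name, or None."""

    def longest_suffix(text):
        # Full-mode words that the text ends with; prefer the longest one.
        hits = [word for word in cut_all_result if word and text.endswith(word)]
        return max(hits, key=len) if hits else None

    hit = longest_suffix(name)
    if hit is None and len(cut_result) > 1:
        # Fall back to the last word of the accurate-mode result.
        hit = longest_suffix(cut_result[-1])
    return hit


result = end_word(
    "farmer mountain spring purified water",
    ["farmer mountain spring", "purified water"],
    ["farmer mountain spring purified water"],
)
```

A return value of None corresponds to the empty end word matching result.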
S25: matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result.
Specifically, the unique word algorithm judges whether the accurate-mode word segmentation result contains exactly one word; if so, that word is taken as the unique word matching result, and if not, the output is empty.
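Read as "the accurate-mode result contains exactly one word", the unique word check reduces to a length test. The function name `unique_word` is hypothetical and this reading of "unique" is an assumption.

```python
def unique_word(cut_result):
    """Return the single accurate-mode word, or None if there is not exactly one."""
    if len(cut_result) == 1:
        return cut_result[0]
    return None


single = unique_word(["farmer mountain spring purified water"])
several = unique_word(["farmer mountain spring", "purified water"])
```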
S26: calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies for the goods commodity code recorded in the core word library for that matching result;
S27: outputting the matching result with the highest confidence.
Further, an embodiment of the application also discloses an invoice commodity code assignment method, which is shown in fig. 3 and comprises the following steps:
S31: receiving an original goods name and a commodity code abbreviation;
S32: cleaning the original goods name and removing useless words by using a preset useless word bank to obtain the goods name;
S33: segmenting the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
S34: matching in the core word library by using an end word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an end word matching result;
S35: matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result;
S36: matching in the core word library by using an abbreviation word algorithm, the commodity code abbreviation, and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an abbreviation word matching result.
Specifically, the user may also input a commodity code abbreviation together with the goods name. For example, in the input "soft drink farmer mountain spring purified water 500ml", "soft drink" is the commodity code abbreviation and "farmer mountain spring purified water 500ml" is the original goods name.
Specifically, the abbreviation word algorithm first judges whether the accurate-mode word segmentation result is empty; if it is, the output is empty. Otherwise, it judges whether the commodity code abbreviation exists in the preset core word library; if it does not, the output is empty. If it does, the core word sub-library corresponding to the commodity code abbreviation is located, and the words of the full-mode word segmentation result are matched against the core words in that sub-library together with their proportions of invoicing companies. If any words match, the core word with the largest proportion is selected as the output; otherwise, the output is empty.
For example, if the commodity code abbreviation "soft drink" is found in the core word library, the full-mode word segmentation result is matched against the goods names under that commodity code, and the matching result with the largest proportion of invoicing companies is selected.
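The abbreviation word procedure can be sketched as a lookup in a per-abbreviation sub-library. The function name `abbreviation_word`, the layout of `SUB_LIBRARIES` (abbreviation mapped to core words and their proportions) and the sample proportions are hypothetical.

```python
def abbreviation_word(abbreviation, cut_all_result, cut_result, sub_libraries):
    """Return the full-mode word with the largest proportion in the
    sub-library of the given commodity code abbreviation, or None."""
    if not cut_result:
        return None  # accurate-mode result is empty
    sub_library = sub_libraries.get(abbreviation)
    if not sub_library:
        return None  # abbreviation unknown, or its sub-library is empty
    hits = {word: sub_library[word] for word in cut_all_result if word in sub_library}
    if not hits:
        return None  # no full-mode word appears in the sub-library
    return max(hits, key=hits.get)


# Hypothetical sub-libraries: abbreviation -> {core word: proportion}.
SUB_LIBRARIES = {
    "soft drink": {"purified water": 90, "farmer mountain spring": 40},
}

match = abbreviation_word(
    "soft drink",
    ["farmer mountain spring", "purified water"],
    ["farmer mountain spring purified water"],
    SUB_LIBRARIES,
)
```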
S37: calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies for the goods commodity code recorded in the core word library for that matching result;
S38: outputting the matching result with the highest confidence.
Correspondingly, an embodiment of the application also discloses an invoice commodity code assignment system. See FIG. 4; the system comprises:
a goods name receiving module 11 for receiving a goods name;
a jieba word segmentation module 12 for performing word segmentation on the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
a core word extraction module 13 for matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results; each matching result comprises a goods name, a goods commodity code and the proportion of the number of invoice-issuing companies for the goods commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
a confidence calculating module 14 for calculating the confidence of each matching result by using a preset weighting ratio and the proportion of the number of invoice-issuing companies for the goods commodity code in each matching result, as recorded in the core word library;
a result output module 15 for outputting the matching result with the highest confidence;
wherein the core word library is a database created in advance that comprises a plurality of goods names, their goods commodity codes and the proportions of the number of invoice-issuing companies for the goods commodity codes.
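A possible in-memory shape for such a core word library is sketched below. Claim 1 adds that goods names whose issuing companies account for no more than 0.1% are left out when the library is created; the sketch applies that cut-off. The function name `build_core_word_library`, the record layout, the choice of the total number of issuing companies as the denominator, and the assumption that names are already cleaned are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (cleaned goods name, goods commodity code, number of issuing companies)
RawRecord = Tuple[str, str, int]
# goods name -> list of (goods commodity code, issuing-company proportion)
CoreWordLibrary = Dict[str, List[Tuple[str, float]]]


def build_core_word_library(
    records: Iterable[RawRecord],
    total_companies: int,
    min_proportion: float = 0.001,  # 0.1%, the figure given in claim 1
) -> CoreWordLibrary:
    if total_companies <= 0:
        raise ValueError("total_companies must be positive")
    library: CoreWordLibrary = {}
    for goods_name, commodity_code, company_count in records:
        proportion = company_count / total_companies
        # Keep only names whose issuing companies account for more than the cut-off.
        if proportion <= min_proportion:
            continue
        library.setdefault(goods_name, []).append((commodity_code, proportion))
    return library


# Illustrative records; the counts are made up.
records = [("cola", "CODE-A", 50), ("rare item", "CODE-B", 1)]
library = build_core_word_library(records, total_companies=10000)
print(library)
```

Storing the proportion next to each code means the later confidence step can read it directly from the library without recomputing counts.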
Therefore, the embodiment of the application extracts core words with a composite of several algorithms and matches them, which improves the matching hit rate and yields a plurality of matching results; the matching result with the highest confidence is then selected from them, which ensures the accuracy of the final result.
Specifically, the cargo name receiving module 11 may include an original name receiving unit and an original name cleaning unit; wherein,
the original name receiving unit is used for receiving the original goods name;
and the original name cleaning unit is used for cleaning the original cargo name, and removing the useless words by utilizing a preset useless word bank to obtain the cargo name.
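The cleaning step can be sketched as a simple substring removal. This is an assumption about how the useless word bank is applied; the function name `clean_goods_name` and the sample words are hypothetical.

```python
from typing import Iterable


def clean_goods_name(original_name: str, useless_words: Iterable[str]) -> str:
    """Remove every entry of the useless word bank from the original goods name."""
    cleaned = original_name
    # Remove longer entries first so a short entry cannot split a longer one.
    for word in sorted(useless_words, key=len, reverse=True):
        cleaned = cleaned.replace(word, "")
    # Collapse whitespace left behind by the removals.
    return " ".join(cleaned.split())


# Illustrative useless word bank.
useless_word_bank = ["promo pack", "500ml"]
print(clean_goods_name("promo pack sparkling water 500ml", useless_word_bank))
```

For Chinese goods names, which normally contain no spaces, the final whitespace step leaves the string unchanged.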
Specifically, the core word extraction module 13 may include an ending word calculation unit and a unique word calculation unit; wherein,
the ending word calculation unit is used for matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
and the unique word calculation unit is used for matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result.
Specifically, the system may further comprise a code abbreviation receiving module; wherein,
the code abbreviation receiving module is used for receiving a commodity code abbreviation;
the core word extraction module 13 may include an ending word calculation unit, a unique word calculation unit, and an abbreviation word calculation unit; wherein,
the ending word calculation unit is used for matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
the unique word calculation unit is used for matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result;
the abbreviation word calculation unit is used for matching in the core word library by using an abbreviation word algorithm, the commodity code abbreviation, and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an abbreviation word matching result.
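One way to picture the composite arrangement is as a set of interchangeable extractors whose outputs are pooled into a list of matching results. The sketch below shows only that pooling. The two extractor functions are simple stand-ins and are not the ending word or unique word algorithms of the application; every name and the library layout are assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical layout: core word -> (goods commodity code, issuing-company proportion)
Library = Dict[str, Tuple[str, float]]
# An extractor sees both segmentation results and proposes one core word, or None.
Extractor = Callable[[List[str], List[str], Library], Optional[str]]


def composite_extract(
    full_tokens: List[str],
    accurate_tokens: List[str],
    library: Library,
    extractors: Dict[str, Extractor],
) -> List[dict]:
    """Run every extractor and pool the non-empty matches."""
    results = []
    for algorithm_name, extractor in extractors.items():
        core_word = extractor(full_tokens, accurate_tokens, library)
        if core_word is None or core_word not in library:
            continue  # this algorithm produced an empty result
        code, proportion = library[core_word]
        results.append({
            "algorithm": algorithm_name,
            "goods_name": core_word,
            "commodity_code": code,
            "company_proportion": proportion,
        })
    return results


# Stand-in extractors, for illustration only.
def last_accurate_token(full, accurate, library):
    return accurate[-1] if accurate else None


def single_library_hit(full, accurate, library):
    hits = [token for token in accurate if token in library]
    return hits[0] if len(hits) == 1 else None


library = {"water": ("CODE-A", 0.4), "bottle": ("CODE-B", 0.1)}
pooled = composite_extract(
    ["sparkling", "water", "bottle"],
    ["sparkling", "water"],
    library,
    {"stand_in_1": last_accurate_token, "stand_in_2": single_library_hit},
)
print(pooled)
```

Because each result records which algorithm produced it, a later step can apply a per-algorithm weight when computing confidence.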
In addition, an embodiment of the application also discloses an invoice commodity code assignment device, which comprises:
a memory for storing a computer program;
a processor for executing the computer program to implement the invoice commodity code assignment method described above.
In addition, an embodiment of the application also discloses a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the invoice commodity code assignment method described above.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The foregoing describes the application in detail so that its principles and embodiments may be better understood. Meanwhile, those skilled in the art may, following the ideas of the application, vary the specific embodiments and the scope of application. In view of the above, this description should not be construed as limiting the application.

Claims (6)

1. An invoice commodity code assignment method, characterized by comprising the following steps:
receiving a goods name;
performing word segmentation on the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results; each matching result comprises a goods name, a goods commodity code and the proportion of the number of invoice-issuing companies for the goods commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
calculating the confidence of each matching result by using a preset weighting ratio and the proportion of the number of invoice-issuing companies for the goods commodity code in each matching result, as recorded in the core word library;
outputting the matching result with the highest confidence;
wherein the core word library is a database created in advance that comprises a plurality of goods names, their goods commodity codes and the proportions of the number of invoice-issuing companies for the goods commodity codes; when the core word library is created, the goods names entered into the core word library are cleaned to remove stop words, goods names whose number of invoice-issuing companies is below a certain threshold are removed to reduce the number of core words, and goods names whose invoice-issuing companies account for more than 0.1% are selected;
the process of receiving a goods name comprises:
receiving an original goods name;
cleaning the original goods name, and removing useless words by using a preset useless word bank to obtain the goods name;
the process of matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results comprises:
matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
and matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result.
2. The invoice commodity code assignment method according to claim 1, further comprising:
receiving a commodity code abbreviation;
the process of matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results comprises:
matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result;
and matching in the core word library by using an abbreviation word algorithm, the commodity code abbreviation, and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an abbreviation word matching result.
3. An invoice commodity code assignment system, comprising:
a goods name receiving module for receiving a goods name;
a jieba word segmentation module for performing word segmentation on the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
a core word extraction module for matching in the core word library by using a composite core word extraction algorithm, the full-mode word segmentation result and the accurate-mode word segmentation result to obtain a plurality of matching results; each matching result comprises a goods name, a goods commodity code and the proportion of the number of invoice-issuing companies for the goods commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
a confidence calculating module for calculating the confidence of each matching result by using a preset weighting ratio and the proportion of the number of invoice-issuing companies for the goods commodity code in each matching result, as recorded in the core word library;
a result output module for outputting the matching result with the highest confidence;
wherein the core word library is a database created in advance that comprises a plurality of goods names, their goods commodity codes and the proportions of the number of invoice-issuing companies for the goods commodity codes; when the core word library is created, the goods names entered into the core word library are cleaned to remove stop words, goods names whose number of invoice-issuing companies is below a certain threshold are removed to reduce the number of core words, and goods names whose invoice-issuing companies account for more than 0.1% are selected;
the goods name receiving module comprises:
an original name receiving unit for receiving an original goods name;
an original name cleaning unit for cleaning the original goods name, and removing useless words by using a preset useless word bank to obtain the goods name;
the core word extraction module comprises:
an ending word calculation unit for matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
and a unique word calculation unit for matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result.
4. The invoice commodity code assignment system according to claim 3, further comprising:
a code abbreviation receiving module for receiving a commodity code abbreviation;
the core word extraction module comprises:
an ending word calculation unit for matching in the core word library by using an ending word algorithm and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an ending word matching result;
a unique word calculation unit for matching in the core word library by using a unique word algorithm and the accurate-mode word segmentation result to obtain a unique word matching result;
and an abbreviation word calculation unit for matching in the core word library by using an abbreviation word algorithm, the commodity code abbreviation, and the full-mode word segmentation result and/or the accurate-mode word segmentation result to obtain an abbreviation word matching result.
5. An invoice commodity code assigning device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the invoice commodity code assigning method according to claim 1 or 2.
6. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, which when executed by a processor, implements the invoice commodity code assigning method according to claim 1 or 2.
CN202011346801.5A 2020-11-26 2020-11-26 Invoice commodity code assignment method, system, device and readable storage medium Active CN112348604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011346801.5A CN112348604B (en) 2020-11-26 2020-11-26 Invoice commodity code assignment method, system, device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011346801.5A CN112348604B (en) 2020-11-26 2020-11-26 Invoice commodity code assignment method, system, device and readable storage medium

Publications (2)

Publication Number Publication Date
CN112348604A CN112348604A (en) 2021-02-09
CN112348604B true CN112348604B (en) 2023-11-17

Family

ID=74365936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011346801.5A Active CN112348604B (en) 2020-11-26 2020-11-26 Invoice commodity code assignment method, system, device and readable storage medium

Country Status (1)

Country Link
CN (1) CN112348604B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276360A (en) * 2007-03-30 2008-10-01 建准电机工业股份有限公司 Reliability verification method of patent retrieval data
CN106844651A (en) * 2017-01-20 2017-06-13 上海傲硕信息科技有限公司 Instruction results compare screening plant
CN107871144A (en) * 2017-11-24 2018-04-03 税友软件集团股份有限公司 Invoice trade name sorting technique, system, equipment and computer-readable recording medium
CN108241677A (en) * 2016-12-26 2018-07-03 航天信息股份有限公司 A kind of method and system for the tax revenue sorting code number for obtaining commodity
CN109213866A (en) * 2018-09-19 2019-01-15 浙江诺诺网络科技有限公司 A kind of tax commodity code classification method and system based on deep learning
CN109918480A (en) * 2019-03-01 2019-06-21 陈包容 A method of address is extracted from text
CN110347801A (en) * 2019-07-17 2019-10-18 安徽航天信息有限公司 A kind of commodity classification codes match method and system
CN110597995A (en) * 2019-09-20 2019-12-20 税友软件集团股份有限公司 Commodity name classification method, commodity name classification device, commodity name classification equipment and readable storage medium
CN110688851A (en) * 2019-09-26 2020-01-14 税友软件集团股份有限公司 Method, device and medium for extracting key information of address text
CN110852815A (en) * 2018-07-25 2020-02-28 阿里巴巴集团控股有限公司 Data processing method, device and machine readable medium
CN111368539A (en) * 2020-03-02 2020-07-03 贵州电网有限责任公司 Hotspot analysis modeling method
CN111832318A (en) * 2020-07-16 2020-10-27 平安科技(深圳)有限公司 Single sentence natural language processing method and device, computer equipment and readable storage medium
CN111985211A (en) * 2020-09-01 2020-11-24 中国民航科学技术研究院 Ontology concept obtaining method and device in civil aviation safety field and storage medium
CN113191146A (en) * 2021-05-26 2021-07-30 平安国际智慧城市科技股份有限公司 Appeal data distribution method and device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
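Research on factors influencing the helpfulness of online product reviews: a text-semantics perspective; 陈江涛; 张金隆; 张亚军; 图书情报工作 (10); 121-125 *
Research on spam SMS text clustering based on a deep feature semantic learning model; 张毓; 陈军清; 现代计算机(专业版) (07); 17-21 *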

Also Published As

Publication number Publication date
CN112348604A (en) 2021-02-09

Similar Documents

Publication Publication Date Title
CN110580335B (en) User intention determining method and device
CN109255564B (en) Pick-up point address recommendation method and device
CN109816134B (en) Method and device for predicting delivery address and storage medium
CN109033299A (en) It is a kind of by terminal applies to the method, device and equipment of user's recommendation information
CN110597995B (en) Commodity name classification method, commodity name classification device, commodity name classification equipment and readable storage medium
CN109598517B (en) Commodity clearance processing, object processing and category prediction method and device thereof
CN102193939A (en) Realization method of information navigation, information navigation server and information processing system
CN101248435A (en) Determination of a desired repository
CN110991464A (en) Commodity click rate prediction method based on deep multi-mode data fusion
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN110705225A (en) Contract marking method and device
CN111428486B (en) Article information data processing method, device, medium and electronic equipment
CN104077288B (en) Web page contents recommend method and web page contents recommendation apparatus
CN112348604B (en) Invoice commodity code assignment method, system, device and readable storage medium
CN112182126A (en) Model training method and device for determining matching degree, electronic equipment and readable storage medium
CN113139558A (en) Method and apparatus for determining a multi-level classification label for an article
CN110502755A (en) Character string identification method and computer storage medium based on Fusion Model
CN110852833A (en) Taxi booking order processing method and device
US11776011B2 (en) Methods and apparatus for improving the selection of advertising
CN111159397B (en) Text classification method and device and server
CN111177391B (en) Method and device for acquiring social public opinion volume and computer readable storage medium
CN110533284B (en) Method and device for arranging pickup vehicle based on predicted commodity specification
CN115618871A (en) Merchant text identification method, device, equipment and storage medium
CN112507066A (en) Label marking method and device, electronic equipment and readable storage medium
CN111159398B (en) Method and device for identifying merchant types

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant