CN111192128A - Method for identifying abnormal tax payment behaviors - Google Patents

Method for identifying abnormal tax payment behaviors

Info

Publication number
CN111192128A
Authority
CN
China
Prior art keywords
commodity
main
name
list
commodities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911397878.2A
Other languages
Chinese (zh)
Other versions
CN111192128B (en)
Inventor
刘芬
王志刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aisino Corp
Priority to CN201911397878.2A
Publication of CN111192128A
Application granted
Publication of CN111192128B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/123 Tax preparation or submission
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An embodiment of the present disclosure provides a method for identifying abnormal tax payment behaviors, comprising: acquiring a main sales commodity list based on the ratio of the summarized amount of each sales commodity to the summarized amount of all sales commodities; acquiring a main purchased commodity list based on the ratio of the summarized amount of each input commodity to the summarized amount of all input commodities; processing the names of the main sales commodities in the main sales commodity list and the names of the main purchased commodities in the main purchased commodity list based on natural language processing technology to obtain a first processing result; and judging whether the tax payment behavior is abnormal based on the first processing result. The method thereby improves the efficiency of identifying abnormal tax payment behaviors.

Description

Method for identifying abnormal tax payment behaviors
Technical Field
The disclosure belongs to the technical field of information, and particularly relates to a method for identifying abnormal tax payment behaviors.
Background
Identifying abnormal behaviors such as "unreal purchases and sales, inconsistent deductions, and falsely issued invoices" by analyzing the input and sales commodities in the goods-detail data of value-added tax invoices is an important means of tax risk prevention and control. However, the diversity, complexity, and irregular entry of commodity names make it difficult to identify the same commodity entity and similar commodities. In addition, the input and sales commodities of production or processing enterprises differ greatly, so whether they are abnormal cannot be judged directly by measuring the similarity of input and sales commodity names. Existing methods calculate the degree of difference between input and sales commodities based on commodity codes or simple commodity-name similarity, and then identify abnormal tax payment behaviors. However, the many-to-many relationship between commodity codes and commodity names, together with the simplistic calculation of commodity-name similarity, often makes the analysis inaccurate and incomplete. Existing identification of abnormal tax payment behaviors therefore suffers from low efficiency and accuracy.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide a method for identifying abnormal tax payment behaviors, which at least solves the problem in the prior art that the efficiency and accuracy of identifying abnormal tax payment behaviors are low.
In a first aspect, an embodiment of the present disclosure provides a method for identifying abnormal tax payment behaviors, including:
acquiring a main sales commodity list based on the ratio of the summarized amount of each sales commodity to the summarized amounts of all sales commodities;
acquiring a main purchased commodity list based on the ratio of the total amount of each input commodity to the total amount of all the input commodities;
processing the name of the main sales commodity in the main sales commodity list and the name of the main purchased commodity in the main purchased commodity list based on a natural language processing technology to obtain a first processing result;
and judging whether the tax payment behaviors are abnormal or not based on the first processing result.
Optionally, before or after the step of processing the name of the main sales commodity in the main sales commodity list and the name of the main purchased commodity in the main purchased commodity list based on the natural language processing technology to obtain a first processing result, the method further includes:
judging whether the name of the main sales commodity in the main sales commodity list and the name of the main purchased commodity in the main purchased commodity list are in a combined entity word library obtained through association analysis, and thereby judging whether the tax payment behavior is abnormal.
Optionally, processing the name of the main sales commodity in the main sales commodity list and the name of the main purchased commodity in the main purchased commodity list based on the natural language processing technology to obtain a first processing result includes:
performing word segmentation on the main sales commodity name and the main purchased commodity name, and extracting entity words;
acquiring word vectors of the extracted entity words by using an acquired word vector resource;
calculating the cosine similarity between entity words based on the word vectors;
for each group of commodities, taking the maximum cosine similarity over all entity-word pairs as the commodity name similarity of the group, wherein each group of commodities comprises one input commodity and one sales commodity;
combining the names of the main sales commodities in the main sales commodity list with the names of the main purchased commodities in the main purchased commodity list to form a plurality of groups of commodities, and calculating the commodity name similarity of each group;
selecting the largest commodity name similarity as the input/sales commodity similarity;
and comparing the input/sales commodity similarity with a first set threshold.
Optionally, in calculating the cosine similarity between entity words based on the word vectors, the cosine similarity is calculated as:
$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \, \|\vec{b}\|}$$
where $\vec{a}$ and $\vec{b}$ are the word vectors of the entity words, and $\|\vec{a}\|$ and $\|\vec{b}\|$ are the moduli of vector $\vec{a}$ and vector $\vec{b}$.
Optionally, if the input/sales commodity similarity is greater than the first set threshold, the tax payment behavior is considered normal;
otherwise, the tax payment behavior is considered abnormal.
Optionally, the association analysis used to obtain the combined entity word library includes:
performing word segmentation on the sales commodity names and the input commodity names in the related industry, and extracting entity words;
counting the frequency of occurrence of each extracted entity word over all sales commodities;
counting, for each group of a sales commodity and an input commodity, the frequency with which each pair of entity words occurs together, namely the frequency of the combined entity word;
comparing the frequency of occurrence of each entity word with a second set threshold to obtain the sales entity words;
and comparing the frequencies of the combined entity words containing the sales entity words with a third set threshold to obtain the combined entity word library.
Optionally, judging whether the name of the main sales commodity in the main sales commodity list and the name of the main purchased commodity in the main purchased commodity list are in the combined entity word library obtained through association analysis, and thereby judging whether the tax payment behavior is abnormal, includes:
if a combined entity word obtained from the name of the main sales commodity in the main sales commodity list and the name of the main purchased commodity in the main purchased commodity list appears in the combined entity word library, the tax payment behavior is considered normal;
otherwise, the tax payment behavior is considered abnormal.
Optionally, after the steps of acquiring the main sales commodity list and acquiring the main purchased commodity list, the method further includes:
judging whether the main sales commodity list or the main purchased commodity list is empty;
and/or
judging whether the input and sales items are normal based on the number of commodities in the main sales commodity list or the main purchased commodity list and the recorded content;
and/or
acquiring the intersection of the main sales commodity set and the main purchased commodity set based on the main sales commodity list and the main purchased commodity list, and judging whether the input and sales items are normal based on the intersection;
and/or
judging whether the name of a main sales commodity in the main sales commodity list and the name of a main purchased commodity in the main purchased commodity list include the same phrase, thereby judging whether the input and sales items are normal;
and/or
judging whether the main sales commodity name and the main purchased commodity name are completely unrecognizable character strings;
if the commodity names are not completely unrecognizable, removing the unrecognizable character strings from the main sales commodity name and the main purchased commodity name, and retaining the recognizable character strings;
and taking the intersection of the recognizable characters in the main sales commodity name and the recognizable characters in the main purchased commodity name, obtaining the number m of intersection elements and the maximum length n of the 2 character strings, and judging whether the input and sales items are normal based on a comparison of m/n with a fourth set threshold.
Optionally, after the steps of acquiring the main sales commodity list and acquiring the main purchased commodity list, the method further includes:
judging whether the tax payment behavior is abnormal based on set dictionaries, comprising the following steps:
converting the character strings of the main sales commodity name and the main purchased commodity name into a character set;
taking the intersection of the character set and a set single-word dictionary, and counting the number of elements in the intersection;
if the number of elements is zero, traversing the words in a set multi-word dictionary one by one and checking whether the main sales commodity name and the main purchased commodity name contain words in the multi-word dictionary; and correcting the set single-word dictionary and the set multi-word dictionary according to the feedback result.
Optionally, acquiring the main sales commodity list or acquiring the main purchased commodity list includes:
acquiring the total amount of all sales commodities and the total amount of all input commodities;
acquiring the amount of each sales commodity and the amount of each input commodity;
dividing the amount of each sales commodity by the total amount of all sales commodities to obtain a plurality of first proportional values; dividing the amount of each input commodity by the total amount of all input commodities to obtain a plurality of second proportional values;
accumulating the plurality of first proportional values in descending order to obtain a first accumulation result, and stopping accumulation when the first accumulation result is greater than a fifth set threshold, wherein the list of sales commodities accumulated up to that point is the main sales commodity list;
and accumulating the plurality of second proportional values in descending order to obtain a second accumulation result, and stopping accumulation when the second accumulation result is greater than a sixth set threshold, wherein the list of input commodities accumulated up to that point is the main purchased commodity list.
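The cumulative-ratio selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `main_commodity_list`, the `(name, amount)` record layout, and the default threshold of 0.8 are all assumptions made for the example.

```python
from collections import defaultdict

def main_commodity_list(records, threshold=0.8):
    """Return the commodities whose cumulative share first exceeds `threshold`.

    `records` is an iterable of (commodity_name, amount) invoice lines.
    The threshold value is hypothetical; the text only calls it a set threshold.
    """
    # Summarize the amount of each commodity.
    totals = defaultdict(float)
    for name, amount in records:
        totals[name] += amount

    grand_total = sum(totals.values())
    if grand_total <= 0:
        return []

    # Accumulate the proportional values in descending order and stop
    # as soon as the accumulated result is greater than the threshold.
    main, cumulative = [], 0.0
    for name, amount in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        main.append(name)
        cumulative += amount / grand_total
        if cumulative > threshold:
            break
    return main

sales_lines = [("steel", 30), ("steel", 30), ("cement", 25), ("nails", 10), ("tape", 5)]
print(main_commodity_list(sales_lines, threshold=0.8))
```

The same function would be called once on the sales lines and once on the input lines, each with its own threshold.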
The method first acquires a main sales commodity list based on the ratio of the amount of each sales commodity to the amount of all sales commodities, and acquires a main purchased commodity list based on the ratio of the amount of each input commodity to the amount of all input commodities. It then processes the names of the main sales commodities in the main sales commodity list and the names of the main purchased commodities in the main purchased commodity list based on natural language processing technology to obtain a first processing result, and judges whether the tax payment behavior is abnormal based on the first processing result. Because a main sales commodity list and a main purchased commodity list are determined for each enterprise and non-main input and sales commodities are not analyzed, identification efficiency is improved. With the help of NLP (natural language processing) technology and pre-trained public word vector resources, commodity names are segmented into words and their similarity is calculated from word vectors, so that the semantic similarity of commodity names is measured; this compensates for the shortcoming of calculating only the literal similarity of commodity names. Identification accuracy is thereby improved, and the purpose of improving the efficiency of identifying abnormal tax payment behaviors is achieved.
The method also uses association analysis: it combines the frequency of sales commodities with the frequency with which input and sales commodities appear together, and sets thresholds to identify abnormal tax payment behaviors, which addresses the problem of judging whether the input and sales items of production and processing enterprises are abnormal. In addition, based on the characteristics of the goods-detail data, customized dictionaries and customized rules are used to analyze commodities in special fields (such as chemicals and medicines), which can improve the efficiency and accuracy of identifying abnormal tax payment behaviors.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
FIG. 1 illustrates a flow diagram of a first method of identifying abnormal tax payment behavior according to one embodiment of the present disclosure;
FIG. 2 illustrates a flow diagram of a second method of identifying abnormal tax payment behavior according to one embodiment of the present disclosure;
FIG. 3 illustrates a flow chart of a third method of identifying abnormal tax payment behavior according to one embodiment of the present disclosure.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below. While the following describes preferred embodiments of the present disclosure, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein.
A method of identifying abnormal tax payment behavior, comprising:
Step S101: acquiring a main sales commodity list based on the ratio of the summarized amount of each sales commodity to the summarized amount of all sales commodities;
acquiring a main purchased commodity list based on the ratio of the summarized amount of each input commodity to the summarized amount of all input commodities.
The main sales commodity list and the main purchased commodity list may be acquired in either order: they can be acquired simultaneously, or one can be acquired first and then the other.
Step S102: processing the name of the main sales commodity in the main sales commodity list and the name of the main purchased commodity in the main purchased commodity list based on a natural language processing technology to obtain a first processing result;
Step S103: judging whether the tax payment behavior is abnormal based on the first processing result.
Optionally, the method further includes: Step S207: judging whether the name of the main sales commodity in the main sales commodity list and the name of the main purchased commodity in the main purchased commodity list are in a combined entity word library obtained through association analysis, and thereby judging whether the tax payment behavior is abnormal.
Step S207 may be performed before or after step S102, and step S207 has no direct dependency on step S102. If, after the main sales commodity list and the main purchased commodity list are acquired, step S102 already determines that the enterprise's tax payment is abnormal, step S207 need not be performed.
Optionally, processing the name of the main sales commodity in the main sales commodity list and the name of the main purchased commodity in the main purchased commodity list based on the natural language processing technology to obtain a first processing result includes:
performing word segmentation on the main sales commodity name and the main purchased commodity name, and extracting entity words;
acquiring word vectors of the extracted entity words by using an acquired word vector resource;
calculating the cosine similarity between entity words based on the word vectors;
for each group of commodities, taking the maximum cosine similarity over all entity-word pairs as the commodity name similarity of the group, wherein each group of commodities comprises one input commodity and one sales commodity;
combining the names of the main sales commodities in the main sales commodity list with the names of the main purchased commodities in the main purchased commodity list to form a plurality of groups of commodities, and calculating the commodity name similarity of each group;
selecting the largest commodity name similarity as the input/sales commodity similarity;
and comparing the input/sales commodity similarity with a first set threshold.
Optionally, in calculating the cosine similarity between entity words based on the word vectors, the cosine similarity is calculated as:
$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \, \|\vec{b}\|}$$
where $\vec{a}$ and $\vec{b}$ are the word vectors of the entity words, and $\|\vec{a}\|$ and $\|\vec{b}\|$ are the moduli of vector $\vec{a}$ and vector $\vec{b}$.
Optionally, if the input/sales commodity similarity is greater than the first set threshold, the tax payment behavior is considered normal;
otherwise, the tax payment behavior is considered abnormal.
In a specific implementation scenario, if two commodity names share no characters or words but have similar semantics, such as two different terms that both mean "toilet bowl", whether the input and sales items are abnormal cannot be determined by comparing the characters of the commodity names, and natural language processing (NLP) technology is required to identify commodity names with similar semantics. The specific implementation method comprises the following steps:
A word segmentation tool is used to segment the input and sales commodity names and to extract entity words. For example, "lady trousers" is divided into "lady" and "trousers", of which "trousers" is the entity word of the commodity name;
Publicly available Chinese word vector resources trained on the Baidu Encyclopedia corpus are used, with the extracted commodity entity words as indexes, to obtain the word vectors of the entity words;
The cosine similarity between entity words is calculated. If a commodity name has multiple entity words, for example "curtain accessory plastic ball" has 3 entity words, "curtain", "accessory", and "plastic ball", the cosine similarity between each of its entity words and each entity word of the other commodity is calculated. The cosine similarity formula is as follows:
$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \, \|\vec{b}\|}$$
where $\vec{a}$ and $\vec{b}$ are the word vectors of the entity words.
For each pair consisting of an input commodity name and a sales commodity name, the maximum of the similarities over all entity-word pairs is taken as the similarity between the 2 commodity names;
The input and sales commodities of each enterprise are combined pairwise, the similarity between commodity names is calculated for each pair, and the maximum similarity is taken as the input/sales commodity similarity. For example, if the input commodities are a1, a2, and a3 and the sales commodities are b1, b2, and b3, the pairwise combinations are a1b1, a1b2, a1b3, a2b1, a2b2, a2b3, a3b1, a3b2, and a3b3. A threshold is set; if the input/sales commodity similarity is greater than the threshold, the enterprise is judged to be a normal enterprise; otherwise, it is retained for further judgment.
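The max-over-pairs similarity computation can be sketched as below. The sketch assumes that word segmentation and entity-word extraction have already been done upstream, and it uses a tiny hand-made `vectors` dictionary in place of a real pre-trained word vector resource; the function names and the toy vectors are hypothetical.

```python
import math
from itertools import product

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def name_similarity(words_a, words_b, vectors):
    """Maximum cosine similarity over all entity-word pairs of two commodity names."""
    sims = [
        cosine(vectors[a], vectors[b])
        for a, b in product(words_a, words_b)
        if a in vectors and b in vectors  # skip words missing from the resource
    ]
    return max(sims, default=0.0)

def input_sales_similarity(input_names, sales_names, vectors):
    """Maximum name similarity over all pairwise input/sales combinations."""
    return max(
        (name_similarity(a, b, vectors) for a, b in product(input_names, sales_names)),
        default=0.0,
    )

# Toy 2-dimensional vectors, invented purely for illustration.
vectors = {"trousers": [1.0, 0.0], "pants": [0.9, 0.1], "yarn": [0.0, 1.0]}
inputs = [["yarn"], ["pants"]]          # entity words of each input commodity
sales = [["trousers"]]                  # entity words of each sales commodity
print(input_sales_similarity(inputs, sales, vectors))
```

The result would then be compared with the set threshold; enterprises at or below it are kept for further judgment rather than flagged immediately.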
Optionally, the association analysis used to obtain the combined entity word library includes:
performing word segmentation on the sales commodity names and the input commodity names in the related industry, and extracting entity words;
counting the frequency of occurrence of each extracted entity word over all sales commodities;
counting, for each group of a sales commodity and an input commodity, the frequency with which each pair of entity words occurs together, namely the frequency of the combined entity word;
comparing the frequency of occurrence of each entity word with a second set threshold to obtain the sales entity words;
and comparing the frequencies of the combined entity words containing the sales entity words with a third set threshold to obtain the combined entity word library.
In a specific application scenario, judging abnormality by measuring commodity name similarity is not fully applicable to production, processing, or service enterprises; for example, a case in which the sales commodity is a T-shirt and the input commodity is yarn might be judged abnormal. To improve the accuracy of judgment, the invention uses association analysis to find reasonable input/sales commodity combinations. The specific implementation method comprises the following steps:
A word segmentation tool is used to segment the input and sales commodity names and to extract entity words;
The frequency of occurrence of each extracted entity word is counted over all sales commodities;
The number of times each pair of entity words appears together is counted for each pair of a sales commodity and an input commodity. If a commodity name has multiple entity words, the entity words of the two commodity names are matched pairwise and counted. For example, if the entity words of the sales commodity "terylene dyeing embroidery cloth" are "terylene" and "cloth", and the entity words of the input commodity "cashmere yarn" are "cashmere" and "yarn", then the combinations are "terylene" + "cashmere", "terylene" + "yarn", "cloth" + "yarn", and "cloth" + "cashmere", and the frequency of each combination is increased by 1;
Two thresholds k and d are set. First, based on the sales commodity statistics, the entity words whose frequency is greater than k are selected; then, from the combined entity words that contain the selected sales entity words, those whose frequency of occurrence is greater than d are selected. The resulting entity word combinations are regarded as reasonable input/sales commodity combinations.
If the sales commodity entity words and the input commodity entity words of an enterprise appear in the resulting entity word combinations, the enterprise's input and sales items are judged normal; otherwise, they are judged abnormal.
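The frequency counting and two-threshold screening can be sketched as follows. The sketch assumes entity words have already been extracted; the function names, the data layout (a list of sales commodities plus a list of sales/input pairs), and the toy counts are assumptions for illustration only.

```python
from collections import Counter
from itertools import product

def build_combined_lexicon(sales_commodities, pairs, k, d):
    """Build the set of (sales_word, input_word) combinations regarded as reasonable.

    sales_commodities: list of entity-word lists, one per sales commodity.
    pairs: list of (sales_entity_words, input_entity_words), one per
           sales/input commodity pair.
    k, d: the two frequency thresholds described in the text.
    """
    # Frequency of each entity word over all sales commodities.
    sales_freq = Counter()
    for words in sales_commodities:
        sales_freq.update(words)

    # Frequency of each combined entity word (pairwise matching of entity words).
    combo_freq = Counter()
    for sales_words, input_words in pairs:
        combo_freq.update(product(sales_words, input_words))

    frequent_sales = {w for w, c in sales_freq.items() if c > k}
    return {
        combo for combo, c in combo_freq.items()
        if c > d and combo[0] in frequent_sales
    }

def is_normal(sales_words, input_words, lexicon):
    """True if any sales/input entity-word combination appears in the lexicon."""
    return any(combo in lexicon for combo in product(sales_words, input_words))

sales = [["terylene", "cloth"]] * 3
pairs = [(["terylene", "cloth"], ["cashmere", "yarn"])] * 3 + [(["cloth"], ["tape"])]
lexicon = build_combined_lexicon(sales, pairs, k=2, d=2)
print(is_normal(["cloth"], ["yarn"], lexicon), is_normal(["cloth"], ["tape"], lexicon))
```

Using `Counter` keeps both statistics in one pass each; in practice the thresholds would be tuned per industry, which the text leaves open.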
Optionally, judging whether the name of the main sales commodity in the main sales commodity list and the name of the main purchased commodity in the main purchased commodity list are in the combined entity word library obtained through association analysis, and thereby judging whether the tax payment behavior is abnormal, includes:
if a combined entity word obtained from the name of the main sales commodity in the main sales commodity list and the name of the main purchased commodity in the main purchased commodity list appears in the combined entity word library, the tax payment behavior is considered normal;
otherwise, the tax payment behavior is considered abnormal.
Optionally, as shown in FIG. 2, after the steps of acquiring the main sales commodity list and acquiring the main purchased commodity list, the method further includes:
Step S201: judging whether the main sales commodity list or the main purchased commodity list is empty;
If an enterprise has only sales commodity data and no input commodity data, or only input commodity data and no sales commodity data, the enterprise is judged to have abnormal tax payment behavior.
And/or
Step S202: judging whether the input and sales items are normal based on the number of commodities in the main sales commodity list or the main purchased commodity list and the recorded content;
Specifically, the number of commodities in the input commodity list and in the sales commodity list is counted. If the input commodity list contains exactly 1 commodity and fuzzy matching shows that its name contains text such as "detailed description", "detailed list", "gold tax disc", or "tax control disc", or if the sales commodity list contains exactly 1 commodity and fuzzy matching shows that its name contains text such as "detailed description" or "detailed list", the enterprise's input and sales items are judged abnormal; otherwise, the process proceeds to step S203.
And/or
Step S203: acquiring the intersection of the main sales commodity set and the main purchased commodity set based on the main sales commodity list and the main purchased commodity list, and judging whether the input and sales items are normal based on the intersection;
The intersection of the main sales commodity set and the main purchased commodity set is taken; if the intersection is not empty, the enterprise's input and sales items are judged normal; otherwise, the process proceeds to step S204.
And/or
Step S204: judging whether the name of a main sales commodity in the main sales commodity list and the name of a main purchased commodity in the main purchased commodity list include the same phrase, thereby judging whether the input and sales items are normal;
It is checked whether the main sales commodity name contains the main purchased commodity name, for example when the sales commodity name is "U-shaped bolt with a nut" and the input commodity name is "U-shaped bolt", or whether the main purchased commodity name contains the main sales commodity name, for example when the sales commodity name is "knitting dyed cloth" and the input commodity name is "sticky brocade knitting dyed cloth". If so, the enterprise's input and sales items are judged normal; otherwise, the process proceeds to step S205.
And/or
Step S205: judging whether the main sales commodity name and the main purchased commodity name are completely unrecognizable character strings;
if the commodity names are not completely unrecognizable, removing the unrecognizable character strings from the main sales commodity name and the main purchased commodity name, and retaining the recognizable character strings;
and taking the intersection of the recognizable characters in the main sales commodity name and the recognizable characters in the main purchased commodity name, obtaining the number m of intersection elements and the maximum length n of the 2 character strings, and judging whether the input and sales items are normal based on a comparison of m/n with a fourth set threshold.
Steps S201 to S205 are all rule-based judgments and are carried out in sequence; once a clear conclusion is reached, the next judgment is not needed. The judgments in steps S201 to S205 are ordered from simple to complex, so reaching a conclusion through a simple method improves judgment efficiency. Only when the simple method cannot decide is the next, more complex judgment necessary.
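The early-exit ordering of the first four rules can be sketched as below. This is only an illustration under stated assumptions: the function name `rule_check`, the return values, and the `PLACEHOLDER_MARKERS` tuple are hypothetical, and plain substring matching stands in for the fuzzy matching mentioned in step S202.

```python
# Hypothetical marker phrases, borrowed from the examples named in step S202.
PLACEHOLDER_MARKERS = ("detailed description", "detailed list")

def rule_check(main_sales, main_purchases):
    """Apply rules S201 to S204 in order.

    Returns "abnormal", "normal", or None when no rule reaches a conclusion
    and a later step (such as S205) is still needed.
    """
    # S201: one of the two lists is empty.
    if not main_sales or not main_purchases:
        return "abnormal"

    # S202: a single-entry list whose only name is a placeholder.
    # Substring matching is an assumed stand-in for fuzzy matching.
    for names in (main_sales, main_purchases):
        if len(names) == 1 and any(m in names[0].lower() for m in PLACEHOLDER_MARKERS):
            return "abnormal"

    # S203: the two commodity sets share at least one identical name.
    if set(main_sales) & set(main_purchases):
        return "normal"

    # S204: one name contains the other.
    for s in main_sales:
        for p in main_purchases:
            if s in p or p in s:
                return "normal"

    return None  # undecided; continue with the next judgment

print(rule_check(["U-shaped bolt with a nut"], ["U-shaped bolt"]))
```

Returning `None` for the undecided case keeps the cascade explicit: each later, more expensive check runs only when every cheaper rule has declined to decide.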
Letters and digits in the main sales commodity name and the main purchased commodity name are removed using a regular expression. If all sales commodity names or all input commodity names are unrecognizable combinations of letters and digits, the input and sales items are directly judged abnormal; otherwise, the commodity name strings with letters and digits removed are converted into character sets, and the intersection of the 2 sets is taken. For example, if the main sales commodity name is "refrigerant R134A" and the main purchased commodity name is "refrigerant R410A", then after the letters and digits are removed the intersection of the character sets is the set of the three characters of the word for "refrigerant" (rendered character by character in translation as 'system', 'cold', 'agent'). The number m of intersection elements and the maximum length n of the 2 character strings are then calculated.
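The m/n computation can be sketched as follows. The function name `overlap_ratio` is hypothetical, and the Chinese string 制冷剂 ("refrigerant") is an assumed original-language form of the example names; the text does not state the direction of the threshold comparison, so the sketch returns only the ratio.

```python
import re

def overlap_ratio(sales_name, purchase_name):
    """Return m/n for two commodity names, or None if either name is
    entirely letters and digits (treated here as unrecognizable)."""
    # Remove ASCII letters and digits with a regular expression.
    a = re.sub(r"[A-Za-z0-9]", "", sales_name)
    b = re.sub(r"[A-Za-z0-9]", "", purchase_name)
    if not a or not b:
        return None

    m = len(set(a) & set(b))   # number of shared characters
    n = max(len(a), len(b))    # maximum length of the 2 remaining strings
    return m / n

# Assumed original-language forms of "refrigerant R134A" / "refrigerant R410A".
print(overlap_ratio("制冷剂R134A", "制冷剂R410A"))
```

The caller would compare the returned ratio with the fourth set threshold and treat a `None` result as the directly abnormal case.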
In a specific application scenario, before the steps of acquiring the main sales commodity list and acquiring the main purchase commodity list, the commodity data needs to be preprocessed, specifically as follows:
The rich diversity of commodity types makes commodity names complex, and names are also often filled in irregularly. To ensure the accuracy and validity of the subsequent commodity name similarity calculation, the commodity name data needs to be cleaned, specifically including:
converting all English letters in the commodity name into capital letters, and converting Chinese symbols into English symbols;
removing meaningless symbols, such as spaces and equals signs;
constructing a map whose keys are numbers, letters, or combinations thereof and whose values are commodity names, and using it to correct commodity names that consist only of letters and numbers. For example, "201" stands for "201 stainless steel", "P.O42.5" stands for "cement P.O42.5", "OPPO" stands for "OPPO handset", and so on.
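The cleaning steps above could be sketched as below. The punctuation table is an illustrative subset chosen for this example, the set of "meaningless" symbols is assumed to be spaces, tabs, and equals signs, and `clean_name` is a hypothetical name; only the three correction entries come from the text.

```python
# Full-width (Chinese) punctuation mapped to ASCII equivalents; an illustrative subset.
PUNCT_TABLE = str.maketrans({
    "，": ",", "。": ".", "（": "(", "）": ")",
    "：": ":", "；": ";", "－": "-", "＋": "+",
})
# Symbols dropped entirely (assumed set: space, equals sign, tab).
MEANINGLESS = str.maketrans("", "", " =\t")

# Keys are letter/digit names, values are corrected commodity names (examples from the text).
CORRECTIONS = {
    "201": "201 stainless steel",
    "P.O42.5": "cement P.O42.5",
    "OPPO": "OPPO handset",
}

def clean_name(raw: str) -> str:
    """Upper-case, normalize punctuation, drop meaningless symbols, then correct."""
    name = raw.upper().translate(PUNCT_TABLE).translate(MEANINGLESS)
    return CORRECTIONS.get(name, name)
```

For instance, `clean_name(" p.o 42.5 ")` passes through upper-casing and space removal before the map lookup turns it into "cement P.O42.5".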
Optionally, after the steps of acquiring the main sales commodity list and acquiring the main purchase commodity list, the method further includes:
step S206: judging whether the tax payment behavior is abnormal based on set dictionaries, including:
converting the character strings of the main sales commodity name and the main purchase commodity name into character sets;
taking the intersection of each character set with a set single-word dictionary, and calculating the number of elements in the intersection;
if the number of elements is zero, traversing the words in a set multi-word dictionary one by one and checking whether the main sales commodity name and the main purchase commodity name contain words in the multi-word dictionary; and correcting the set single-word dictionary and the set multi-word dictionary according to feedback results.
In a specific application scenario, such as chemicals and medicines, commodity names are highly specialized, for example "2, 4, 5-triamino-6-hydroxypyrimidine sulfate", so measuring name similarity by word segmentation to judge whether the purchase and sales items are abnormal often works poorly. In addition, the production and manufacturing of chemicals and medicines involve very complicated chemical reactions that people outside the field cannot be expected to know. This embodiment therefore identifies chemicals and medicines by constructing a dictionary, determines the enterprises which have the condition that the commodity of.
The dictionary constructed in the present embodiment is divided into a single-word dictionary and a multiple-word dictionary. The single-word dictionary and the multi-word dictionary are respectively as follows:
one_word_di = Set('acid', 'alkali', 'boron', 'chlorine', 'strontium', 'bromine', 'cyanogen', 'selenium', 'nitrate', 'sulfo', 'carbon', 'iodine', 'fluorine', 'phosphorus', 'hydrocarbon', 'alkane', 'alkene', 'benzene', 'alcohol', 'ether', 'aldehyde', 'phenol', 'ammonia', 'ketone', 'quinone', 'spiro', 'silicon', 'amine', 'ammonium', 'acetylene', 'sulfur', 'helium', 'hydrogen', 'carbon', 'nitrogen', 'oxygen', 'neon', 'sodium', 'magnesium', 'aluminum', 'acyl', 'potassium', 'calcium', 'titanium', 'vanadium', 'chromium', 'cobalt', 'calcium', 'zinc', 'calcium', 'vanadium', 'calcium', 'copper', 'tin', 'calcium', 'tin', 'calcium', 'tin', 'calcium', 'tin', 'calcium', 'tin', the term "peptide" refers to the amino acid sequence of the enzyme, and the amino acid sequence of the enzyme, the wax, the enzyme, the, 'paint', 'ball', 'lun', 'foil', 'itch', 'pouch', 'scar', 'button', 'polyester')
multi_words_di = List("HCPE", "elasticated filaments", "genes", "AK sugars", "alloys", "parent rolls", "AOS powders", "peptide beep", "P204", "S930/4757", "EPS", "RSS3", "7-AVCA", "CPVC", "PVDF", "PVPK30", "FASTCOLORREDS", "R245", "EAA", "HCFC-141B", "BPI-2009", "5-AT", "CAB-35", "ATP", "D4", "PPT30S", "ASA", "CAMP", "DMTDA", "LLDPE", "PP-R", "R142B", "PA66", "SF-1", "TPO", "TPU", "EPE", "HPMC", "TBHQ", "SBS", "UV1173", "VB", "DMEA", "LED", "MBS", "CELGARD", "ECDP", "EDTA", "SBR", "ISINE-1-LFONAMAMAMAMAMD", "TPE", "PF", "PA", "F-53B", "AEO2", "CFC-113A", "VAE", "PVC", "PCT", "PVDF", "PTMEG", "PTA", "RP-CHOP", "TENOFOVAVERALAFEMIDE copolymer", "SAM980IPYS3-10", "PPS", "DLPA", "SCRWF", "TYRVEK", "PMMA", "TPU", "PBT", "PPR", "PP", "BHT", "PTFE", "PEEK", "FEP", "EPE", "MTBE", "PE", "PET", "MTBE", "PU", "EVA", "ABS", "BOPP", "TPR", "PVB", "VOC", "CAMP", "HFC-227EA", "R134A", "R22", "B stock", "PS sheet", "sheathing stock", "pigment", "toner", "color concentrate", "dye", "color paste", "urea", "rosin", "Misong", "cold" etc., bezoar, rehmannia root, eye drops, honeysuckle flower, poppy, kangdi, pril, bilid, iron red, shower nozzle, analgin, cinchonin, ginkgo, slow release, rock, guaiacum, diazepam, medical, antigen, factor, salicyl, angelica, kudzu root, plain tablet, prednisone, cillin, tribulus, digoxin, glucose, malt, lactose, dialysis, caine, injection, operation, ultrasound, surgery, transfusion, oral liquid, poria, cyperus tuber, children, atractylodes macrocephala, syrup, cooling oil, vaseline, antibody and ion, "stephania tetrandra", "biology", "granule", "ipecac", "dressing", "dentistry", "denture", "dentistry", "test paper", "eucommia", "electrolysis", "socket", "switch", "nylon", "suture material", "polarographic paper", "scutellaria", "primordium", "ophiopogon", "mountain-course years", "ondansetron", "rifam", "glycerin", "compound", "bilin", "barbiturate", "cortisone", "pituitary", "mercuril", "flood", "water regulation", "filling material", "skeleton",
"enteric", "enteroscope", "sausage casing", "master batch", "yellow dextrin", "extinction", "dispersion", "permanent", "activity", "chemistry", "skin strip", "woven bag", "casting", "forging", "bearing", "extract", "left-handed", "cable", "ultralight clay", "flame retardant", "Youfukang", "conradine", "Houkungsan", "fiber", "rattan", "white oil", "degradation", "condensation product", "KT board", "tire", "organic board", "inorganic", "nylon", "bottle cap", "clinical", "dye", "compound", "nonwoven", "sponge", "quartz", "fitting", "part", "faucet", "Jialihong", "mask", "color flag", "meta-urea", "transparency", "Mominkemi", "alcohol", "limestone", "chemical", "tin paste for bags", "anti-corrosion", "asphalt", "turtle age", "raw material tape", "acrylic", "refrigerant", "toothbrush", "Kathon", "mouth", "essence", "kaolin", "gypsum", "X14CRMOS17", "410", "201", "C30", "2CR13", "304/NO.1", "304+/-0.03", "304", "Q345B", "12L14", "DTY", "HOY", "FDY", "FOY", "70#", "45#", "60SI2MNA", "3CR2W8V", "POY", "CVC", "60SI2MNA", "30MNSI", "38CRMOAL", "1CR17", "33", "15CRMO", "H62+ Q235B", "AG/CU", "40CR", "65MN", "2A12", "42CRMO", "TP316L", "50BV30", "5D/72F", "S30400", "KP", "GCR 15")
Based on the dictionaries constructed above, the specific implementation flow is as follows:
converting the character strings of the purchase commodity names and the sales commodity names into character sets. For example, "3, 4-difluorobenzonitrile" is converted to set('3', '4', '-', 'di', 'fluoro', 'benzene', 'fine');
taking the intersection of the character set of each purchase or sales commodity name with the single-word dictionary, and calculating the number of elements in the intersection;
if the number of intersection elements is greater than 0, marking the commodity name with the label 'true', indicating that the commodity is a chemical or a medicine; otherwise, traversing the words in the multi-word dictionary one by one and checking whether the commodity name contains any of them: if so, marking the label 'true' and stopping the traversal; otherwise, marking the label 'false'.
If the labels of the purchase commodity names and the sales commodity names are all 'true' and the screening conditions are met, the enterprise is judged to be a normal enterprise; if the labels of the sales commodities are all 'true' and the labels of the purchase commodities are all 'false', the enterprise is judged to be an abnormal enterprise; otherwise, the next layer of judgment is entered. Because the dictionaries are defined from words and characters that appear with high frequency in the names of chemicals and medicines, and these may also appear in the names of other commodities, some screening conditions need to be set to correct cases in which a purchase or sales commodity name contains a dictionary word but the commodity is not actually a chemical or a medicine. The dictionaries may also be continually updated.
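The labelling flow and the enterprise-level judgment above might look like the following sketch. The two dictionaries here are tiny illustrative subsets written with assumed Chinese originals (the translated text does not preserve the characters), the function names are hypothetical, and the screening conditions are not modelled because the text does not specify them.

```python
# Tiny illustrative subsets; the real dictionaries are far larger.
ONE_WORD_DICT = {"酸", "碱", "氟", "苯", "醇"}           # single characters (assumed originals)
MULTI_WORD_DICT = ["PVC", "R134A", "制冷剂", "葡萄糖"]   # multi-character words (assumed originals)

def is_chem_or_med(name: str) -> bool:
    """Label a commodity name 'true' (chemical or medicine) or 'false'."""
    if set(name) & ONE_WORD_DICT:
        return True
    # any() stops at the first dictionary word found in the name
    return any(word in name for word in MULTI_WORD_DICT)

def judge_enterprise(sales, purchases):
    sales_labels = [is_chem_or_med(n) for n in sales]
    purchase_labels = [is_chem_or_med(n) for n in purchases]
    if all(sales_labels) and all(purchase_labels):
        return "normal"      # still subject to the screening conditions
    if all(sales_labels) and not any(purchase_labels):
        return "abnormal"
    return "next layer"
```

Checking the single-character set first is cheap (one set intersection), so the linear scan over the multi-word list only runs for names the first check misses.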
Steps S206 and S207 apply to particular types of enterprise. Step S207 applies to production, processing, or service enterprises: for a production enterprise, for example, the purchase items are raw materials and the sales items are products made from those raw materials, so the names of the two may differ greatly, such as the purchase commodity "yarn" and the sales commodity "T-shirt". Step S206 mainly applies to chemicals, medicines, and the like, whose commodity names are highly specialized.
Because the actual business scope of each enterprise is very wide and the purchase and sales items cover many kinds of commodities, analyzing all commodities is both inefficient and of little significance. Therefore, after data preprocessing is completed, the main sales commodity list and the main purchase commodity list of each enterprise need to be determined, and non-main purchase and sales commodities are not analyzed. For each enterprise, the specific determination method is as follows:
optionally, acquiring the main sales commodity list or acquiring the main purchase commodity list includes:
acquiring the total amount of all sales commodities and the total amount of all purchase commodities;
acquiring the amount of each sales commodity and the amount of each purchase commodity;
dividing the amount of each sales commodity by the total amount of all sales commodities to obtain a plurality of first proportion values; dividing the amount of each purchase commodity by the total amount of all purchase commodities to obtain a plurality of second proportion values;
accumulating the plurality of first proportion values in descending order to obtain a first accumulation result, and stopping accumulation when the first accumulation result is greater than a fifth set threshold, wherein the list of sales commodities whose proportion values were accumulated is the main sales commodity list;
and accumulating the plurality of second proportion values in descending order to obtain a second accumulation result, and stopping accumulation when the second accumulation result is greater than a sixth set threshold, wherein the list of purchase commodities whose proportion values were accumulated is the main purchase commodity list.
In a specific application scenario, the amounts of all sales items and of all purchase items are summed separately;
the amount of each sales commodity and of each purchase commodity is summed separately;
the amount of each sales/purchase commodity is divided by the total amount of all sales/purchase commodities, and the resulting proportions are arranged in descending order;
and the proportions are accumulated one by one; accumulation stops once the accumulated value exceeds a set threshold, and the sales/purchase commodities accumulated so far are determined to be the main sales/purchase commodities.
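The selection of main commodities described above can be illustrated with a short sketch. The function name `main_commodities`, the record layout, the sample amounts, and the threshold value 0.7 are all assumptions made for the example; the same function would be applied once to sales records and once to purchase records, each with its own threshold.

```python
from collections import defaultdict

def main_commodities(records, threshold):
    """records: iterable of (commodity name, amount); returns the main commodity names."""
    totals = defaultdict(float)
    for name, amount in records:
        totals[name] += amount            # amount summed per commodity
    grand_total = sum(totals.values())    # amount summed over all commodities
    if grand_total <= 0:
        return []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    main, cumulative = [], 0.0
    for name, amount in ranked:
        main.append(name)
        cumulative += amount / grand_total
        if cumulative > threshold:        # stop once the threshold is exceeded
            break
    return main

sales = [("steel pipe", 30.0), ("valve", 30.0), ("steel pipe", 20.0),
         ("gasket", 15.0), ("label", 5.0)]
main_sales = main_commodities(sales, threshold=0.7)
```

With these sample amounts the proportions are 0.5, 0.3, 0.15, and 0.05, so accumulation stops after the second commodity.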
In a specific application scenario, as shown in fig. 2, steps S201, S202, S203, S204, S205, S206, S102, S103 and S207 may be performed in that order; or, as shown in fig. 3, steps S201, S202, S203, S204, S205, S206, S207, S102 and S103 may be performed in that order. Because it is not known in advance which type an enterprise belongs to, the corresponding step order is selected according to the enterprise itself.
In this embodiment, several forms of dictionary are constructed: a commodity name correction dictionary, which is a map whose keys are numbers, letters, or combinations thereof and whose values are commodity names; a dictionary of chemicals and medicines; and a dictionary of reasonable purchase and sales commodity entity combinations. These dictionaries are necessary and important for improving the accuracy of judging whether the purchase and sales items are abnormal.
The method comprehensively uses multiple means to measure the correlation between an enterprise's purchase and sales commodities, and thereby to judge whether abnormal tax payment behavior exists: checking whether the intersection of the purchase and sales commodity names is empty, judging whether the purchase and sales commodity names contain each other, screening by the string similarity of the purchase and sales commodity names, and using NLP techniques to calculate whether the purchase and sales commodity names are semantically similar.
Reasonable combinations of purchase and sales commodities are found using association analysis, which solves the problem of judging whether the purchase and sales items of production or processing enterprises are abnormal.
The method analyzes the names of purchase and sales commodities by multiple means, such as checking whether the purchase and sales commodity lists are empty, setting rules and custom dictionaries, measuring commodity name similarity, and association analysis, and judges layer by layer whether an enterprise's purchase and sales items are abnormal. Compared with judgment based on a single method or on commodity codes, it has higher accuracy and wider applicability. It solves the problem of judging whether an enterprise's purchase and sales items are abnormal, expands the range of suspicious taxpayers that can be identified, and improves the efficiency and accuracy of identifying them. It thereby provides a monitoring means for national tax collection and tax risk management, helps prevent tax evasion, and safeguards tax security.
An embodiment of the present disclosure provides an electronic device comprising a memory and a processor,
the memory storing executable instructions;
and the processor executing the executable instructions in the memory to implement the method for identifying abnormal tax payment behaviors.
The memory is to store non-transitory computer readable instructions. In particular, the memory may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc.
The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions. In one embodiment of the disclosure, the processor is configured to execute the computer readable instructions stored in the memory.
Those skilled in the art should understand that, in order to solve the technical problem of how to obtain a good user experience, the present embodiment may also include well-known structures such as a communication bus, an interface, and the like, and these well-known structures should also be included in the protection scope of the present disclosure.
For the detailed description of the present embodiment, reference may be made to the corresponding descriptions in the foregoing embodiments, which are not repeated herein.
The embodiment of the disclosure provides a computer readable storage medium, which stores a computer program, and the computer program realizes the method for identifying abnormal tax payment behaviors when being executed by a processor.
A computer-readable storage medium according to an embodiment of the present disclosure has non-transitory computer-readable instructions stored thereon. The non-transitory computer readable instructions, when executed by a processor, perform all or a portion of the steps of the methods of the embodiments of the disclosure previously described.
The computer-readable storage media include, but are not limited to: optical storage media (e.g., CD-ROMs and DVDs), magneto-optical storage media (e.g., MOs), magnetic storage media (e.g., magnetic tapes or removable disks), media with built-in rewritable non-volatile memory (e.g., memory cards), and media with built-in ROMs (e.g., ROM cartridges).
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims (10)

1. A method of identifying abnormal tax payment behavior, comprising:
acquiring a main sales commodity list based on the ratio of the total amount of each sales commodity to the total amount of all sales commodities;
acquiring a main purchase commodity list based on the ratio of the total amount of each purchase commodity to the total amount of all purchase commodities;
processing the name of the main sales commodity in the main sales commodity list and the name of the main purchase commodity in the main purchase commodity list based on a natural language processing technology to obtain a first processing result;
and judging whether the tax payment behavior is abnormal based on the first processing result.
2. The method of identifying abnormal tax payment behavior according to claim 1, wherein,
before or after the step of processing the name of the main sales commodity in the main sales commodity list and the name of the main purchase commodity in the main purchase commodity list based on the natural language processing technology to obtain the first processing result, the method further comprises:
judging whether the name of the main sales commodity in the main sales commodity list and the name of the main purchase commodity in the main purchase commodity list are in a combined entity word bank obtained through association analysis, and thereby judging whether the tax payment behavior is abnormal.
3. The method for identifying abnormal tax payment behaviors according to claim 1, wherein processing the name of the main sales commodity in the main sales commodity list and the name of the main purchase commodity in the main purchase commodity list based on the natural language processing technology to obtain the first processing result comprises:
performing word segmentation on the main sales commodity name and the main purchase commodity name, and extracting entity words;
acquiring word vectors of the extracted entity words using acquired word vector resources;
calculating the cosine similarity between entity words based on the word vectors;
for each group of commodities, taking the maximum cosine similarity over all the entity words as the commodity name similarity of the group, wherein each group of commodities comprises one purchase commodity and one sales commodity;
combining the names of the main sales commodities in the main sales commodity list with the names of the main purchase commodities in the main purchase commodity list to form a plurality of groups of commodities, and calculating the commodity name similarity of each group;
selecting the largest commodity name similarity as the purchase and sales commodity similarity;
and comparing the purchase and sales commodity similarity with a first set threshold.
4. The method for identifying abnormal tax payment behaviors as claimed in claim 3, wherein in the calculating cosine similarity between entity words based on the word vector, the calculation formula of the cosine similarity is:
$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}|\,|\vec{b}|}$$
wherein $\vec{a}$ and $\vec{b}$ are the word vectors of the entity words, and $|\vec{a}|$ and $|\vec{b}|$ are the moduli of vector $\vec{a}$ and vector $\vec{b}$.
5. The method for identifying abnormal tax payment behavior according to claim 3, wherein,
if the purchase and sales commodity similarity is greater than the first set threshold, the tax payment behavior is considered normal;
otherwise, the tax payment behavior is considered abnormal.
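As a non-limiting illustration of the computation recited in claims 3 to 5, the sketch below takes the maximum cosine similarity over entity-word pairs within each commodity group, then the maximum over all groups, and compares it with a threshold. Word segmentation and entity extraction are omitted (each commodity is assumed to arrive as a list of entity words), the function names are hypothetical, and the three-dimensional vectors and the threshold 0.8 are made up for the example rather than taken from any real word vector resource.

```python
import math
from itertools import product

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def name_similarity(sale_words, purchase_words, vectors):
    """Maximum cosine similarity over all entity-word pairs of one commodity group."""
    sims = [cosine(vectors[s], vectors[p])
            for s, p in product(sale_words, purchase_words)
            if s in vectors and p in vectors]
    return max(sims, default=0.0)

def is_normal(main_sales, main_purchases, vectors, threshold):
    """main_sales / main_purchases: one list of entity words per commodity name."""
    best = max((name_similarity(s, p, vectors)
                for s, p in product(main_sales, main_purchases)), default=0.0)
    return best > threshold

# Toy 3-dimensional vectors, made up for illustration only.
VECTORS = {"yarn": [1.0, 0.0, 0.0], "thread": [0.9, 0.1, 0.0], "cement": [0.0, 0.0, 1.0]}
```

With these toy vectors, a "thread" sales commodity against a "yarn" purchase commodity clears a threshold of 0.8, while "cement" against "yarn" does not.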
6. The method for identifying abnormal tax payment behavior according to claim 2, wherein the association analysis by which the combined entity word bank is obtained comprises:
performing word segmentation on the sales commodity names and purchase commodity names of the relevant industry, and extracting entity words;
counting, over all sales commodities, the frequency of occurrence of each extracted entity word;
counting, for each group of one sales commodity and one purchase commodity, the frequency with which each pair of entity words occurs together, namely the frequency of the combined entity word;
comparing the frequency of occurrence of each entity word with a second set threshold to obtain sales entity words;
and comparing the frequency of each combined entity word containing a sales entity word with a third set threshold to obtain the combined entity word bank.
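As a non-limiting illustration of the frequency counting recited in claim 6, the following sketch builds a small combined entity word bank. Several points are assumptions: entity words are taken as already extracted, word frequency is counted once per group, "comparing with a threshold" is read as keeping frequencies at or above the threshold, and the name `build_combined_bank`, the sample groups, and the threshold values are invented for the example.

```python
from collections import Counter

def build_combined_bank(groups, word_threshold, pair_threshold):
    """groups: iterable of (sales entity words, purchase entity words) pairs.

    Returns the set of (sales word, purchase word) combinations kept in the bank.
    """
    word_freq = Counter()
    pair_freq = Counter()
    for sale_words, purchase_words in groups:
        word_freq.update(set(sale_words))
        for s in set(sale_words):
            for p in set(purchase_words):
                pair_freq[(s, p)] += 1
    # Assumption: "comparing with a threshold" means keeping frequencies at or above it.
    frequent_sales = {w for w, c in word_freq.items() if c >= word_threshold}
    return {pair for pair, c in pair_freq.items()
            if pair[0] in frequent_sales and c >= pair_threshold}

GROUPS = [
    (["T-shirt"], ["yarn"]),
    (["T-shirt"], ["yarn", "dye"]),
    (["T-shirt"], ["button"]),
    (["cement"], ["limestone"]),
]
bank = build_combined_bank(GROUPS, word_threshold=2, pair_threshold=2)
```

A membership test such as `("T-shirt", "yarn") in bank` is then enough to check whether a purchase and sales combination counts as reasonable.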
7. The method for identifying abnormal tax payment behaviors as claimed in claim 6, wherein judging whether the name of the main sales commodity in the main sales commodity list and the name of the main purchase commodity in the main purchase commodity list are in the combined entity word bank obtained through association analysis, and thereby judging whether the tax payment behavior is abnormal, comprises:
if a combined entity word obtained from the name of a main sales commodity in the main sales commodity list and the name of a main purchase commodity in the main purchase commodity list appears in the combined entity word bank, the tax payment behavior is considered normal;
otherwise, the tax payment behavior is considered abnormal.
8. The method for identifying abnormal tax payment behaviors as claimed in claim 1, wherein, after the steps of acquiring the main sales commodity list and acquiring the main purchase commodity list, the method further comprises:
judging whether the main sales commodity list or the main purchase commodity list is empty;
and/or
judging whether the purchase and sales items are normal based on the number of commodities in the main sales commodity list or the main purchase commodity list and the recorded content;
and/or
acquiring the intersection of a main sales commodity set and a main purchase commodity set based on the main sales commodity list and the main purchase commodity list, and judging whether the purchase and sales items are normal based on the intersection;
and/or
judging whether the name of the main sales commodity in the main sales commodity list and the name of the main purchase commodity in the main purchase commodity list include the same phrase, thereby judging whether the purchase and sales items are normal;
and/or
judging whether the main sales commodity name and the main purchase commodity name are completely unidentifiable character strings;
if the commodity names are not completely unidentifiable, removing the unidentifiable character strings from the main sales commodity name and the main purchase commodity name, and retaining the identifiable character strings;
and taking the intersection of the identifiable character strings in the main sales commodity name and the identifiable character strings in the main purchase commodity name, obtaining the number m of intersection elements and the maximum length n of the 2 character strings, and judging whether the purchase and sales items are normal based on a comparison of m/n with a fourth set threshold.
9. The method for identifying abnormal tax payment behaviors as claimed in claim 1, wherein, after the steps of acquiring the main sales commodity list and acquiring the main purchase commodity list, the method further comprises:
judging whether the tax payment behavior is abnormal based on set dictionaries, comprising:
converting the character strings of the main sales commodity name and the main purchase commodity name into character sets;
taking the intersection of each character set with a set single-word dictionary, and calculating the number of elements in the intersection;
if the number of elements is zero, traversing the words in a set multi-word dictionary one by one and checking whether the main sales commodity name and the main purchase commodity name contain words in the multi-word dictionary; and correcting the set single-word dictionary and the set multi-word dictionary according to feedback results.
10. The method for identifying abnormal tax payment behaviors of claim 1, wherein acquiring the main sales commodity list or acquiring the main purchase commodity list comprises:
acquiring the total amount of all sales commodities and the total amount of all purchase commodities;
acquiring the amount of each sales commodity and the amount of each purchase commodity;
dividing the amount of each sales commodity by the total amount of all sales commodities to obtain a plurality of first proportion values; dividing the amount of each purchase commodity by the total amount of all purchase commodities to obtain a plurality of second proportion values;
accumulating the plurality of first proportion values in descending order to obtain a first accumulation result, and stopping accumulation when the first accumulation result is greater than a fifth set threshold, wherein the list of sales commodities whose proportion values were accumulated is the main sales commodity list;
and accumulating the plurality of second proportion values in descending order to obtain a second accumulation result, and stopping accumulation when the second accumulation result is greater than a sixth set threshold, wherein the list of purchase commodities whose proportion values were accumulated is the main purchase commodity list.
CN201911397878.2A 2019-12-30 2019-12-30 Method for identifying abnormal tax payment behavior Active CN111192128B (en)

Publications (2)

Publication Number Publication Date
CN111192128A true CN111192128A (en) 2020-05-22
CN111192128B CN111192128B (en) 2023-06-02

Family

ID=70707814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911397878.2A Active CN111192128B (en) 2019-12-30 2019-12-30 Method for identifying abnormal tax payment behavior

Country Status (1)

Country Link
CN (1) CN111192128B (en)

Similar Documents

Publication Publication Date Title
CN111192128A (en) Method for identifying abnormal tax payment behaviors
CN107967208B (en) Python resource sensitive defect code detection method based on deep neural network
CN104794192B (en) Multistage method for detecting abnormality based on exponential smoothing, integrated study model
CN105224807B (en) Case auditing rule extracting method and device, case checking method and system
Le et al. Beyond support and confidence: Exploring interestingness measures for rule-based specification mining
CN107545422A (en) Arbitrage detection method and device
CN108345587A (en) Method and system for detecting the authenticity of comments
CN108376164B (en) Display method and device of potential anchor
CN107423278A (en) Method, apparatus and system for recognizing evaluation elements
CN106815200A (en) Objectionable text detection method and device based on keyword
CN109241527B (en) Automatic generation method of false comment data set of Chinese commodity
CN108647800A (en) Method for predicting missing attributes of online social network users based on node embedding
CN110427548A (en) Information-pushing method, information push-delivery apparatus and computer readable storage medium
CN111859909B (en) Semantic scene consistency recognition reading robot
CN104991953A (en) Coarse and fine granularity video searching method based on reverse index
CN110443290A (en) Method and device for quantifying and generating product competition relationships based on big data
Abdelhamid et al. Automatic bank fraud detection using support vector machines
Almeida et al. Deriving vegetation indices for phenology analysis using genetic programming
Li et al. Incorporate online hard example mining and multi-part combination into automatic safety helmet wearing detection
CN108519993A (en) Social network hot event detection method based on multi-data-stream computing
CN112199480A (en) BERT model-based online dialog log violation detection method and system
CN116361488A (en) Method and device for mining risk object based on knowledge graph
Melnykov et al. Recent developments in model-based clustering with applications
CN107092600B (en) Information identification method and device
Kuang et al. Ontology-driven hierarchical deep learning for fashion recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant