CN111192128B - Method for identifying abnormal tax payment behavior - Google Patents

Method for identifying abnormal tax payment behavior

Info

Publication number
CN111192128B
Authority
CN
China
Prior art keywords
commodity
main
sales
list
name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911397878.2A
Other languages
Chinese (zh)
Other versions
CN111192128A (en
Inventor
刘芬
王志刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aisino Corp filed Critical Aisino Corp
Priority to CN201911397878.2A priority Critical patent/CN111192128B/en
Publication of CN111192128A publication Critical patent/CN111192128A/en
Application granted granted Critical
Publication of CN111192128B publication Critical patent/CN111192128B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/123 Tax preparation or submission
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Abstract

An embodiment of the disclosure provides a method for identifying abnormal tax payment behavior, comprising: acquiring a main sales commodity list based on the ratio of the summarized amount of each sales-item commodity to the total amount of all sales-item commodities; acquiring a main purchase commodity list based on the ratio of the summarized amount of each purchase-item commodity to the total amount of all purchase-item commodities; processing the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list using natural language processing technology to obtain a first processing result; and judging whether the tax payment behavior is abnormal based on the first processing result. This improves the efficiency of identifying abnormal tax payment behavior.

Description

Method for identifying abnormal tax payment behavior
Technical Field
The present disclosure relates to the field of information technology, and more particularly, to a method for identifying abnormal tax payment.
Background
Analyzing the purchase-item and sales-item commodities in value-added tax invoice line-item data to identify abnormal behaviors such as 'incomplete sales, inconsistent deduction, false invoice' is an important means of preventing and controlling tax risk. However, commodity names are diverse, complex, and non-standardized, which makes it difficult to identify the same commodity entity and similar commodities. In addition, the purchase-item and sales-item commodities of production or processing enterprises differ greatly, so whether their purchases and sales are abnormal cannot be judged directly from the similarity of the commodity names. Existing methods calculate the degree of difference between purchase-item and sales-item commodities from commodity codes or from simple commodity name similarity, and identify abnormal tax payment behavior on that basis. However, because commodity codes and commodity names stand in a many-to-many relationship and the name similarity calculation is simplistic, the analysis often lacks accuracy and comprehensiveness. Existing identification of abnormal tax payment behavior therefore suffers from low efficiency and low accuracy.
Disclosure of Invention
In view of the above, the embodiments of the present disclosure provide a method for identifying abnormal tax payment behaviors, which at least solves the problems of low efficiency and accuracy in identifying abnormal tax payment behaviors in the prior art.
In a first aspect, an embodiment of the present disclosure provides a method for identifying abnormal tax payment behavior, including:
acquiring a main sales commodity list based on the ratio of the summarized amount of each sales-item commodity to the total amount of all sales-item commodities;
acquiring a main purchase commodity list based on the ratio of the summarized amount of each purchase-item commodity to the total amount of all purchase-item commodities;
processing the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list based on a natural language processing technology to obtain a first processing result;
and judging whether the tax payment behavior is abnormal based on the first processing result.
Optionally, before or after the step of processing the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list based on the natural language processing technology to obtain the first processing result, the method further includes:
judging whether the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list are in a combined entity word library obtained through association analysis, and thereby judging whether the tax payment behavior is abnormal.
Optionally, the processing, based on a natural language processing technology, of the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list to obtain a first processing result includes:
segmenting the main sales commodity names and the main purchase commodity names into words, and extracting entity words;
acquiring word vectors of the extracted entity words from pre-acquired word vector resources;
calculating cosine similarity between entity words based on the word vectors;
for each group of commodities, taking the maximum cosine similarity over all entity word pairs as the commodity name similarity of the group, wherein each group of commodities comprises one purchase-item commodity and one sales-item commodity;
combining the main sales commodity names in the main sales commodity list with the main purchase commodity names in the main purchase commodity list to form a plurality of groups of commodities, and calculating the commodity name similarity of each group;
selecting the maximum commodity name similarity as the purchase-sales commodity similarity;
and comparing the purchase-sales commodity similarity with a first set threshold.
Optionally, in the calculating the cosine similarity between the entity words based on the word vectors, the cosine similarity is calculated as:
$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$$
where $\vec{a}$ and $\vec{b}$ are the word vectors of the entity words, and $\lVert \vec{a} \rVert$ and $\lVert \vec{b} \rVert$ are the moduli of vector $\vec{a}$ and vector $\vec{b}$, respectively.
Optionally, if the purchase-sales commodity similarity is greater than the first set threshold, the tax payment behavior is considered normal;
otherwise, the tax payment behavior is considered abnormal.
Optionally, the association analysis used to obtain the combined entity word library includes:
segmenting the sales-item commodity names and the purchase-item commodity names of the relevant industry into words, and extracting entity words;
counting, over all sales-item commodities, the frequency of each extracted entity word;
counting, for each group of one sales-item commodity and one purchase-item commodity, the frequency with which each pair of entity words occurs together, namely the frequency of the combined entity word;
comparing the frequency of each entity word with a second set threshold to obtain the sales-item entity words;
and comparing the frequency of the combined entity words containing the sales-item entity words with a third set threshold to obtain the combined entity word library.
Optionally, the judging whether the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list are in the combined entity word library obtained through association analysis, and thereby judging whether the tax payment behavior is abnormal, includes:
if a combined entity word formed from a main sales commodity name in the main sales commodity list and a main purchase commodity name in the main purchase commodity list appears in the combined entity word library, the tax payment behavior is considered normal;
otherwise, the tax payment behavior is considered abnormal.
Optionally, after the step of obtaining the main sales commodity list and the step of obtaining the main purchase commodity list, the method further includes:
judging whether the main sales commodity list or the main purchase commodity list is empty;
and/or
judging whether the purchases and sales are normal based on the number of commodities in the main sales commodity list or the main purchase commodity list and on the recorded contents;
and/or
acquiring the intersection of the main sales commodity set and the main purchase commodity set based on the main sales commodity list and the main purchase commodity list, and judging whether the purchases and sales are normal based on the intersection;
and/or
judging whether a main sales commodity name in the main sales commodity list and a main purchase commodity name in the main purchase commodity list include the same phrase, and thereby judging whether the purchases and sales are normal;
and/or
judging whether the main sales commodity names and the main purchase commodity names consist entirely of unrecognizable character strings;
if they are not entirely unrecognizable, removing the unrecognizable character strings from the main sales commodity names and the main purchase commodity names, and retaining the recognizable character strings;
and acquiring the intersection of the recognizable characters in the main sales commodity name and the recognizable characters in the main purchase commodity name, acquiring the number m of intersection elements and the maximum length n of the 2 character strings, and judging whether the purchases and sales are normal by comparing m/n with a fourth set threshold.
Optionally, after the step of obtaining the main sales commodity list and the step of obtaining the main purchase commodity list, the method further includes:
judging whether the tax payment behavior is abnormal based on set dictionaries, comprising:
converting the character strings of the main sales commodity name and the main purchase commodity name into character sets;
taking the intersection of each character set with a set single-character dictionary, and calculating the number of elements in the intersection;
if the number of elements is zero, traversing the words in a set multi-word dictionary one by one, checking whether the main sales commodity name and the main purchase commodity name contain words in the multi-word dictionary, and correcting the set single-character dictionary and the set multi-word dictionary according to the feedback result.
Optionally, obtaining the main sales commodity list or obtaining the main purchase commodity list includes:
acquiring the total amount of all sales-item commodities and the total amount of all purchase-item commodities;
acquiring the summarized amount of each sales-item commodity and of each purchase-item commodity;
dividing the summarized amount of each sales-item commodity by the total amount of all sales-item commodities to obtain a plurality of first proportion values; dividing the summarized amount of each purchase-item commodity by the total amount of all purchase-item commodities to obtain a plurality of second proportion values;
accumulating the plurality of first proportion values in descending order to obtain a first accumulation result, stopping the accumulation when the first accumulation result is greater than a fifth set threshold, and taking the list of sales-item commodities accumulated up to that point as the main sales commodity list;
and accumulating the plurality of second proportion values in descending order to obtain a second accumulation result, stopping the accumulation when the second accumulation result is greater than a sixth set threshold, and taking the list of purchase-item commodities accumulated up to that point as the main purchase commodity list.
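The cumulative-proportion selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `main_commodity_list`, the `(name, amount)` record layout, and the threshold value 0.8 are assumptions introduced here, since the disclosure does not state concrete threshold values.

```python
from collections import defaultdict


def main_commodity_list(records, threshold=0.8):
    """Return the commodities whose cumulative share first exceeds `threshold`.

    `records` is an iterable of (commodity_name, amount) invoice lines.
    The record layout and the default threshold are illustrative assumptions.
    """
    # Summarize the amount of each commodity.
    totals = defaultdict(float)
    for name, amount in records:
        totals[name] += amount

    grand_total = sum(totals.values())
    if grand_total <= 0:
        return []

    # Accumulate proportion values from largest to smallest.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, amount in ranked:
        selected.append(name)
        cumulative += amount / grand_total
        if cumulative > threshold:
            break  # stop as soon as the accumulated share exceeds the threshold
    return selected


# Hypothetical sales-item invoice lines.
sales = [("T-shirt", 700.0), ("trousers", 200.0), ("socks", 60.0), ("buttons", 40.0)]
main_sales = main_commodity_list(sales, threshold=0.8)
print(main_sales)
```

The same function would be called once on the sales-item lines and once on the purchase-item lines, each with its own threshold.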
According to the method, a main sales commodity list is first acquired based on the ratio of the summarized amount of each sales-item commodity to the total amount of all sales-item commodities, and a main purchase commodity list is acquired based on the ratio of the summarized amount of each purchase-item commodity to the total amount of all purchase-item commodities. Then the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list are processed based on a natural language processing technology to obtain a first processing result, and whether the tax payment behavior is abnormal is judged based on the first processing result. Because a main sales commodity list and a main purchase commodity list are determined for each enterprise and non-main purchase-item and sales-item commodities are not analyzed, identification efficiency is improved. Commodity names are segmented into words with the help of NLP (natural language processing) technology and pre-trained public word vector resources, and the similarity of the commodity names is then calculated from the word vectors, so that the semantic similarity of the commodity names is measured and the shortcoming of calculating only their literal similarity is overcome. This improves identification accuracy and achieves the purpose of improving the efficiency of identifying abnormal tax payment behavior.
The method also uses association analysis: it combines the frequency with which sales-item commodities occur with the frequency with which purchase-item and sales-item commodities occur together, and sets thresholds to identify abnormal tax payment behavior, which solves the problem of judging whether the purchases and sales of production and processing enterprises are abnormal. In addition, based on the characteristics of the commodity line-item data, custom dictionaries and rules are used to analyze commodities in special fields (such as chemicals and medicines), which can improve the efficiency and accuracy of identifying abnormal tax payment behavior.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following more particular descriptions of exemplary embodiments of the disclosure as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout exemplary embodiments of the disclosure.
FIG. 1 illustrates a flow chart of a first method of identifying abnormal tax payment behavior according to one embodiment of the present disclosure;
FIG. 2 illustrates a flow chart of a second method of identifying abnormal tax payment behavior according to one embodiment of the present disclosure;
FIG. 3 illustrates a flow chart of a third method of identifying abnormal tax payment behavior according to one embodiment of the present disclosure.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below. While the preferred embodiments of the present disclosure are described below, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein.
A method of identifying abnormal tax payment behavior, comprising:
step S101: acquiring a main sales commodity list based on the ratio of the summarized amount of each sales-item commodity to the total amount of all sales-item commodities;
acquiring a main purchase commodity list based on the ratio of the summarized amount of each purchase-item commodity to the total amount of all purchase-item commodities;
the main sales commodity list and the main purchase commodity list need not be acquired in any particular order: they can be acquired simultaneously, the main sales commodity list can be acquired first and then the main purchase commodity list, or the main purchase commodity list can be acquired first and then the main sales commodity list.
Step S102: processing the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list based on a natural language processing technology to obtain a first processing result;
step S103: and judging whether the tax payment behavior is abnormal based on the first processing result.
Optionally, the method further comprises: step S207: judging whether the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list are in a combined entity word library obtained through association analysis, and thereby judging whether the tax payment behavior is abnormal.
Step S207 may precede or follow step S102, with step S207 not directly related to step S102. If it is determined in step S102 that the corporate tax payment is abnormal after the main sales commodity list and the main purchase commodity list are acquired, step S207 may not be performed.
Optionally, the processing, based on a natural language processing technology, of the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list to obtain a first processing result includes:
segmenting the main sales commodity names and the main purchase commodity names into words, and extracting entity words;
acquiring word vectors of the extracted entity words from pre-acquired word vector resources;
calculating cosine similarity between entity words based on the word vectors;
for each group of commodities, taking the maximum cosine similarity over all entity word pairs as the commodity name similarity of the group, wherein each group of commodities comprises one purchase-item commodity and one sales-item commodity;
combining the main sales commodity names in the main sales commodity list with the main purchase commodity names in the main purchase commodity list to form a plurality of groups of commodities, and calculating the commodity name similarity of each group;
selecting the maximum commodity name similarity as the purchase-sales commodity similarity;
and comparing the purchase-sales commodity similarity with a first set threshold.
Optionally, in the calculating the cosine similarity between the entity words based on the word vectors, the cosine similarity is calculated as:
$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$$
where $\vec{a}$ and $\vec{b}$ are the word vectors of the entity words, and $\lVert \vec{a} \rVert$ and $\lVert \vec{b} \rVert$ are the moduli of vector $\vec{a}$ and vector $\vec{b}$, respectively.
Optionally, if the purchase-sales commodity similarity is greater than the first set threshold, the tax payment behavior is considered normal;
otherwise, the tax payment behavior is considered abnormal.
In a specific implementation scenario, if two commodity names share no character or word but are semantically similar (for example, two different words that both translate as "toilet"), whether the purchases and sales are abnormal cannot be determined by comparing the literal similarity of the commodity names, and the semantically similar commodity names must be identified by means of natural language processing (NLP) technology. The specific implementation is as follows:
segmenting the names of the purchase-item and sales-item commodities with a word segmentation tool, and extracting entity words. For example, "lady trousers" is segmented into "lady" and "trousers", which are the entity words of the commodity name;
using public Chinese word vector resources trained on Baidu Baike (Baidu Encyclopedia) and available online, obtaining the word vectors of the entity words by using the extracted commodity entity words as indexes;
and calculating the cosine similarity between the entity words. If a commodity name has a plurality of entity words (for example, "curtain accessory plastic ball" has the 3 entity words "curtain", "accessory" and "plastic ball"), the cosine similarity between each of its entity words and each entity word of the other commodity is calculated. The cosine similarity formula is as follows:
$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$$
where $\vec{a}$ and $\vec{b}$ are the word vectors of the entity words.
For each group of one purchase-item commodity and one sales-item commodity, the maximum similarity over all entity word pairs is taken as the similarity between the 2 commodity names;
for each enterprise, the purchase-item commodities and sales-item commodities are combined pairwise, the similarity between the commodity names of each pair is calculated, and the maximum similarity is taken as the purchase-sales commodity similarity of the enterprise. If the purchase-item commodities are a1, a2 and a3 and the sales-item commodities are b1, b2 and b3, the pairs are a1b1, a1b2, a1b3, a2b1, a2b2, a2b3, a3b1, a3b2 and a3b3. A threshold is set; if the purchase-sales commodity similarity of the enterprise is greater than the threshold, the enterprise is determined to be normal, otherwise it is retained for further judgment.
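The two-level maximum described above (maximum over entity word pairs for each commodity pair, then maximum over all commodity pairs) can be sketched as follows. The 2-dimensional toy vectors, the English entity words, and the threshold 0.8 are assumptions made for illustration only; a real run would look entity words up in a pre-trained word vector resource, and word segmentation is assumed to have been done already.

```python
import math
from itertools import product


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def name_similarity(words_a, words_b, vectors):
    """Maximum cosine similarity over all entity word pairs of two commodity names."""
    sims = (
        cosine(vectors[u], vectors[v])
        for u, v in product(words_a, words_b)
        if u in vectors and v in vectors  # skip words without a vector
    )
    return max(sims, default=0.0)


def enterprise_similarity(sales_names, purchase_names, vectors):
    """Maximum commodity name similarity over all sales/purchase pairs."""
    sims = (
        name_similarity(s, p, vectors)
        for s, p in product(sales_names, purchase_names)
    )
    return max(sims, default=0.0)


# Hypothetical toy word vectors; each commodity name is a list of entity words.
vectors = {
    "curtain": (1.0, 0.0),
    "accessory": (0.0, 1.0),
    "drape": (0.9, 0.1),
    "steel": (-1.0, 0.0),
}
sales = [["curtain", "accessory"]]
purchases = [["drape"], ["steel"]]

similarity = enterprise_similarity(sales, purchases, vectors)
is_normal = similarity > 0.8  # 0.8 is an assumed first set threshold
print(round(similarity, 4), is_normal)
```

Taking the maximum at both levels means a single strongly related word pair is enough to mark an enterprise as normal, which matches the permissive rule in the text; enterprises below the threshold are passed on for further judgment rather than flagged immediately.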
Optionally, the association analysis used to obtain the combined entity word library includes:
segmenting the sales-item commodity names and the purchase-item commodity names of the relevant industry into words, and extracting entity words;
counting, over all sales-item commodities, the frequency of each extracted entity word;
counting, for each group of one sales-item commodity and one purchase-item commodity, the frequency with which each pair of entity words occurs together, namely the frequency of the combined entity word;
comparing the frequency of each entity word with a second set threshold to obtain the sales-item entity words;
and comparing the frequency of the combined entity words containing the sales-item entity words with a third set threshold to obtain the combined entity word library.
In a specific application scenario, judging whether purchases and sales are abnormal by measuring commodity name similarity is not fully applicable to production, processing, or service enterprises; for example, a case in which the sales-item commodity is a T-shirt and the purchase-item commodity is yarn may be judged abnormal. To improve the judgment accuracy, the invention uses association analysis to find reasonable combinations of purchase-item and sales-item commodities. The specific implementation is as follows:
segmenting the names of the purchase-item and sales-item commodities with a word segmentation tool, and extracting entity words;
counting, over all sales-item commodities, the frequency of each extracted entity word;
and counting, for each pair of one sales-item commodity and one purchase-item commodity, the number of times each pair of entity words occurs together. If a commodity name has a plurality of entity words, the entity words of the two commodity names are paired with each other and counted. For example, if the entity words of the commodity 'terylene dyeing embroidery cloth' are 'terylene' and 'cloth', and the entity words of the commodity 'cashmere yarn' are 'cashmere' and 'yarn', the combinations are 'terylene' + 'cashmere', 'terylene' + 'yarn', 'cloth' + 'yarn' and 'cloth' + 'cashmere', and the frequency of each combination is increased by 1;
setting 2 thresholds k and d: from the sales-item commodity statistics, screening out the entity words whose frequency is greater than k; then, from the combined entity words containing the screened sales-item entity words, screening out those whose frequency is greater than d. The resulting entity word combinations are considered reasonable purchase-sales commodity combinations.
If the sales-item commodity entity words and the purchase-item commodity entity words of an enterprise appear in the obtained entity word combinations, the enterprise's purchases and sales are judged to be normal; otherwise, they are judged to be abnormal.
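The frequency counting and two-threshold screening above can be sketched as follows. The function names, the input layout (already-segmented entity word lists), and the values of k and d are assumptions for illustration. The rule that a single matching combination is enough to judge an enterprise normal is also an assumption, since the text does not say whether one match or all matches are required.

```python
from collections import Counter
from itertools import product


def build_combined_lexicon(sales_items, pairs, k, d):
    """Build the set of (sales_word, purchase_word) combinations deemed reasonable.

    sales_items: list of entity word lists, one per sales-item commodity.
    pairs: list of (sales_words, purchase_words) tuples, one per commodity pair.
    k, d: frequency thresholds for single words and word combinations.
    """
    # Frequency of each entity word over all sales-item commodities.
    sales_freq = Counter(w for words in sales_items for w in words)

    # Co-occurrence frequency of each (sales word, purchase word) combination.
    pair_freq = Counter()
    for sales_words, purchase_words in pairs:
        for s, p in product(sales_words, purchase_words):
            pair_freq[(s, p)] += 1

    frequent_sales_words = {w for w, c in sales_freq.items() if c > k}
    return {
        pair
        for pair, c in pair_freq.items()
        if c > d and pair[0] in frequent_sales_words
    }


def is_normal(sales_words, purchase_words, lexicon):
    """Assumed rule: normal if any word combination is in the lexicon."""
    return any((s, p) in lexicon for s, p in product(sales_words, purchase_words))


# Hypothetical industry data.
sales_items = [["T-shirt"], ["T-shirt"], ["T-shirt"], ["cloth"]]
pairs = [(["T-shirt"], ["yarn"])] * 3 + [(["cloth"], ["cashmere", "yarn"])]

lexicon = build_combined_lexicon(sales_items, pairs, k=2, d=2)
print(lexicon)
print(is_normal(["T-shirt"], ["yarn"], lexicon))
```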
Optionally, the judging whether the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list are in the combined entity word library obtained through association analysis, and thereby judging whether the tax payment behavior is abnormal, includes:
if a combined entity word formed from a main sales commodity name in the main sales commodity list and a main purchase commodity name in the main purchase commodity list appears in the combined entity word library, the tax payment behavior is considered normal;
otherwise, the tax payment behavior is considered abnormal.
Optionally, as shown in FIG. 2, after the steps of obtaining the main sales commodity list and obtaining the main purchase commodity list, the method further includes:
step S201: judging whether the main sales commodity list or the main purchase commodity list is empty;
if an enterprise has only sales-item commodity data and no purchase-item commodity data, or only purchase-item commodity data and no sales-item commodity data, the enterprise's tax payment behavior is judged to be abnormal.
And/or
Step S202: judging whether the purchases and sales are normal based on the number of commodities in the main sales commodity list or the main purchase commodity list and on the recorded contents;
specifically, the number of commodities in the purchase-item commodity list and in the sales-item commodity list is counted. If the purchase-item commodity list contains exactly 1 commodity whose name, by fuzzy matching, contains words such as "detailed view", "list", "detail", "Golden Tax disk" (Jin Shui Pan) or "tax control disk", or the sales-item commodity list contains exactly 1 commodity whose name, by fuzzy matching, contains words such as "detailed view", "list" or "detail", the enterprise's purchases and sales are considered abnormal; otherwise the method proceeds to step S203.
And/or
Step S203: acquiring the intersection of the main sales commodity set and the main purchase commodity set based on the main sales commodity list and the main purchase commodity list, and judging whether the purchases and sales are normal based on the intersection;
the intersection of the main sales commodity set and the main purchase commodity set is taken, and if the intersection is not empty, the enterprise's purchases and sales are judged to be normal; otherwise, the method proceeds to step S204.
And/or
Step S204: judging whether a main sales commodity name in the main sales commodity list and a main purchase commodity name in the main purchase commodity list include the same phrase, and thereby judging whether the purchases and sales are normal;
checking whether a main sales commodity name contains a main purchase commodity name (for example, the sales-item commodity name is "U-shaped bolt with a nut" and the purchase-item commodity name is "U-shaped bolt"), or whether a main purchase commodity name contains a main sales commodity name (for example, the sales-item commodity name is "knitting dyeing cloth" and the purchase-item commodity name is "viscose nylon knitting dyeing cloth"); if so, the enterprise's purchases and sales are judged to be normal; otherwise, the method proceeds to step S205.
And/or
Step S205: judging whether the main sales commodity names and the main purchase commodity names consist entirely of unrecognizable character strings;
if they are not entirely unrecognizable, removing the unrecognizable character strings from the main sales commodity names and the main purchase commodity names, and retaining the recognizable character strings;
and acquiring the intersection of the recognizable characters in the main sales commodity name and the recognizable characters in the main purchase commodity name, acquiring the number m of intersection elements and the maximum length n of the 2 character strings, and judging whether the purchases and sales are normal by comparing m/n with a fourth set threshold.
Steps S201 to S205 are all rule-based judgments and are performed in sequence; once a definite conclusion is reached, the next judgment is not needed. The judgments of steps S201 to S205 are ordered from easy to difficult, so that drawing the conclusion with a simple method whenever possible improves judgment efficiency. The next, more complex judgment is made only when the simpler one cannot decide.
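The short-circuit ordering of the rule judgments can be sketched as a single function that returns as soon as one rule decides. This sketch covers steps S201 to S204 only. The function name `rule_check`, the return labels, the English keyword tuples, and the use of substring matching as a stand-in for fuzzy matching are all assumptions introduced here.

```python
# Assumed keyword lists; the English renderings follow the translated text.
PLACEHOLDER_WORDS = ("detailed view", "list", "detail")
PURCHASE_ONLY_WORDS = ("golden tax disk", "tax control disk")


def _has_any(name, keywords):
    """Substring match, used here as a simple stand-in for fuzzy matching."""
    lowered = name.lower()
    return any(k in lowered for k in keywords)


def rule_check(main_sales, main_purchases):
    """Return 'normal', 'abnormal', or None when steps S201-S204 cannot decide."""
    # S201: an empty list on either side is abnormal.
    if not main_sales or not main_purchases:
        return "abnormal"

    # S202: a single placeholder-like entry is abnormal.
    if len(main_purchases) == 1 and _has_any(
        main_purchases[0], PLACEHOLDER_WORDS + PURCHASE_ONLY_WORDS
    ):
        return "abnormal"
    if len(main_sales) == 1 and _has_any(main_sales[0], PLACEHOLDER_WORDS):
        return "abnormal"

    # S203: a shared commodity name is normal.
    if set(main_sales) & set(main_purchases):
        return "normal"

    # S204: one name containing the other is normal.
    for s in main_sales:
        for p in main_purchases:
            if s in p or p in s:
                return "normal"

    return None  # undecided: continue with step S205


print(rule_check(["U-shaped bolt with a nut"], ["U-shaped bolt"]))
```

Returning `None` for the undecided case keeps the cheap rules separate from the more expensive checks that follow, so a caller can stop at the first definite answer.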
In step S205, letters and digits in the main sales commodity names and the main purchase commodity names are removed using a regular expression. If all the sales-item commodities or all the purchase-item commodities are unrecognizable combinations of letters and digits, the enterprise is directly judged to be abnormal; otherwise, the commodity name strings with letters and digits removed are converted into character sets, and the intersection of the 2 sets is taken. For example, if one main commodity name is "refrigerant R134A" and the other is "refrigerant R410A", then after the letters and digits are removed the intersection of the two character sets consists of the three characters of the Chinese word for "refrigerant" (制冷剂, rendered character by character as 'system', 'cold', 'agent'). The number m of intersection elements and the maximum length n of the 2 character strings are calculated and a threshold is set; if m/n is greater than or equal to the threshold, the enterprise is judged to have no abnormal tax payment behavior.
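The m/n comparison can be sketched as follows, using the Chinese word 制冷剂 ("refrigerant") from the example. Two points are assumptions made here: that n is measured on the strings after letters and digits are removed, and that the function returns `None` for a name left empty by that removal (the case the text treats as directly abnormal). The threshold 0.5 is likewise illustrative.

```python
import re

_ALNUM = re.compile(r"[A-Za-z0-9]")


def char_overlap_ratio(name_a, name_b):
    """Return m/n for two commodity names, or None if either is all letters/digits.

    m is the number of distinct characters shared by the two stripped names;
    n is the length of the longer stripped name (an assumed reading of the text).
    """
    a = _ALNUM.sub("", name_a)
    b = _ALNUM.sub("", name_b)
    if not a or not b:
        return None  # nothing recognizable is left on one side
    m = len(set(a) & set(b))
    n = max(len(a), len(b))
    return m / n


ratio = char_overlap_ratio("制冷剂R134A", "制冷剂R410A")
threshold = 0.5  # assumed fourth set threshold
print(ratio, ratio is not None and ratio >= threshold)
```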
In a specific application scenario, the step of acquiring the main sales commodity list and the main purchase commodity list further requires that the commodity list data first be preprocessed, as follows:
The variety of commodity types makes commodity names complex, and names are also often filled in irregularly. To ensure that the subsequent commodity name similarity calculation is accurate and effective, the commodity name data must be cleaned. Specifically:
all English letters in commodity names are converted to uppercase, and Chinese punctuation symbols are converted to their English equivalents;
meaningless symbols, such as spaces and equal signs, are removed;
a mapping is built whose keys are numbers, letters, or combinations thereof and whose values are commodity names, and commodity names consisting only of letters and numbers are corrected with it. For example, "201" stands for "201 stainless steel", "P.O42.5" stands for "cement P.O42.5", and "OPPO" stands for "OPPO cell phone".
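The cleaning steps above might look like the following Python sketch. The names `CORRECTIONS`, `PUNCTUATION`, and `clean_name` are hypothetical, the punctuation table covers only a few symbols for illustration, and the mapping holds just the three examples given in the text.

```python
import re

# Mapping from bare codes to corrected commodity names (examples from the text).
CORRECTIONS = {
    "201": "201 stainless steel",
    "P.O42.5": "cement P.O42.5",
    "OPPO": "OPPO cell phone",
}

# Assumed subset of full-width (Chinese) symbols and their English equivalents.
PUNCTUATION = str.maketrans({
    "，": ",", "。": ".", "（": "(", "）": ")", "：": ":", "；": ";",
})

def clean_name(name):
    """Uppercase, normalize punctuation, drop meaningless symbols, correct bare codes."""
    name = name.upper().translate(PUNCTUATION)
    name = re.sub(r"[\s=]", "", name)      # remove spaces and equal signs
    return CORRECTIONS.get(name, name)     # correct letter/number-only names

print(clean_name(" p.o42.5 "))
```

The lookup runs last so that a code such as "oppo" is matched only after it has been uppercased and stripped.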
Optionally, after the steps of obtaining the main sales commodity list and obtaining the main purchase commodity list, the method further includes:
step S206: judging whether the tax payment behavior is abnormal based on set dictionaries, comprising:
converting the character strings of the main sales commodity name and the main purchase commodity name into character sets;
computing the intersection of each character set with a set single-word dictionary, and counting the elements of the intersection;
if the number of elements is zero, traversing the words in a set multi-word dictionary one by one and checking whether the main sales commodity name and the main purchase commodity name contain words in the multi-word dictionary; and correcting the set single-word dictionary and the set multi-word dictionary according to feedback results.
In a specific application scenario involving, for example, chemicals and medicines, commodity names are highly specialized (e.g., "2,4,5-triamino-6-hydroxypyrimidine sulfate"), so measuring commodity name similarity through word segmentation to judge whether the entry and sales items are abnormal often works poorly. In addition, the production of chemicals and medicines involves very complex chemical reactions that people outside the field cannot be expected to know. This embodiment therefore identifies chemicals and medicines by constructing dictionaries: an enterprise whose entry and sales commodities are both chemicals or medicines and which meets the screening conditions is judged to be a normal enterprise; an enterprise whose sales commodities are chemicals or medicines while its entry commodities are not is judged to be an abnormal enterprise; otherwise, the next judgment is performed.
The dictionary constructed in this embodiment is classified into a single-word dictionary and a multi-word dictionary. The single word dictionary and the multi-word dictionary are respectively as follows:
a one word-subject=set ('acid', 'base', 'boron', 'chlorine', 'strontium', 'bromine', 'cyanide', 'selenium', 'nitro', 'sulfo', 'carbon', 'iodine', 'fluorine', 'phosphorus', 'hydrocarbon', 'alkane', 'alkene', 'benzene', 'alcohol', 'ether', 'aldehyde', 'phenol', 'ammonia', 'ketone', 'quinone', 'spiro', 'silicon', 'amine', 'leather', 'ammonium', 'alkyne', 'sulfur', 'helium', 'hydrogen', 'carbon', 'nitrogen', 'oxygen', 'neon', 'sodium', 'magnesium', 'aluminum', 'acyl', 'potassium', 'calcium', 'titanium', 'vanadium', 'scandium', 'chromium', 'manganese', 'cobalt', 'lithium', 'nickel', 'gallium', 'salt', 'mold', 'and a lock,' oxazine ',' estra ',' statin ',' in ',' iron ',' ester ',' glycoside ',' hydroxy ',' carboxy ',' agent ',' pregnant ',' woman ',' enzyme ',' bacterium ',' blood ',' pyri ',' fin ',' peptide ',' naphthalene ',' oxazole ',' butyl ',' hydrazine ',' fin ',' oil ',' element ',' gum ',' plastic ',' film ',' copper ',' zinc ',' fat ',' azorim ',' azoic ',' and, 'pyridine', 'azine', 'tungsten', 'molybdenum', 'poly', 'glass', 'epidemic', 'wax', 'spleen', 'stomach', 'cough', 'angel', 'ginseng', 'stretcher', 'alga', 'spore', 'pain', 'rubber', 'coating', 'mirror', 'ingot', 'ink', 'shoe', 'door', 'window', 'glycoside', 'chess', 'slurry', 'bubble', 'inflammation', 'toxin', 'umbilicus', 'paint', 'ball', 'nylon', 'foil', 'itch', 'bag', 'scar', 'button', 'polyester'
Multiwords-subject=List ("HCPE", "texturing", "Gene", "AK sugar", "alloy", "parent roll", "AOS powder", "polypeptide" P204"," S930/4757"," EPS "," RSS3","7-AVCA "," CPVC "," PVDF "," PVPK30"," FASTCOLORREDRS "," R245"," EAA "," HCFC-141B "," BPI-2009","5-AT "," CAB-35"," ATP "," D4"," PPT30S "," ASA "," CAMP "," DMTDA "," LLDPE "," PP-R "," R142B "," PA66"," SF-1"," TPO "," TPU "," EPE "," HPMC "," TBHQ "," SBS "," UV1173"," VB "," DMEA "," LED "," MBS "," VC "," CELLGARD "," ECDP "," SBR "," AZETIDINE-1-SULFAAMM "," TPE "," PF "," PA "," F-53B "," AEO2"," CFC-113A "," VAE "," PVC "," PCT "," PVDF "," PTMEG "," PTA "," RP-CHOP "," TENOFOVIRALAFENAMIDEFUMARATE "," SAM980IPYS3-10"," PPS "," DLPA "," SCRWF "," TYVEK "," PMMA "," TPU "," PBT "," PPR "," PP "," BHT "," PTFE "," PEEK "," FEP "," EPE "," MTBE "," PE "," PET "," MTBE "," PU "," EVA "," ABS "," BOPP "," TPR "," PVB "," VOC "," CAMP "," HFC-227EA "," R134A "," R22"," B material "," PS sheet material "," sheath material "," pigment "," toner "," masterbatch "," dye "," color paste "," urea "," rosin ", "Mithong", "cold", "bezoar", "rehmannia", "eye drops", "honeysuckle", "poppy", "candles", "pride", "bili", "red iron", "spray head", "analgin", "cinchonine", "gingko", "slow release", "lol", "guaiac", "diazepam", "medical", "antigen", "factor", "salicylic", "Chinese angelica", "kudzuvine root", "flat piece", "prednisone", "cilin", "caltrop", "digoxin", "glucose", "malt", "lactose", "dialysis", "caffeine", "injection", "operation", "ultrasound", "surgical transfusion", "oral liquid", "external liquid", "poria cocos", "nutagrass", "infant", "largehead", "syrup", "cooling oil", "vaseline", "antibody", "ion", "stephania tetrandra", "biology", "granule", "ipettor", "dressing", "dentistry", "denture", "dentistry", "test paper", "eucommia ulmoides", "electrolysis", "socket", "switch", "nylon", "suture material", "polarographic paper", "baikal skullcap root", "Yuanming", "dwarf lilyturf tuber", "mountain road year", 
"ondansetron", "rifaction", "glycerin", "compound", "forestation", "barbital", "loose", pituitary "," mercuric material scattering "," diation "," water regulation "," filling material "," skeleton "," enteric coating "," enteroscope "," casing "," master batch "," yellow dextrin "," extinction "," dispersion "," permanent fixation "," activity "," chemistry "," leather strip "," woven bag "," casting ", "forging", "bearing", "extract", "left-hand", "cable", "ultra-light clay", "fire retardant", "you Fukang", "Kang Lisu", "throat health powder", "fiber", "rattan", "white oil", "degradation", "condensate", "KT plate", "tyre", "organic plate", "inorganic", "nylon", "bottle cap", "clinic", "dye", "compound", "non-woven fabric", "sponge", "quartz", "fitting", "part", "tap", "Jiali hong", "mask", "color flag", "urea-forming", "transparent", "Mo Minkai", "termi", "alcohol", "limestone", "chemical case", "solder paste", "anticorrosion", "asphalt", "tortoise age", "raw material tape", "acrylic", "refrigerant", "toothbrush", "kathon", "oral", "essence", "kaolin", "gypsum", "X14CRMOS17", "410", "201", "C30", "2CR13", "304/NO.1", "304+/-0.03", "304", "Q345B", "12L14", "DTY", "HOY", "FDY", "FOY", "70#", "45#", "60SI2MNA", "3CR2W8V", "POY", "CVC", "60SI2MNA", "30MNSI", "38CRMOAL", "1CR17", "FT33", "15CRMO", "H62+Q235B", "AG/CU", "40CR", "65MN", "2A12", "42CRMO", "TP316L", "50BV30", "5D/72F", "S30400", "LBKP", "GCR 15").
Based on the dictionaries constructed above, the specific implementation flow is as follows:
The character strings of the entry-item and sales-item commodity names are converted into character sets. For example, "3,4-difluorobenzonitrile" is converted to set('3', '4', '-', 'two', 'fluoro', 'benzene', 'nitrile'), where each element corresponds to one character of the original Chinese name;
the character set of each entry-item and sales-item commodity name is intersected with the single-word dictionary, and the number of elements in the intersection is counted;
if the number of intersection elements is greater than 0, the commodity name is given the label "true", indicating that the commodity is a chemical or a medicine; otherwise, the words in the multi-word dictionary are traversed one by one to check whether the commodity name contains any of them. If it does, the label "true" is assigned and the traversal stops; otherwise the label "false" is assigned.
If the labels of both the entry-item and the sales-item commodity names are "true" and the screening conditions are met, the enterprise is judged to be a normal enterprise; if all the sales-item labels are "true" and all the entry-item labels are "false", the enterprise is judged to be an abnormal enterprise; otherwise, the next layer of judgment is entered. Because the dictionaries are defined from words that occur frequently in the names of chemicals and medicines and from the characteristics of those names, the same words may also occur in the names of commodities that are not chemicals or medicines. Screening conditions are therefore needed to correct cases in which the entry-item and sales-item commodity names contain dictionary words but the commodities are not actually chemicals or medicines. The dictionaries may also be revised continuously.
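The labeling flow of step S206 can be sketched as below. The dictionaries are tiny hypothetical samples (assumed Chinese originals of "acid", "base", "benzene", "alcohol", and "glucose"), the function names are illustrative, and the screening conditions, which the text does not specify, are represented by a placeholder callback that accepts everything.

```python
# Hypothetical samples; the real dictionaries are far larger.
SINGLE_WORD_DICT = {"酸", "碱", "苯", "醇"}   # acid, base, benzene, alcohol
MULTI_WORD_DICT = ["PVC", "R134A", "葡萄糖"]  # 葡萄糖 = glucose

def is_chemical(name):
    """Label a commodity name True if it looks like a chemical or medicine."""
    if set(name) & SINGLE_WORD_DICT:          # any single-character hit
        return True
    # Otherwise scan the multi-word dictionary; any() stops at the first match.
    return any(word in name for word in MULTI_WORD_DICT)

def judge(sales_names, entry_names, passes_screening=lambda s, e: True):
    """Return 'normal', 'abnormal', or 'undecided' (go to the next layer).

    Assumes both lists are non-empty; passes_screening is a placeholder.
    """
    sales = [is_chemical(n) for n in sales_names]
    entry = [is_chemical(n) for n in entry_names]
    if all(sales) and all(entry) and passes_screening(sales_names, entry_names):
        return "normal"
    if all(sales) and not any(entry):
        return "abnormal"
    return "undecided"

print(judge(["PVC管"], ["3,4-二氟苯腈"]))
```

Checking the single-word dictionary first keeps the common case to one set intersection; the linear scan over multi-word entries runs only when that check finds nothing.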
Steps S206 and S207 each apply to particular kinds of enterprises. Step S207 applies to production, processing, or service enterprises: in production, for example, the entry items are raw materials and the sales items are products made from those raw materials, so the names of the two differ greatly, as when the entry commodity is "yarn" and the sales commodity is "T-shirt". Step S206 applies mainly to chemicals, medicines, and the like, whose commodity names are highly specialized.
The actual business scope of each enterprise is very wide, so its entry and sales items involve a great many commodities; analyzing all of them is both inefficient and of little value. Therefore, after data preprocessing is completed, the main sales commodity list and the main purchase commodity list of each enterprise must be determined, and non-main entry and sales commodities are not analyzed. The specific determination method for each enterprise is as follows:
optionally, obtaining the main sales commodity list or obtaining the main purchase commodity list includes:
acquiring the total amount of all sales commodities and the total amount of all entry commodities;
acquiring the summarized amount of each sales commodity and of each entry commodity;
dividing the summarized amount of each sales commodity by the total amount of all sales commodities to obtain a plurality of first proportion values; dividing the summarized amount of each entry commodity by the total amount of all entry commodities to obtain a plurality of second proportion values;
accumulating the plurality of first proportion values in order from largest to smallest to obtain a first accumulation result, stopping the accumulation when the first accumulation result exceeds a fifth set threshold value, and taking the list of sales commodities that contributed to that first accumulation result as the main sales commodity list;
and accumulating the plurality of second proportion values in order from largest to smallest to obtain a second accumulation result, stopping the accumulation when the second accumulation result exceeds a sixth set threshold value, and taking the list of entry commodities that contributed to that second accumulation result as the main purchase commodity list.
In a specific application scenario, the amounts of all sales commodities and of all entry commodities are totaled separately;
the amounts of each sales commodity and of each entry commodity are summarized separately;
the summarized amount of each sales/entry commodity is divided by the total amount of all sales/entry commodities, and the resulting proportions are arranged in descending order;
and the proportions are accumulated one by one; accumulation stops once the accumulated value exceeds a set threshold value, and the sales/entry commodities accumulated so far are determined to be the main sales/purchase commodities.
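The cumulative-proportion selection above can be sketched as follows. The function name, the `(name, amount)` record shape, and the default threshold of 0.8 are assumptions made for illustration; the same function would be called once for sales records and once for entry records.

```python
from collections import defaultdict

def main_commodities(records, threshold=0.8):
    """Pick the top commodities whose cumulative share of the total exceeds threshold.

    records: iterable of (commodity_name, amount) pairs for one enterprise.
    """
    totals = defaultdict(float)
    for name, amount in records:
        totals[name] += amount               # summarize the amount per commodity
    grand_total = sum(totals.values())
    if grand_total <= 0:
        return []
    # Sort commodities by amount (equivalently, by proportion), largest first.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    main, cumulative = [], 0.0
    for name, amount in ranked:
        main.append(name)
        cumulative += amount / grand_total
        if cumulative > threshold:           # stop once the threshold is exceeded
            break
    return main

records = [("yarn", 60), ("dye", 20), ("boxes", 10), ("yarn", 10)]
print(main_commodities(records))
```

The commodity that pushes the running sum past the threshold is included in the list, matching the text's rule that the commodities accumulated up to the stopping point form the main list.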
In a specific application scenario, as shown in Fig. 2, steps S201 to S207 and steps S102 and S103 may be performed in the order S201, S202, S203, S204, S205, S206, S207, S102, S103. Because it is not known in advance which kind of enterprise is being examined, the order of the steps is chosen from the point of view of the commodities themselves.
In this embodiment, several kinds of dictionaries are constructed: a mapping for commodity name correction whose keys are numbers, letters, or combinations thereof and whose values are commodity names; a chemical and medicine dictionary; and a dictionary of reasonable entry-item and sales-item commodity entity combinations. These dictionaries are necessary and important for improving the accuracy of judging whether the entry and sales items are abnormal.
The correlation between an enterprise's entry-item and sales-item commodity names is measured by combining several means: checking whether the intersection of the entry-item and sales-item commodity names is empty, checking whether the names contain one another, screening by the string similarity of the names, and using NLP techniques to calculate whether the names are semantically similar. This correlation is then used to judge whether abnormal tax payment behavior exists.
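The semantic similarity measure, as laid out in claims 2 to 4, takes the maximum cosine similarity over entity-word pairs within each commodity pair, and then the maximum over all commodity pairs. A minimal pure-Python sketch follows; word segmentation and the word-vector resource are outside its scope, so names are passed in as pre-segmented word lists and `vectors` is a hypothetical toy lookup table.

```python
import math
from itertools import product

def cosine(a, b):
    """Cosine similarity a.b / (|a| |b|); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def name_similarity(entry_words, sales_words, vectors):
    """Maximum cosine similarity over all entity-word pairs of one commodity pair."""
    sims = [cosine(vectors[u], vectors[v])
            for u, v in product(entry_words, sales_words)
            if u in vectors and v in vectors]   # skip words without a vector
    return max(sims, default=0.0)

def entry_sales_similarity(entry_names, sales_names, vectors):
    """Maximum name similarity over all (entry, sales) commodity pairs."""
    return max((name_similarity(e, s, vectors)
                for e, s in product(entry_names, sales_names)), default=0.0)

# Toy vectors, purely illustrative.
vectors = {"yarn": [1.0, 0.0], "thread": [2.0, 0.0], "phone": [0.0, 1.0]}
print(entry_sales_similarity([["yarn"]], [["thread", "phone"]], vectors))
```

Per claim 4, the resulting value would then be compared with the first set threshold value, with the behavior treated as normal when the similarity is larger.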
Association analysis is used to find reasonable combinations of entry-item and sales-item commodities, which solves the problem of judging whether the entry and sales items of production or processing enterprises are abnormal.
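The association analysis recited in claim 1 (count entity-word frequencies over sales items, count co-occurrences of entry/sales entity-word pairs, then apply two thresholds) can be sketched as follows. This is one reading of the claim, under stated assumptions: frequencies are raw counts, thresholds are met with "greater than or equal", each group is one enterprise's pre-segmented entry and sales entity words, and all names are hypothetical.

```python
from collections import Counter
from itertools import product

def build_combined_lexicon(groups, word_threshold, pair_threshold):
    """Build a set of (entry_word, sales_word) pairs deemed reasonable.

    groups: iterable of (entry_words, sales_words), one pair per enterprise.
    """
    sales_freq = Counter()   # frequency of each sales entity word
    pair_freq = Counter()    # co-occurrence frequency of (entry, sales) pairs
    for entry_words, sales_words in groups:
        sales_freq.update(set(sales_words))
        pair_freq.update(product(set(entry_words), set(sales_words)))
    # Keep only sales entity words that are frequent enough.
    frequent = {w for w, c in sales_freq.items() if c >= word_threshold}
    # Keep pairs that contain a frequent word and co-occur often enough.
    return {pair for pair, c in pair_freq.items()
            if pair[1] in frequent and c >= pair_threshold}

groups = [
    (["yarn"], ["T-shirt"]),
    (["yarn", "dye"], ["T-shirt"]),
    (["steel"], ["T-shirt"]),
    (["steel"], ["bolt"]),
]
print(build_combined_lexicon(groups, word_threshold=2, pair_threshold=2))
```

An enterprise whose entry and sales entity words form a pair found in the resulting lexicon, such as ("yarn", "T-shirt") here, would be treated as normal under claim 5.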
The entry-item and sales-item commodity names are analyzed by several means, including checking whether entry and sales items exist, applying set rules and custom dictionaries, measuring commodity name similarity, and performing association analysis, so that whether an enterprise's entry and sales items are abnormal is judged layer by layer. Compared with using a single method or judging on the basis of commodity codes, this approach is more accurate and more widely applicable. It solves the difficult problem of judging whether the entry and sales items of production or processing enterprises are abnormal, widens the range of suspicious taxpayers that can be identified, and improves the efficiency and accuracy of identifying them. It provides a monitoring means for national tax collection management and tax risk management, helps prevent tax evasion and tax leakage, and protects tax revenue.
The disclosed embodiments provide an electronic device comprising a memory and a processor,
a memory storing executable instructions;
and the processor runs executable instructions in the memory to realize the method for identifying the abnormal tax payment behavior.
The memory is for storing non-transitory computer-readable instructions. In particular, the memory may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like.
The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform the desired functions. In one embodiment of the present disclosure, the processor is configured to execute the computer readable instructions stored in the memory.
It should be understood by those skilled in the art that, in order to solve the technical problem of how to obtain a good user experience effect, the present embodiment may also include well-known structures such as a communication bus, an interface, and the like, and these well-known structures are also included in the protection scope of the present disclosure.
The detailed description of the present embodiment may refer to the corresponding description in the foregoing embodiments, and will not be repeated herein.
The embodiment of the disclosure provides a computer readable storage medium, which stores a computer program, and the computer program is executed by a processor to realize the method for identifying abnormal tax payment behaviors.
A computer-readable storage medium according to an embodiment of the present disclosure has stored thereon non-transitory computer-readable instructions. When executed by a processor, the instructions perform all or part of the steps of the methods of the embodiments of the present disclosure described above.
The computer-readable storage medium described above includes, but is not limited to: optical storage media (e.g., CD-ROM and DVD), magneto-optical storage media (e.g., MO), magnetic storage media (e.g., magnetic tape or removable hard disk), media with built-in rewritable non-volatile memory (e.g., memory card), and media with built-in ROM (e.g., ROM cartridge).
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described.

Claims (8)

1. A method for identifying abnormal tax payment behavior, comprising:
acquiring a main sales commodity list based on the ratio of the total amount of each sales-item commodity to the total amount of all sales-item commodities;
acquiring a main purchase commodity list based on the ratio of the total amount of each entry-item commodity to the total amount of all entry-item commodities;
processing the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list based on a natural language processing technology to obtain a first processing result;
judging whether the tax payment behavior is abnormal based on the first processing result;
wherein, before or after the step of processing the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list based on the natural language processing technology to obtain the first processing result, the method further comprises:
judging whether the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list are in a combined entity word stock obtained through association analysis, and thereby judging whether the tax payment behavior is abnormal;
wherein the association analysis used to obtain the combined entity word stock comprises the following steps:
performing word segmentation on the sales-item commodity names and the entry-item commodity names in the related industry, and extracting entity words;
counting the frequency of each extracted entity word over all the sales items;
counting, for each group of sales items and entry items, the frequency with which each pair of entity words occurs together, namely the frequency of combined entity words;
comparing the frequency of each entity word with a second set threshold value to obtain term entity words;
and comparing the frequency of the combined entity words containing the term entity words with a third set threshold value to obtain the combined entity word stock.
2. The method for identifying abnormal tax payment behavior according to claim 1, wherein the processing of the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list based on the natural language processing technology to obtain the first processing result comprises:
performing word segmentation on the main sales commodity name and the main purchase commodity name, and extracting entity words;
acquiring word vectors of the extracted entity words by using an acquired word vector resource;
calculating the cosine similarity between entity words based on the word vectors;
for each group of commodities, taking the maximum cosine similarity over all entity words as the commodity name similarity of the group, wherein each group of commodities comprises one entry commodity and one sales commodity;
combining the main sales commodity names in the main sales commodity list with the main purchase commodity names in the main purchase commodity list to form a plurality of groups of commodities, and calculating the commodity name similarity of each group of commodities;
selecting the maximum commodity name similarity as the entry-and-sales commodity similarity;
and comparing the entry-and-sales commodity similarity with a first set threshold value.
3. The method for identifying abnormal tax payment behavior according to claim 2, wherein, in the calculating of the cosine similarity between entity words based on the word vectors, the cosine similarity is calculated as:
$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \, \|\vec{b}\|}$$
wherein $\vec{a}$ and $\vec{b}$ are the word vectors of the entity words, and $\|\vec{a}\|$ and $\|\vec{b}\|$ are the moduli of vector $\vec{a}$ and vector $\vec{b}$, respectively.
4. The method for identifying abnormal tax payment behavior according to claim 2, wherein,
if the entry-and-sales commodity similarity is larger than the first set threshold value, the tax payment behavior is considered to be normal;
otherwise, the tax payment behavior is considered abnormal.
5. The method for identifying abnormal tax payment behavior according to claim 1, wherein the judging whether the main sales commodity names in the main sales commodity list and the main purchase commodity names in the main purchase commodity list are in the combined entity word stock obtained through association analysis, and thereby judging whether the tax payment behavior is abnormal, comprises:
if a combined entity word obtained from a main sales commodity name in the main sales commodity list and a main purchase commodity name in the main purchase commodity list appears in the combined entity word stock, the tax payment behavior is considered to be normal;
otherwise, the tax payment behavior is considered abnormal.
6. The method for identifying abnormal tax payment behavior according to claim 1, wherein, after the steps of obtaining the main sales commodity list and obtaining the main purchase commodity list, the method further comprises:
judging whether the main sales commodity list or the main purchase commodity list is empty;
and/or
judging whether the entry and sales items are normal based on the number of commodities in the main sales commodity list or the main purchase commodity list and the recorded contents;
and/or
acquiring the intersection of the main sales commodity set and the main purchase commodity set based on the main sales commodity list and the main purchase commodity list, and judging whether the entry and sales items are normal based on the intersection;
and/or
judging whether the main sales commodity name in the main sales commodity list and the main purchase commodity name in the main purchase commodity list comprise the same phrase, so as to judge whether the entry and sales items are normal;
and/or
judging whether the main sales commodity name and the main purchase commodity name are completely unrecognizable character strings;
if the character strings are not completely unrecognizable, removing the unrecognizable character strings in the main sales commodity names and the main purchase commodity names, and retaining the recognizable character strings;
and acquiring the intersection of the recognizable character strings in the main sales commodity name and the recognizable character strings in the main purchase commodity name, obtaining the number m of intersection elements and the maximum length n of the two character strings, and judging whether the entry and sales items are normal based on a comparison of m/n with a fourth set threshold value.
7. The method for identifying abnormal tax payment behavior according to claim 1, wherein, after the steps of obtaining the main sales commodity list and obtaining the main purchase commodity list, the method further comprises:
judging whether the tax payment behavior is abnormal based on set dictionaries, comprising:
converting the character strings of the main sales commodity name and the main purchase commodity name into character sets;
computing the intersection of each character set with a set single-word dictionary, and counting the elements of the intersection;
if the number of elements is zero, traversing the words in a set multi-word dictionary one by one and checking whether the main sales commodity name and the main purchase commodity name contain words in the multi-word dictionary; and correcting the set single-word dictionary and the set multi-word dictionary according to feedback results.
8. The method for identifying abnormal tax payment behavior according to claim 1, wherein the step of obtaining the main sales commodity list or obtaining the main purchase commodity list comprises:
acquiring the total amount of all sales commodities and the total amount of all entry commodities;
acquiring the summarized amount of each sales commodity and of each entry commodity;
dividing the summarized amount of each sales commodity by the total amount of all sales commodities to obtain a plurality of first proportion values; dividing the summarized amount of each entry commodity by the total amount of all entry commodities to obtain a plurality of second proportion values;
accumulating the plurality of first proportion values in order from largest to smallest to obtain a first accumulation result, stopping the accumulation when the first accumulation result is larger than a fifth set threshold value, and taking the list of sales commodities that contributed to that first accumulation result as the main sales commodity list;
and accumulating the plurality of second proportion values in order from largest to smallest to obtain a second accumulation result, stopping the accumulation when the second accumulation result is larger than a sixth set threshold value, and taking the list of entry commodities that contributed to that second accumulation result as the main purchase commodity list.
CN201911397878.2A 2019-12-30 2019-12-30 Method for identifying abnormal tax payment behavior Active CN111192128B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911397878.2A CN111192128B (en) 2019-12-30 2019-12-30 Method for identifying abnormal tax payment behavior

Publications (2)

Publication Number Publication Date
CN111192128A CN111192128A (en) 2020-05-22
CN111192128B true CN111192128B (en) 2023-06-02

Family

ID=70707814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911397878.2A Active CN111192128B (en) 2019-12-30 2019-12-30 Method for identifying abnormal tax payment behavior

Country Status (1)

Country Link
CN (1) CN111192128B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418652B (en) * 2020-11-19 2024-01-30 税友软件集团股份有限公司 Risk identification method and related device
CN112613929A (en) * 2020-12-17 2021-04-06 山东浪潮商用系统有限公司 Invoice false invoice recognition method and system based on semantic analysis
CN113869802B (en) * 2021-12-01 2022-03-11 神州数码信息系统有限公司 Production enterprise invoice false invoice risk assessment method based on sales entry comparison

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636973A (en) * 2013-11-06 2015-05-20 航天信息股份有限公司 Method of monitoring enterprise false invoice through commodity composition and system thereof
CN104636341A (en) * 2013-11-06 2015-05-20 航天信息股份有限公司 Data cleaning storage method for added value tax one-number multi-name monitoring
CN107729937A (en) * 2017-10-12 2018-02-23 北京京东尚科信息技术有限公司 For determining the method and device of user interest label
CN108242020A (en) * 2016-12-26 2018-07-03 航天信息股份有限公司 A kind of method and system for calculating income and selling diversity factor between item item lists
CN110019807A (en) * 2017-12-27 2019-07-16 航天信息股份有限公司 A kind of commodity classification method and device
CN110046978A (en) * 2019-03-19 2019-07-23 上海大学 Intelligent method of charging out

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5567749B2 (en) * 2012-02-15 2014-08-06 楽天株式会社 Dictionary generating apparatus, dictionary generating method, dictionary generating program, and computer-readable recording medium storing the program
JP6405343B2 (en) * 2016-07-20 2018-10-17 Necパーソナルコンピュータ株式会社 Information processing apparatus, information processing method, and program

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636973A (en) * 2013-11-06 2015-05-20 航天信息股份有限公司 Method of monitoring enterprise false invoice through commodity composition and system thereof
CN104636341A (en) * 2013-11-06 2015-05-20 航天信息股份有限公司 Data cleaning storage method for added value tax one-number multi-name monitoring
CN108242020A (en) * 2016-12-26 2018-07-03 航天信息股份有限公司 A kind of method and system for calculating income and selling diversity factor between item item lists
CN107729937A (en) * 2017-10-12 2018-02-23 北京京东尚科信息技术有限公司 For determining the method and device of user interest label
CN110019807A (en) * 2017-12-27 2019-07-16 航天信息股份有限公司 A kind of commodity classification method and device
CN110046978A (en) * 2019-03-19 2019-07-23 上海大学 Intelligent method of charging out

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
冯凯 ; 王小华 ; 谌志群 ; .基于动态规划的汉语句子相似度算法.计算机工程.2013,(02),全文. *
秦成磊 ; 魏晓 ; .中文在线评论中的商品特征聚类研究.计算机应用与软件.2016,(07),全文. *

Also Published As

Publication number Publication date
CN111192128A (en) 2020-05-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant