CN109426968B - Abnormity detection method and device for main body to be detected and enterprise abnormity detection method - Google Patents

Info

Publication number
CN109426968B
Authority
CN
China
Prior art keywords
similarity
circulation
circulation object
group
main body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710726608.6A
Other languages
Chinese (zh)
Other versions
CN109426968A (en)
Inventor
贺勇
李楠
李屾
张凯
龚坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201710726608.6A
Publication of CN109426968A
Application granted
Publication of CN109426968B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses an anomaly detection method and device for a main body to be detected, and an enterprise anomaly detection method. The method comprises: acquiring the set similarity between a first circulation object set of a main body to be detected and a second circulation object set of the main body to be detected, wherein each first circulation object in the first circulation object set has an inflow attribute and each second circulation object in the second circulation object set has an outflow attribute; and determining whether the main body to be detected is abnormal according to the set similarity. The invention solves the technical problem in the prior art that detection results are inaccurate because the abnormal state of an enterprise is detected through manual rules translated from expert business knowledge.

Description

Abnormity detection method and device for main body to be detected and enterprise abnormity detection method
Technical Field
The invention relates to the field of language models, in particular to an anomaly detection method and device for a to-be-detected main body and an anomaly detection method for an enterprise.
Background
An enterprise usually purchases commodities related to its business category (its entry set) and sells commodities related to its business category to the market (its sales item set), so every enterprise that carries out normal production and business activities has entry records and sales records. For example, a manufacturing enterprise purchases raw materials for the products it makes and then sells the finished products. If an enterprise operates normally, its entry set is correlated with its sales item set. If the entry set of an enterprise is unrelated or only weakly related to its sales item set, the enterprise may be abnormal, that is, not operating normally. In tax work, some enterprises that falsely issue or swap invoices use large numbers of special value-added tax invoices for purchased commodities to deduct tax, and then issue invoices downstream on that basis, which constitutes illegal tax evasion. In addition, in some export tax rebate enterprises, the tax rate of the commodities that should be exported, judging from the commodities purchased, differs from the tax rate of the exported commodities that the enterprise declares, which is how export tax rebates are obtained fraudulently. Such illegal behavior seriously affects national tax revenue. Judging whether an enterprise is abnormal can therefore bring substantial benefit to national tax revenue, help combat illegal behavior such as tax evasion, tax omission and tax fraud, and help create a good business environment.
In the existing scheme, a business expert translates his or her business knowledge into manual rules, uses the rules to select a group of enterprises, and finally judges whether each enterprise is abnormal by manually inspecting the commodities the enterprise deals in. However, the rules cannot be exhaustively enumerated, some business knowledge cannot be translated into rules, and case selection and judgment must be done manually. Once the data volume is large, manual handling becomes infeasible, and many cases are missed or wrongly selected.
No effective solution has yet been proposed for the problem that, in the prior art, detection results are inaccurate because the abnormal state of an enterprise is detected through manual rules translated from expert business knowledge.
Disclosure of Invention
The embodiments of the invention provide an anomaly detection method and device for a main body to be detected and an enterprise anomaly detection method, which at least solve the technical problem in the prior art that detection results are inaccurate because the abnormal state of an enterprise is detected through manual rules translated from expert business knowledge.
According to an aspect of the embodiments of the present invention, there is provided a method for detecting an abnormality of a subject to be measured, including: acquiring the set similarity between a first circulation object set of the subject to be measured and a second circulation object set of the subject to be measured, wherein each first circulation object in the first circulation object set has an inflow attribute and each second circulation object in the second circulation object set has an outflow attribute; and determining whether the subject to be measured is abnormal according to the set similarity.
According to another aspect of the embodiments of the present invention, there is also provided an abnormality detection apparatus for a subject to be measured, including: a first acquisition module, configured to acquire the set similarity between a first circulation object set of the subject to be measured and a second circulation object set of the subject to be measured, wherein each first circulation object in the first circulation object set has an inflow attribute and each second circulation object in the second circulation object set has an outflow attribute; and a determining module, configured to determine whether the subject to be measured is abnormal according to the set similarity.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium, where the storage medium includes a stored program, and when the program runs, the apparatus on which the storage medium is located is controlled to execute the above method for detecting an abnormality of a subject to be detected.
According to another aspect of the embodiments of the present invention, there is also provided a processor, where the processor is configured to execute a program, where the program executes the above method for detecting an abnormality of a subject to be detected when running.
According to another aspect of the embodiments of the present invention, there is also provided an enterprise anomaly detection method, including: acquiring the set similarity between an entry set of an enterprise and a sales item set of the enterprise, wherein the commodities in the entry set are the entry (purchased) items of the enterprise and the commodities in the sales item set are the sales items of the enterprise; and determining whether the enterprise is abnormal according to the set similarity.
In the embodiment of the invention, the similarity of the enterprise entry set and the sales item set is automatically calculated according to the records of the enterprise entry and the sales item, and abnormal enterprises are quickly found in a large amount of data according to the similarity of the enterprise entry set and the sales item set, so that the detection efficiency and accuracy are improved, and the technical problem that the detection result is inaccurate because the abnormal state of the enterprises is detected through the manual rule translated by expert business knowledge in the prior art is solved.
The scheme does not depend on the type of the enterprise to be detected, does not depend on the professional knowledge of professionals, and has no limit on the data volume of the enterprise to be detected, so that the requirement of the enterprise in detection can be met.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:
Fig. 1 is a block diagram of a hardware configuration of a computer terminal for implementing an abnormality detection method of a subject to be detected according to an embodiment of the present invention;
Fig. 2 is a flowchart of an alternative method for detecting an abnormality of a subject to be tested according to an embodiment of the present application;
Fig. 3 is a flowchart of an alternative method for detecting an abnormality of a subject to be tested according to an embodiment of the present application;
Fig. 4 is a flowchart of an alternative method for detecting an abnormality of a subject to be tested according to an embodiment of the present application;
Fig. 5 is a schematic diagram of an abnormality detection apparatus for a subject to be tested according to an embodiment of the present application;
Fig. 6 is a flowchart of an enterprise anomaly detection method according to an embodiment of the present application; and
Fig. 7 is a block diagram of a computer terminal according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms appearing in the description of the embodiments of the present application are explained as follows:
Commodity (service) name string: the name of a commodity or service, for example laundry detergent or catering service.
Entry set of an enterprise: the set of all goods or services purchased by the enterprise, together with the amounts of those goods or services.
Sales item set of an enterprise: the set of all goods or services sold by the enterprise, together with the amounts of those goods or services.
Entry and sales item sets of an enterprise: the entry set and the sales item set of the enterprise taken together.
word2vec: an efficient tool, open-sourced by Google in 2013, for representing words as real-valued vectors. It takes a sequence of words as input and outputs a real-valued vector of a specified dimension for each word.
Example 1
According to an embodiment of the present invention, an embodiment of a method for detecting an abnormality of a subject to be tested is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a hardware configuration block diagram of a computer terminal for implementing an abnormality detection method of a subject to be measured. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, the computer terminal may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module or incorporated, in whole or in part, into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the abnormality detection method for the subject to be detected in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implements the above-mentioned abnormality detection method for the subject to be detected of the application program. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
It should be noted here that in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 described above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that fig. 1 is only one specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
In the above operating environment, the present application provides an abnormality detection method for a subject to be measured as shown in fig. 2. In this embodiment, the main body to be tested is used for representing an enterprise to be tested, the circulation object is used for representing an entry commodity (including a service) and an export commodity (including a service) of the enterprise to be tested, the inflow attribute is used for representing an entry of the enterprise to be tested, and the outflow attribute is used for representing an export of the enterprise to be tested.
Fig. 2 is a flowchart of an abnormality detection method for a subject to be measured according to an embodiment of the present invention. As shown in fig. 2, the method includes:
Step S21, acquiring the set similarity between a first circulation object set of the subject to be tested and a second circulation object set of the subject to be tested, where each first circulation object in the first circulation object set has an inflow attribute and each second circulation object in the second circulation object set has an outflow attribute.
Specifically, the main body to be tested may be an enterprise to be tested, the first circulation object set may be the entry set of the enterprise, and the second circulation object set may be the sales item set of the enterprise. A circulation object is a commodity related to the enterprise to be tested: the first circulation objects are its entry commodities and the second circulation objects are its sales commodities. The set similarity of the enterprise to be tested represents the similarity between its entry set and its sales item set.
Step S23, determining whether the subject to be tested is abnormal according to the set similarity.
Specifically, an abnormality of the main body to be tested may be an abnormality of the enterprise in any of several aspects, such as purchasing, selling, tax payment and invoice issuing. Abnormal operation in any one of these aspects lowers the similarity between the entry set and the sales item set, which is how the abnormality is detected.
In an alternative embodiment, a similarity threshold may be set to determine whether the subject to be tested is abnormal. The higher the similarity between the entry set and the sales item set of an enterprise, the more likely the enterprise is normal; the lower the similarity, the more likely the enterprise is abnormal. Therefore, once the similarity threshold is determined, any enterprise whose similarity between entry set and sales item set is smaller than the threshold is considered abnormal.
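The threshold rule can be sketched as follows. This is a minimal illustration that assumes the set similarity of each enterprise has already been computed; the function name `flag_abnormal`, the sample similarity values and the threshold of 0.3 are hypothetical and are not taken from the source.

```python
def flag_abnormal(set_similarity, threshold):
    """Return the enterprises whose set similarity is below the threshold."""
    return sorted(name for name, sim in set_similarity.items() if sim < threshold)

# Hypothetical set similarities between each enterprise's entry set
# and sales item set.
similarities = {"enterprise_1": 0.82, "enterprise_2": 0.12, "enterprise_3": 0.45}

# 0.3 is an assumed threshold; the source does not give a value here.
abnormal = flag_abnormal(similarities, threshold=0.3)
print(abnormal)
```

With these assumed values only `enterprise_2` falls below the threshold; raising the threshold flags more enterprises.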
According to the scheme, the similarity of the enterprise entry set and the sales item set can be automatically calculated according to the records of the enterprise entry and sales items, abnormal enterprises can be quickly found in a large amount of data according to the similarity of the enterprise entry set and the sales item set, and the detection efficiency and accuracy are improved, so that the technical problem that the detection result is inaccurate due to the fact that the abnormal states of the enterprises are detected through manual rules translated from expert business knowledge in the prior art is solved.
The scheme does not depend on the type of the enterprise to be detected, does not depend on the professional knowledge of professionals, and has no limit on the data volume of the enterprise to be detected, so that the requirement of detecting the enterprise can be met.
As an alternative embodiment, in step S21, the obtaining the set similarity between the first circulation object set of the subject to be tested and the second circulation object set of the subject to be tested includes:
S211, acquiring a plurality of circulation object groups of the main body to be detected, wherein each circulation object group includes a first circulation object and a second circulation object.
S213, acquiring the intra-group similarity between the first circulation object and the second circulation object in each circulation object group.
S215, determining the set similarity of the main body to be detected according to the intra-group similarities of the plurality of circulation object groups, the inflow attribute values of the first circulation objects and the outflow attribute values of the second circulation objects.
Specifically, the inflow attribute value of a first circulation object is the entry amount of an entry commodity, and the outflow attribute value of a second circulation object is the sales amount of a sales commodity. After the intra-group similarity between commodities is determined, the similarity between the entry set and the sales item set of each enterprise can be measured from the correlation between the commodities and the entry and sales amounts of the commodities.
As an optional embodiment, obtaining a plurality of circulation object groups of the subject to be measured includes:
S2113, acquiring the similarity between each first circulation object and each second circulation object.
S2115, for each first circulation object, searching the second circulation object set for the second circulation object with the maximum similarity to it, each second circulation object found forming a circulation object group with the corresponding first circulation object; and
S2117, for each second circulation object, searching the first circulation object set for the first circulation object with the maximum similarity to it, each first circulation object found forming a circulation object group with the corresponding second circulation object.
In an alternative embodiment, G may be the entry set of an enterprise and X may be the sales item set of the enterprise. The pairs of G and X are constructed as follows: for each commodity p in the set G, the commodity q with the largest similarity is found in the set X, forming the pair set GX1 = {<p, q>}; for each commodity q in the set X, the commodity p with the largest similarity is found in the set G, forming the pair set GX2 = {<p, q>}; finally, the union of GX1 and GX2 is taken to obtain GX. For example, suppose G contains commodities a and b, X contains commodities a and c, and the calculation shows that b in G is most similar to a in X and c in X is most similar to b in G. Then GX1 = {<a, a>, <b, a>} and GX2 = {<a, a>, <b, c>}, and merging GX1 and GX2 yields the set GX = {<a, a>, <b, a>, <b, c>}, which consists of a plurality of circulation object groups.
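The pair construction can be sketched in a few lines. The sets G = {a, b} and X = {a, c} are inferred from the pairs listed in the example, and the similarity values in `table` are hypothetical, chosen only so that b is most similar to a and c is most similar to b; the function name `build_groups` is likewise an assumption.

```python
def build_groups(G, X, sim):
    """Build the circulation object groups GX as the union of GX1 and GX2."""
    # GX1: for each entry commodity p, the sales commodity q most similar to it.
    gx1 = {(p, max(X, key=lambda q: sim(p, q))) for p in G}
    # GX2: for each sales commodity q, the entry commodity p most similar to it.
    gx2 = {(max(G, key=lambda p: sim(p, q)), q) for q in X}
    return gx1 | gx2

# Hypothetical similarities, keyed by (entry commodity, sales commodity).
table = {("a", "a"): 1.0, ("a", "c"): 0.1, ("b", "a"): 0.6, ("b", "c"): 0.4}

def sim(p, q):
    return table[(p, q)]

G = ["a", "b"]  # entry commodities (inferred from the example)
X = ["a", "c"]  # sales commodities (inferred from the example)
GX = build_groups(G, X, sim)
print(sorted(GX))
```

Taking the union as a Python set removes the duplicate pair <a, a> that appears in both GX1 and GX2, which matches the three groups listed in the example.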
As an alternative embodiment, obtaining the intra-group similarity between the first circulation object and the second circulation object in each circulation object group includes:
S2131, sorting a plurality of circulation objects according to their attribute values to obtain a sequence of the subject to be tested, where the attribute values include inflow attribute values and outflow attribute values.
In the above step, the inflow attribute value of each first circulation object and the outflow attribute value of each second circulation object are obtained, and the circulation objects are sorted in a preset order according to these values.
The purpose of sorting commodities by entry and sales amount is to construct a commodity sequence for each enterprise. The input of word2vec is a sequence of sentences, so each enterprise is regarded as a sentence, each commodity as a word, and the commodity sequence as a sentence. Within the same enterprise, the closer the amounts of two commodities are, the more contextually related the commodities are taken to be; sorting by amount therefore places commodities with similar amounts close together in the sequence.
In the following, an alternative embodiment is described:
Enterprise A: yesterday it purchased 100 yuan each of commodities a and b and sold 80 yuan of commodity a; today it purchased 50 yuan of commodity a and 50 yuan of commodity c and sold 50 yuan of commodity b.
Enterprise B: yesterday it purchased 100 yuan of commodity a and 100 yuan of commodity d and sold 50 yuan of commodity a; today it purchased 50 yuan each of commodities a, b and e and sold 50 yuan of commodity e.
First, the purchased (entry) and sold (sales) commodities of each enterprise during this period (yesterday and today) are aggregated:
the entry set G of enterprise A is {a:150, b:100, c:50}, and its sales item set X is {a:80, b:50};
the entry set G of enterprise B is {a:150, b:50, d:50}, and its sales item set X is {a:50, e:50}.
Then the entry commodities are sorted by entry amount and the sales commodities by sales amount, together in one sequence. If the same commodity appears in both the entry set G and the sales item set X, it appears twice in the enterprise's sequence, once at the position given by its entry amount and once at the position given by its sales amount. If an entry commodity and a sales commodity have the same amount, the entry commodity is placed before the sales commodity.
Thus, the commodity sequence of enterprise A in this time period is abacb, and the commodity sequence of enterprise B in this time period is abdae.
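The aggregation and ordering for enterprise A can be sketched as below. Descending order of amount is an assumption consistent with the example (the source only says a preset order), and the order among commodities that tie on the same side is not specified in the source; here they keep their order of first appearance. The names `commodity_sequence`, `"buy"` and `"sell"` are hypothetical.

```python
from collections import defaultdict

def commodity_sequence(transactions):
    """Aggregate entry and sales amounts, then order commodities by amount."""
    entry, sales = defaultdict(float), defaultdict(float)
    for kind, commodity, amount in transactions:
        (entry if kind == "buy" else sales)[commodity] += amount
    # One item per (commodity, side); side 0 = entry, 1 = sales, so that an
    # entry commodity precedes a sales commodity when the amounts are equal.
    items = [(amt, 0, c) for c, amt in entry.items()]
    items += [(amt, 1, c) for c, amt in sales.items()]
    items.sort(key=lambda t: (-t[0], t[1]))  # descending amount (assumed)
    return dict(entry), dict(sales), "".join(c for _, _, c in items)

# Transactions of enterprise A over the two days in the example.
enterprise_a = [
    ("buy", "a", 100), ("buy", "b", 100), ("sell", "a", 80),  # yesterday
    ("buy", "a", 50), ("buy", "c", 50), ("sell", "b", 50),    # today
]
G, X, seq = commodity_sequence(enterprise_a)
print(G, X, seq)
```

For enterprise A this reproduces the entry set {a:150, b:100, c:50}, the sales item set {a:80, b:50} and the sequence abacb given in the example.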
S2133, obtaining the vector corresponding to each circulation object from the sequence of the subject to be tested through a preset language analysis model.
Specifically, the language analysis model may be word2vec, and the word2vec tool represents each commodity by an n-dimensional real-valued vector. The original word2vec is used to process natural language: its input is a sequence of natural language sentences, each word in a sentence is represented by an n-dimensional vector, and the similarity between words can be measured by the similarity between their vectors. Therefore, to represent each commodity by an n-dimensional vector, each commodity may be treated as a word and the sequence of commodities purchased and sold by each enterprise as a sentence. The enterprise sequences obtained in step S2131 are input into word2vec, which generates and outputs a vector for each word in the input sequences, thereby giving the vector corresponding to each commodity.
S2135, determining the intra-group similarity of each circulation object group according to the first vector corresponding to the first circulation object and the second vector corresponding to the second circulation object in each circulation object group.
The inflow attribute value of the first circulation object is the entry amount of the commodity, and the outflow attribute value of the second circulation object is the sales amount of the commodity. The set similarity is then determined from the intra-group similarity, the inflow attribute value of the first circulation object and the outflow attribute value of the second circulation object. In this way the association between elements (that is, circulation objects) is introduced into the calculation of the set similarity, and the inflow attribute value of the first circulation object and the outflow attribute value of the second circulation object are introduced as the weights of the elements.
As an alternative embodiment, determining the intra-group similarity of each circulation object group according to the first vector corresponding to the first circulation object and the second vector corresponding to the second circulation object in each circulation object group includes:
S21351, multiplying the ith value in the first vector by the ith value in the second vector to obtain the ith first product, and accumulating the n first products to obtain a first accumulated value, where 1 ≤ i ≤ n and n is the dimension of the vectors.
S21353, squaring the ith value in the first vector to obtain the ith second product, and accumulating the n second products to obtain a second accumulated value.
S21355, squaring the ith value in the second vector to obtain the ith third product, and accumulating the n third products to obtain a third accumulated value.
S21357, dividing the first accumulated value by a multiplication result to obtain the intra-group similarity, where the multiplication result is the product of the square root of the second accumulated value and the square root of the third accumulated value.
In an alternative embodiment, the intra-group similarity of the circulation object group may be calculated by the following formula:
$$\cos(p, q) = \frac{\sum_{i=1}^{n} p_i q_i}{\sqrt{\sum_{i=1}^{n} p_i^2}\,\sqrt{\sum_{i=1}^{n} q_i^2}}$$
where p and q represent the first vector and the second vector, cos(p, q) represents the intra-group similarity of p and q, n represents the dimension of the first vector and the second vector (the two vectors have the same dimension), $p_i$ represents the ith value in the first vector, and $q_i$ represents the ith value in the second vector, 0 < i ≤ n.
The similarity between commodities is measured by cosine similarity, whose range is [-1, 1]; the larger the value, the more related the two commodities are, and the similarity of a commodity with itself is 1.
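In an alternative embodiment, the vector output by word2vec for commodity p is v_p = [0.1, 0.2, 0.0, -0.1, 0.25, 0.45, 0.1], and the vector output by word2vec for commodity q is v_q = [0.5, 0.4, 0.1, 0.0, 0.4, 0.2, 0.0]. In this example n is 7, and
$$\cos(v_p, v_q) = \frac{0.1 \times 0.5 + 0.2 \times 0.4 + 0.0 \times 0.1 + (-0.1) \times 0.0 + 0.25 \times 0.4 + 0.45 \times 0.2 + 0.1 \times 0.0}{\sqrt{0.1^2 + 0.2^2 + 0.0^2 + (-0.1)^2 + 0.25^2 + 0.45^2 + 0.1^2}\,\sqrt{0.5^2 + 0.4^2 + 0.1^2 + 0.0^2 + 0.4^2 + 0.2^2 + 0.0^2}}$$
$$= \frac{0.32}{\sqrt{0.335}\,\sqrt{0.62}} \approx 0.7022$$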
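Steps S21351 to S21357 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the embodiment: the function name `cosine_similarity` and the choice to return 0 for an all-zero vector are assumptions introduced here.

```python
import math

def cosine_similarity(p, q):
    """Intra-group similarity of two commodity vectors (steps S21351 to S21357)."""
    if len(p) != len(q):
        raise ValueError("both vectors must have the same dimension n")
    first_acc = sum(a * b for a, b in zip(p, q))   # S21351: sum of p_i * q_i
    second_acc = sum(a * a for a in p)             # S21353: sum of p_i squared
    third_acc = sum(b * b for b in q)              # S21355: sum of q_i squared
    denominator = math.sqrt(second_acc) * math.sqrt(third_acc)  # S21357
    if denominator == 0.0:
        # Assumption: an all-zero vector is treated as unrelated to everything.
        return 0.0
    return first_acc / denominator

# Vectors from the example in the text (n = 7).
v_p = [0.1, 0.2, 0.0, -0.1, 0.25, 0.45, 0.1]
v_q = [0.5, 0.4, 0.1, 0.0, 0.4, 0.2, 0.0]
print(round(cosine_similarity(v_p, v_q), 4))
```

For the two example vectors the function returns a value of approximately 0.7022, which agrees with the similarity used in the threshold example of the description.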
As an optional embodiment, after determining the intra-group similarity in each circulation object group, the method further includes:
Step S25, setting an intra-group similarity threshold. Specifically, the intra-group similarity threshold may be set to 0.2.
Step S27, ignoring the intra-group similarity between the circulation objects when the intra-group similarity is smaller than the intra-group similarity threshold.
In the above step, when the intra-group similarity is smaller than the intra-group similarity threshold, the intra-group similarity between the circulation objects may be ignored, and the intra-group similarity between the circulation objects may be set to 0, that is, it is considered that there is no similarity between the circulation objects.
According to the scheme, the similarity between the commodities is ignored under the condition that the similarity between the two commodities is too low, so that the complexity of calculation is reduced, and the calculation efficiency is improved.
In an alternative embodiment, continuing the above example, the intra-group similarity threshold is set to 0.2, and the similarity between commodity p and commodity q is 0.7022, which is greater than the intra-group similarity threshold and is therefore retained. The above steps can also be expressed by the following formula:
$$\mathrm{sim}(p, q) = \begin{cases} \cos(p, q), & \cos(p, q) \ge 0.2 \\ 0, & \cos(p, q) < 0.2 \end{cases}$$
In the above formula, sim(p, q) is the newly defined intra-group similarity and the intra-group similarity threshold is set to 0.2; when the intra-group similarity between two commodities is less than 0.2, the two commodities are considered to be unrelated.
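The thresholding of steps S25 and S27 amounts to a single comparison. The sketch below is illustrative only; the function name `thresholded_similarity` is an assumption, and the default threshold of 0.2 is the example value given in the text.

```python
def thresholded_similarity(cos_value, threshold=0.2):
    """Return sim(p, q): keep cos(p, q) if it reaches the threshold, otherwise 0."""
    return cos_value if cos_value >= threshold else 0.0

print(thresholded_similarity(0.7022))  # kept, because 0.7022 >= 0.2
print(thresholded_similarity(0.15))    # ignored, because 0.15 < 0.2
```

Pairs whose similarity is set to 0 contribute nothing to the numerator of the set similarity, so in practice they can be skipped in that sum.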
As an alternative embodiment, in step S215, determining the set similarity of the subject to be measured according to the intra-group similarity of the plurality of circulation object groups, the inflow attribute value of the first circulation object, and the outflow attribute value of the second circulation object, includes:
Step S2151, comparing the inflow attribute value of the first circulation object with the outflow attribute value of the second circulation object in each circulation object group to obtain the maximum value and the minimum value of each group.
Step S2153, obtaining a fourth product by multiplying the intra-group similarity of the circulation object group by the minimum value of the group.
Step S2155, accumulating the fourth products of the plurality of circulation object groups to obtain a fourth accumulated value, and accumulating the maximum values of the plurality of circulation object groups to obtain a fifth accumulated value.
Step S2157, determining the ratio of the fourth accumulated value to the fifth accumulated value as the set similarity of the subject to be tested.
Specifically, the similarity between the first circulation object set and the second circulation object set can be calculated by the following formula:
$$\mathrm{sim}(G, X) = \frac{\sum_{\langle p, q \rangle \in GX} \mathrm{sim}(p, q) \cdot \min\{je_p, je_q\}}{\sum_{\langle p, q \rangle \in GX} \max\{je_p, je_q\}}$$
where sim(G, X) characterizes the set similarity of the first circulation object set and the second circulation object set, GX characterizes the set of circulation object groups, $je_p$ represents the inflow attribute value of p, and $je_q$ represents the outflow attribute value of q. Specifically, $\mathrm{sim}(p, q) \cdot \min\{je_p, je_q\}$ is the fourth product, $\sum_{\langle p, q \rangle \in GX} \mathrm{sim}(p, q) \cdot \min\{je_p, je_q\}$ is the fourth accumulated value, and $\sum_{\langle p, q \rangle \in GX} \max\{je_p, je_q\}$ is the fifth accumulated value.
The above formula is an improvement on the Jaccard similarity formula. When calculating the similarity of two sets, the Jaccard formula divides the number of elements in the intersection of the two sets by the number of elements in their union, and it takes into account neither the correlation between two elements nor the weight of the elements. The above formula includes both the correlation between two elements (corresponding to the intra-group similarity between commodities) and the weight of the elements (corresponding to the amount of the commodity): min is the smaller of the purchase amount of commodity p in the entries and the sales amount of commodity q in the sales items, and max is the larger of the two amounts. The set similarity between the entry set and the sales item set of each enterprise is thus obtained and is used to judge whether the enterprise is abnormal.
Therefore, the formula not only calculates the set similarity, but also introduces the intra-group similarity and the amounts of the circulation objects into the calculation, which improves the accuracy of the calculation and makes it better suited to calculating the set similarity of the entries and sales items of an enterprise.
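Steps S2151 to S2157 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `set_similarity`, the tuple layout of each group, and the amounts in the usage example are all hypothetical, and returning 0 for an empty input is a choice made here rather than something the description specifies.

```python
def set_similarity(groups):
    """Weighted, Jaccard-like set similarity sim(G, X).

    groups: iterable of (sim_pq, je_p, je_q) tuples, one per circulation
    object group, where sim_pq is the (thresholded) intra-group similarity,
    je_p the purchase amount of p and je_q the sales amount of q.
    """
    fourth_acc = 0.0  # sum of sim(p, q) * min{je_p, je_q}
    fifth_acc = 0.0   # sum of max{je_p, je_q}
    for sim_pq, je_p, je_q in groups:
        fourth_acc += sim_pq * min(je_p, je_q)  # S2151 and S2153
        fifth_acc += max(je_p, je_q)            # S2155
    if fifth_acc == 0.0:
        return 0.0  # assumption: no amounts means no measurable similarity
    return fourth_acc / fifth_acc               # S2157

# Hypothetical groups: (intra-group similarity, purchase amount, sales amount)
groups = [(1.0, 100.0, 100.0), (0.5, 40.0, 60.0)]
print(set_similarity(groups))  # (100 + 20) / (100 + 60) = 0.75
```

When every pair has similarity 1 and equal purchase and sales amounts, the result is 1; unrelated pairs or strongly mismatched amounts pull the value toward 0.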
As an alternative embodiment, the first circulation object and the second circulation object satisfy any one or more of the following conditions:
The rank of the inflow attribute value among the inflow attribute values of all first circulation objects is higher than a first preset position, or the rank of the outflow attribute value among the outflow attribute values of all second circulation objects is higher than a second preset position. This preset condition limits the commodities as follows: for commodities in the entry set, the purchase amount of the commodity ranks in the top n of all entry amounts, where n is the preset first position; for commodities in the sales item set, the sales amount of the commodity ranks in the top m of all sales amounts, where m is the preset second position; m and n may be equal.
The inflow attribute value and the outflow attribute value are both greater than a preset attribute value. The preset attribute value is a preset money amount, and the preset condition is used for limiting the commodity as follows: the purchase amount and the sale amount of the commodity are both higher than the preset amount.
The proportion of the additional attribute value to the inflow attribute value of the first flow-through object set exceeds a first preset proportion, or the proportion of the additional attribute value to the outflow attribute value of the second flow-through object set exceeds a second preset proportion. The additional attribute value may be a reserved amount of the commodity, and the preset condition is used to limit the commodity as follows: for commodities in the entry set, the proportion of the reserved amount to the total amount of the entries exceeds a first preset proportion; for the commodities in the sales item set, the proportion of the reserved amount to the total amount of the sales items exceeds a second preset proportion.
As an optional embodiment, determining whether the subject to be tested is abnormal according to the set similarity includes:
Step S231, comparing the set similarity with a preset similarity threshold. The preset similarity threshold may be adjusted according to the type of the enterprise to be detected.
Step S233, determining that the subject to be measured is abnormal when the set similarity is smaller than the preset similarity threshold.
As an optional embodiment, after determining whether the subject to be tested is abnormal according to the set similarity, the method further includes: if the subject to be tested is determined to be abnormal, generating alarm information.
As an alternative embodiment, the method further includes:
Step S291, acquiring a record of a received circulation object.
Step S293, if the circulation object in the record satisfies the preset condition, adding the received circulation object to the circulation objects used for determining the set similarity of the subject to be tested, and re-determining the set similarity of the subject to be tested.
Step S295, if the circulation object in the record does not satisfy the preset condition, ignoring the circulation object in the record.
On the basis of an initial commodity library obtained by the initial offline calculation, the above steps can serve as online detection steps, that is, online detection is performed directly as new data is received. For each newly received purchase or sale record of an enterprise, if the amount of the record is less than t1 or the commodity of the record is not in the commodity library, the record is not processed.
As an optional embodiment, the method further includes:
Step S297, acquiring the received circulation objects according to a preset period.
Step S299, adding the received circulation object to the circulation object for determining the set similarity of the subject to be measured, and re-determining the set similarity of the subject to be measured.
In an optional embodiment, for an entry record, if the entry set of the enterprise does not contain the commodity, the sales item commodity most related to the entry commodity is found through the intra-group similarity to obtain a purchase-sale pair for the commodity, and the pair is added to the purchase-sale pair set of the enterprise; if the entry set of the enterprise already contains the commodity, the entry amount of the purchase-sale pairs having the commodity as the entry commodity is updated. For a sales item record, if the sales item set of the enterprise does not contain the commodity, the entry commodity most related to the sales item commodity is found through the intra-group similarity to obtain a purchase-sale pair for the commodity, and the pair is added to the purchase-sale pair set of the enterprise; if the sales item set of the enterprise already contains the commodity, the sales amount of the purchase-sale pairs having the commodity as the sales item commodity is updated.
On the basis of performing initial off-line calculation to obtain an initial commodity library, the steps can be taken as off-line updating steps, namely, when new data are received, on-line detection is not directly performed, all the data received in the period time are acquired according to a preset period, then detection is performed, and the previous detection result is updated by adopting a new detection result.
When new data has accumulated for a period of time (e.g., one month), the commodity vectors need to be updated. To do so, the new data and the old historical data are merged, that is, a commodity sequence is constructed for each enterprise from all the data accumulated so far, and the n-dimensional vector of each commodity is then obtained by word2vec training and stored as the new commodity library. As the commodity library is updated and grows, the hit rate of online detection improves: when a new circulation object is received during online detection, the more commodities the library contains, the more likely it is that the newly received circulation object is already in the library. The process therefore iterates through initial offline calculation, online updating and detection, offline updating, online updating and detection, offline updating, and so on, so that the commodity library is continuously enriched.
Fig. 3 is a flowchart of an optional method for detecting an abnormality of a subject to be tested according to an embodiment of the present application, and with reference to fig. 3, the method for detecting an abnormality of a subject to be tested includes the steps of:
Step S31, initial offline calculation.
The initial offline calculation may be performed by any of the methods described above in embodiment 1, for example:
First, the n-dimensional vector of each commodity is trained using all historical data to obtain a commodity library containing the commodity vectors.
Second, using the data of a period of time (for example, the most recent three months), the purchase-sale pair set GX of each enterprise is obtained, and the sim(p, q) value of each purchase-sale pair of the enterprise is calculated from the commodity vectors of the first step and used as the intra-group similarity between the purchased and sold commodities.
Third, a purchase-and-sale rationality score (corresponding to the set similarity) is obtained for each enterprise according to the sim(G, X) formula, and if the score is smaller than a threshold t, the enterprise is determined to be abnormal.
Fourth, the result of the third step is verified, and a better threshold t is selected for the next calculation.
Step S33, online detection and updating.
On the basis of the initial commodity library obtained by the initial offline calculation, this step serves as online detection, that is, online detection is performed directly as new data is received. For each incoming purchase or sale record of an enterprise, if the amount of the record is less than t1 or the commodity of the record is not in the commodity library, the record is not processed. Otherwise, for an entry record, if the entry set of the enterprise does not contain the commodity, the sales item commodity most related to the entry commodity is found through the intra-group similarity to obtain a purchase-sale pair for the commodity, and the pair is added to the purchase-sale pair set of the enterprise; if the entry set of the enterprise already contains the commodity, the entry amount of the purchase-sale pairs having the commodity as the entry commodity is updated. For a sales item record, if the sales item set of the enterprise does not contain the commodity, the entry commodity most related to the sales item commodity is found through the intra-group similarity to obtain a purchase-sale pair for the commodity, and the pair is added to the purchase-sale pair set of the enterprise; if the sales item set of the enterprise already contains the commodity, the sales amount of the purchase-sale pairs having the commodity as the sales item commodity is updated.
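The online handling of a single record can be sketched as below. Everything in this sketch is a hypothetical illustration: the state layout (`"entry"`, `"sale"`, `"pairs"`), the function names, the commodity names and the tiny two-dimensional vectors are assumptions, as is the choice to accumulate amounts when a commodity is already present (the text only says the amount is updated).

```python
import math

def cosine(p, q):
    """Cosine similarity of two equal-length vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def most_similar(commodity, candidates, library):
    """Return the candidate whose vector is closest to the commodity, or None."""
    best, best_sim = None, -2.0
    for candidate in candidates:
        sim = cosine(library[commodity], library[candidate])
        if sim > best_sim:
            best, best_sim = candidate, sim
    return best

def process_record(state, kind, commodity, amount, library, t1):
    """Apply one incoming record; return False if the record is skipped.

    state: {"entry": {name: amount}, "sale": {name: amount},
            "pairs": set of (entry_commodity, sale_commodity)}
    kind:  "entry" for a purchase record, "sale" for a sales record.
    """
    if kind not in ("entry", "sale"):
        raise ValueError("kind must be 'entry' or 'sale'")
    if amount < t1 or commodity not in library:
        return False  # too small, or commodity unknown to the library
    own, other = ("entry", "sale") if kind == "entry" else ("sale", "entry")
    is_new = commodity not in state[own]
    # Assumption: amounts are totals, so a repeated commodity accumulates.
    state[own][commodity] = state[own].get(commodity, 0.0) + amount
    if is_new:
        partner = most_similar(commodity, state[other], library)
        if partner is not None:
            pair = (commodity, partner) if kind == "entry" else (partner, commodity)
            state["pairs"].add(pair)
    return True

# Hypothetical commodity library and enterprise state.
library = {"steel": [1.0, 0.0], "pipe": [0.9, 0.1], "tea": [0.0, 1.0]}
state = {"entry": {}, "sale": {"pipe": 500.0, "tea": 300.0}, "pairs": set()}
process_record(state, "entry", "steel", 400.0, library, t1=100.0)
print(state["pairs"])
```

Because the pairs store only commodity names, updating an amount in `state` is automatically reflected the next time the set similarity is recomputed from the pairs.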
Step S35, updating offline.
On the basis of carrying out initial off-line calculation to obtain an initial commodity library or carrying out on-line detection and updating on the initial commodity library, the steps can be taken as a step of off-line updating, namely, when new data are received, on-line detection is not directly carried out, all the data received in the period time are obtained according to a preset period, then detection is carried out, and the previous detection result is updated by adopting a new detection result.
Fig. 4 is a flowchart of an optional abnormality detection method for a to-be-detected subject according to an embodiment of the present application, and the following further describes the abnormality detection method for the to-be-detected subject with reference to the flowchart shown in fig. 4, where the aggregation method may be used to perform the initial offline calculation to construct the commodity library.
Step S41, aggregating to obtain the entry set G and the sales item set X of each enterprise.
In the above step, taking the enterprise as the object, the entry commodity set and the sales item commodity set of each enterprise are aggregated, and each commodity carries its amount: for a commodity in the entry set, the amount is the total amount of that commodity purchased by the enterprise; for a commodity in the sales item set, the amount is the total amount of that commodity sold by the enterprise. Then one of the following is retained in each set: the top k commodities in descending order of amount; or the commodities whose amount is not less than a given threshold t1; or the entry commodities whose amounts make up the top percentage t2 of the total entry amount and the sales item commodities whose amounts make up the top percentage t2 of the total sales amount.
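The three retention rules of step S41 can be sketched as small filters over a mapping from commodity to total amount. The function names and the sample amounts are hypothetical. The third rule is read here as a cumulative share, that is, the largest commodities are kept until their amounts reach the fraction t2 of the total; this reading is an assumption, since the text does not spell out how the percentage is applied.

```python
def keep_top_k(amounts, k):
    """Keep the k commodities with the largest amounts."""
    ranked = sorted(amounts.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def keep_at_least(amounts, t1):
    """Keep commodities whose amount is not less than the threshold t1."""
    return {name: amt for name, amt in amounts.items() if amt >= t1}

def keep_top_share(amounts, t2):
    """Keep the largest commodities until they cover the fraction t2 of the total.

    Assumption: t2 is a cumulative share between 0 and 1.
    """
    total = sum(amounts.values())
    kept, running = {}, 0.0
    for name, amt in sorted(amounts.items(), key=lambda kv: kv[1], reverse=True):
        if total == 0 or running >= t2 * total:
            break
        kept[name] = amt
        running += amt
    return kept

# Hypothetical entry set of one enterprise: commodity -> total purchase amount.
entry_amounts = {"a": 500.0, "b": 300.0, "c": 150.0, "d": 50.0}
print(keep_top_k(entry_amounts, 2))
print(keep_at_least(entry_amounts, 150.0))
print(keep_top_share(entry_amounts, 0.75))
```

The same filters would be applied separately to the sales item set of each enterprise.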
Step S43, fusing the purchase and sale sets of each enterprise, and sorting and screening by amount to construct a commodity sequence.
In the above step, the commodities purchased and sold by each enterprise are merged and sorted in descending order of amount: for a purchased commodity, the amount used for sorting is its purchase amount; for a sold commodity, the amount used for sorting is its sales amount; if a commodity appears in both the entry set and the sales item set, both occurrences are retained in the ordering. Each commodity is then regarded as a word, and the commodity sequence is constructed in the sorted order.
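The construction of one enterprise's commodity sequence in step S43 can be sketched as follows. The function name and the sample data are hypothetical; the resulting list plays the role of one "sentence" that would then be passed, together with the sequences of other enterprises, to a word2vec implementation.

```python
def build_commodity_sequence(entry_amounts, sale_amounts):
    """Merge purchases and sales and order the commodities by descending amount.

    A commodity that is both purchased and sold appears twice, once with its
    purchase amount and once with its sales amount.
    """
    merged = [(amount, name) for name, amount in entry_amounts.items()]
    merged += [(amount, name) for name, amount in sale_amounts.items()]
    # sorted() is stable, so ties keep the purchase-before-sale order.
    merged = sorted(merged, key=lambda item: item[0], reverse=True)
    return [name for _, name in merged]

# Hypothetical enterprise: "steel" is both purchased and sold.
entry_amounts = {"steel": 400.0, "coal": 100.0}
sale_amounts = {"pipe": 500.0, "steel": 50.0}
print(build_commodity_sequence(entry_amounts, sale_amounts))
# ['pipe', 'steel', 'coal', 'steel']
```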
Step S45, training a plurality of commodity sequences through word2vec to obtain the vector of each commodity.
Step S47, constructing purchase-sale pairs for each enterprise.
The purchase-sale pairs are the circulation object groups and can be constructed as follows: for each first circulation object, searching the second circulation objects for the one with the maximum similarity to it, to obtain a plurality of purchase-sale pairs; and for each second circulation object, searching the first circulation objects for the one with the maximum similarity to it, to obtain a plurality of purchase-sale pairs.
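The two-direction nearest-neighbour search of step S47 can be sketched as below. The function names, commodity names and two-dimensional vectors are hypothetical, and collecting the pairs in a set (so that a pair found from both directions is stored once) is an assumption made here, not something the description states.

```python
import math

def cosine(p, q):
    """Cosine similarity of two equal-length vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def build_pairs(entry_items, sale_items, vectors):
    """Return the set of (entry_commodity, sale_commodity) pairs of one enterprise."""
    pairs = set()
    if not entry_items or not sale_items:
        return pairs
    # Direction 1: the most similar sold commodity for every purchased one.
    for g in entry_items:
        best = max(sale_items, key=lambda x: cosine(vectors[g], vectors[x]))
        pairs.add((g, best))
    # Direction 2: the most similar purchased commodity for every sold one.
    for x in sale_items:
        best = max(entry_items, key=lambda g: cosine(vectors[g], vectors[x]))
        pairs.add((best, x))
    return pairs

# Hypothetical commodity vectors.
vectors = {
    "steel": [1.0, 0.0],
    "ore": [0.8, 0.2],
    "pipe": [0.9, 0.1],
    "tea": [0.0, 1.0],
}
pairs = build_pairs(["steel", "ore"], ["pipe", "tea"], vectors)
print(sorted(pairs))
```

Searching in both directions ensures that every purchased commodity and every sold commodity takes part in at least one pair, so an unmatched sale such as "tea" still contributes its amount to the denominator of the set similarity.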
Step S49, calculating the set similarity of the purchase and sale sets of each enterprise.
Specifically, the set similarity of the purchase and sale sets of the enterprise may be calculated through the sim(G, X) formula described above for step S215.
Step S411, judging whether the enterprise is abnormal according to the set similarity of the purchase and sale sets of the enterprise.
The set similarity is compared with a set similarity threshold: if the set similarity of the purchase and sale sets of the enterprise is higher than the threshold, the enterprise is normal; if it is lower than the threshold, the enterprise is determined to be abnormal.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of combined acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the invention. Further, those skilled in the art will appreciate that the embodiments described in this specification are preferred embodiments and that the acts and modules involved are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, there is also provided an abnormality detection apparatus for a subject to be tested, the apparatus being configured to implement the abnormality detection method for a subject to be tested, and fig. 5 is a schematic diagram of the abnormality detection apparatus for a subject to be tested according to the embodiment of the present application, and as shown in fig. 5, the apparatus includes:
The obtaining module 50 is configured to obtain a set similarity between a first circulation object set of the subject to be tested and a second circulation object set of the subject to be tested, where a first circulation object in the first circulation object set has an inflow attribute, and a second circulation object in the second circulation object set has an outflow attribute.
The determining module 52 is configured to determine whether the subject to be tested is abnormal according to the set similarity.
As an alternative embodiment, the obtaining module includes:
the first acquisition submodule is used for acquiring a plurality of circulation object groups of the main body to be detected, and each circulation object group comprises: a first flow object and a second flow object.
And the second acquisition sub-module is used for acquiring the intra-group similarity of the first circulation object and the second circulation object in each circulation object group.
And the determining submodule is used for determining the set similarity of the body to be tested according to the intra-group similarity of the plurality of circulation object groups, the inflow attribute value of the first circulation object and the outflow attribute value of the second circulation object.
As an alternative embodiment, the first obtaining sub-module includes:
and the acquisition unit is used for acquiring the similarity between each first circulation object and each second circulation object.
The first searching unit is used for respectively searching a second circulation object with the maximum similarity to each first circulation object in the second circulation object set, and each searched second circulation object and each corresponding first circulation object form a circulation object group; and
and the second searching unit is used for respectively searching the first circulation object with the maximum similarity with each second circulation object in the first circulation object set, and each searched first circulation object and each corresponding second circulation object form a circulation object group.
As an alternative embodiment, the obtaining unit includes:
and the sequencing subunit is used for sequencing according to the attribute values of the plurality of circulation objects to obtain a sequence of the main body to be tested, wherein the attribute values comprise: an ingress attribute value and an egress attribute value;
the analysis subunit is used for obtaining vectors corresponding to the plurality of circulation objects respectively according to the sequence of the main body to be detected through a preset language analysis model;
and the determining subunit is used for determining the intra-group similarity of each circulation object group according to the first vector corresponding to the first circulation object and the second vector corresponding to the second circulation object in each circulation object group.
As an alternative embodiment, the determining subunit includes:
a first calculating subunit, configured to obtain an ith first product by multiplying the ith value in the first vector by the ith value in the second vector, and to accumulate the n first products to obtain a first accumulated value, where 1 ≤ i ≤ n and n is the dimension of the vectors;
a second calculating subunit, configured to obtain an ith second product by squaring the ith value in the first vector, and to accumulate the n second products to obtain a second accumulated value;
a third calculating subunit, configured to obtain an ith third product by squaring the ith value in the second vector, and to accumulate the n third products to obtain a third accumulated value;
a fourth calculating subunit, configured to divide the first accumulated value by a multiplication result to obtain the intra-group similarity, where the multiplication result is the product of the square root of the second accumulated value and the square root of the third accumulated value.
As an alternative embodiment, after determining the intra-group similarity in each circulation object group, the apparatus further includes:
and the setting module is used for setting the intra-group similarity threshold.
And the ignoring module is used for ignoring the intra-group similarity between the circulation objects under the condition that the intra-group similarity is smaller than the intra-group similarity threshold.
As an alternative embodiment, the determining sub-module includes:
and the fifth calculating subunit is used for comparing the inflow attribute value of the first circulation object with the outflow attribute value of the second circulation object in each group of circulation object groups to obtain the maximum value and the minimum value of each group.
And the sixth calculating subunit is used for obtaining a fourth product through the intra-group similarity and the minimum value of the circulation object group.
And the seventh calculation subunit is used for accumulating the fourth products of the plurality of circulation object groups to obtain a fourth accumulated value and accumulating the maximum value of the plurality of circulation object groups to obtain a fifth accumulated value.
And the eighth calculating subunit is used for determining the ratio of the fourth accumulated value to the fifth accumulated value as the set similarity of the main body to be detected.
As an alternative embodiment, the first circulation object and the second circulation object satisfy any one or more of the following conditions:
the positions of the inflow attribute values in the inflow attribute values of all the first circulation objects are higher than the first preset positions, or the positions of the outflow attribute values in the outflow attribute values of all the second circulation objects are higher than the second preset positions;
the inflow attribute value and the outflow attribute value are both larger than a preset attribute value;
the proportion of the additional attribute value to the inflow attribute value of the first flow-through object set exceeds a first preset proportion, or the proportion of the additional attribute value to the outflow attribute value of the second flow-through object set exceeds a second preset proportion.
As an alternative embodiment, the determining module includes:
the comparison submodule is used for comparing the set similarity with a preset similarity threshold;
and the abnormity determining submodule is used for determining the abnormity of the main body to be detected under the condition that the set similarity is smaller than a preset similarity threshold.
As an optional embodiment, after determining whether the subject to be tested is abnormal according to the set similarity, the apparatus further includes:
and the alarm module is used for generating alarm information if the abnormality of the main body to be detected is determined.
As an alternative embodiment, the apparatus further comprises:
a record acquisition module for acquiring a record of the received circulation object;
the first re-determining module is used for adding the received circulating object into the circulating object for determining the set similarity of the main body to be detected and re-determining the set similarity of the main body to be detected if the circulating object in the record meets the preset condition;
and the record ignoring module is used for ignoring the circulation object in the record if the circulation object in the record does not meet the preset condition.
As an optional embodiment, the apparatus further comprises:
the receiving module is used for acquiring the received circulating object according to a preset period;
and the second re-determining module is used for adding the received circulating object into the circulating object for determining the set similarity of the main body to be detected and re-determining the set similarity of the main body to be detected.
Example 3
According to an embodiment of the present invention, an embodiment of an enterprise anomaly detection method is further provided, and fig. 6 is a flowchart of an enterprise anomaly detection method according to an embodiment of the present application, which is shown in fig. 6, and includes:
and step S61, acquiring the set similarity of the entry set of the enterprise and the sale set of the enterprise, wherein the commodities in the entry set are entries of the enterprise, and the commodities in the sale set are sales of the enterprise.
And step S63, determining whether the enterprise is abnormal according to the set similarity.
The scheme in this embodiment corresponds to the scheme in embodiment 1: the enterprise is the subject to be tested, the entry set of the enterprise is the first circulation object set, and the sales item set of the enterprise is the second circulation object set.
In the embodiment of the invention, the similarity between the entry set and the sales item set of an enterprise is calculated automatically from the entry and sales item records of the enterprise, and abnormal enterprises are quickly found in a large amount of data according to this similarity. This improves detection efficiency and accuracy and solves the technical problem in the prior art that detection results are inaccurate because the abnormal state of enterprises is detected through manual rules translated from expert business knowledge.
Example 4
The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute the program code of the following steps in the method for detecting an abnormality of a subject to be tested of an application program: acquiring set similarity of a first circulation object set of a body to be tested and a second circulation object set of the body to be tested, wherein a first circulation object in the first circulation object set has an inflow attribute, and a second circulation object in the second circulation object set has an outflow attribute; and determining whether the main body to be detected is abnormal or not according to the set similarity.
Alternatively, fig. 7 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in fig. 7, the computer terminal 700 may include: one or more processors 702 (only one of which is shown), a memory 704, and a transmission device 706.
The memory may be configured to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for detecting an abnormality of a subject to be tested in the embodiments of the present invention. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, thereby implementing the method for detecting an abnormality of a subject to be tested. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to the computer terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring set similarity between a first circulation object set of a main body to be tested and a second circulation object set of the main body to be tested, wherein the first circulation object in the first circulation object set has inflow attributes, and the second circulation object in the second circulation object set has outflow attributes; and determining whether the main body to be detected is abnormal or not according to the set similarity.
Optionally, the processor may further execute the program code of the following steps: obtaining a plurality of circulation object groups of a main body to be detected, wherein each circulation object group comprises: a first flow object and a second flow object; acquiring the intra-group similarity of a first circulation object and a second circulation object in each circulation object group; and determining the set similarity of the body to be tested according to the intra-group similarity of the plurality of circulation object groups, the inflow attribute value of the first circulation object and the outflow attribute value of the second circulation object.
Optionally, the processor may further execute the program code of the following steps: acquiring the similarity between each first circulation object and each second circulation object; searching a second circulation object with the maximum similarity to each first circulation object in the second circulation object set respectively, wherein each searched second circulation object and each corresponding first circulation object form a circulation object group respectively; and respectively searching the first circulation object with the maximum similarity with each second circulation object in the first circulation object set, wherein each searched first circulation object and each corresponding second circulation object form a circulation object group.
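The two-way search described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_groups`, the dictionary-based similarity lookup, and the choice to collapse duplicate pairs into a set are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def build_groups(
    similarity: Dict[Tuple[str, str], float],
    inflow_items: List[str],
    outflow_items: List[str],
) -> Set[Tuple[str, str]]:
    """Pair every object with its most similar counterpart, in both directions.

    `similarity[(a, b)]` is assumed to hold the similarity between inflow
    object `a` and outflow object `b` for every possible pair.
    """
    groups: Set[Tuple[str, str]] = set()
    if not inflow_items or not outflow_items:
        return groups

    # Direction 1: for each first (inflow) object, find the most similar
    # second (outflow) object.
    for a in inflow_items:
        best_b = max(outflow_items, key=lambda b: similarity[(a, b)])
        groups.add((a, best_b))

    # Direction 2: for each second (outflow) object, find the most similar
    # first (inflow) object.
    for b in outflow_items:
        best_a = max(inflow_items, key=lambda a: similarity[(a, b)])
        groups.add((best_a, b))

    # Assumption: a pair found in both directions is kept only once.
    return groups
```

Searching in both directions ensures that every inflow object and every outflow object appears in at least one group, even when several objects share the same best match.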
Optionally, the processor may further execute the program code of the following steps: sequencing according to the attribute values of the circulating objects to obtain a sequence of a main body to be tested, wherein the attribute values comprise: an ingress attribute value and an egress attribute value; obtaining vectors corresponding to a plurality of circulation objects respectively according to the sequence of a main body to be detected through a preset language analysis model; and determining the intra-group similarity of each circulation object group according to the first vector corresponding to the first circulation object and the second vector corresponding to the second circulation object in each circulation object group.
Optionally, the processor may further execute the program code of the following steps: obtaining an ith first product by multiplying the ith value in the first vector by the ith value in the second vector, and accumulating the n first products to obtain a first accumulated value, where 1 ≤ i ≤ n and n is the dimension of the vectors; obtaining an ith second product by squaring the ith value in the first vector, and accumulating the n second products to obtain a second accumulated value; obtaining an ith third product by squaring the ith value in the second vector, and accumulating the n third products to obtain a third accumulated value; and dividing the first accumulated value by a multiplication result to obtain the intra-group similarity, where the multiplication result is the product of the square root of the second accumulated value and the square root of the third accumulated value.
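The computation in these steps is the cosine similarity of the two vectors. A minimal sketch is given below; the function name and the handling of zero-length vectors (returning 0.0) are assumptions, since the text does not address that case.

```python
import math
from typing import Sequence


def intra_group_similarity(first: Sequence[float], second: Sequence[float]) -> float:
    """Cosine similarity between the first vector and the second vector."""
    if len(first) != len(second):
        raise ValueError("both vectors must have the same dimension n")

    # First accumulated value: sum of element-wise products.
    first_acc = sum(a * b for a, b in zip(first, second))
    # Second accumulated value: sum of squares of the first vector.
    second_acc = sum(a * a for a in first)
    # Third accumulated value: sum of squares of the second vector.
    third_acc = sum(b * b for b in second)

    denominator = math.sqrt(second_acc) * math.sqrt(third_acc)
    if denominator == 0.0:
        # Assumption: an all-zero vector is treated as having no similarity.
        return 0.0
    return first_acc / denominator
```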
Optionally, the processor may further execute the program code of the following steps: setting a similarity threshold value in the group; in the event that the intra-group similarity is less than the intra-group similarity threshold, the intra-group similarity between the circulating objects is ignored.
Optionally, the processor may further execute the program code of the following steps: comparing the inflow attribute value of the first circulation object with the outflow attribute value of the second circulation object in each group of circulation object groups to obtain the maximum value and the minimum value of each group; obtaining a fourth product through the intra-group similarity and the minimum value of the circulation object group; accumulating the fourth products of the plurality of circulation object groups to obtain a fourth accumulated value, and accumulating the maximum value of the plurality of circulation object groups to obtain a fifth accumulated value; and determining the ratio of the fourth accumulated value to the fifth accumulated value as the set similarity of the main body to be detected.
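The ratio described above can be sketched as follows, assuming each circulation object group is represented as a tuple of (intra-group similarity, inflow attribute value, outflow attribute value). The tuple layout, the function name, and the result of 0.0 for an empty input are illustrative assumptions.

```python
from typing import Iterable, Tuple


def set_similarity(groups: Iterable[Tuple[float, float, float]]) -> float:
    """Weighted set similarity over all circulation object groups."""
    fourth_acc = 0.0  # sum of (intra-group similarity * minimum value)
    fifth_acc = 0.0   # sum of maximum values

    for similarity, inflow_value, outflow_value in groups:
        fourth_acc += similarity * min(inflow_value, outflow_value)
        fifth_acc += max(inflow_value, outflow_value)

    if fifth_acc == 0.0:
        # Assumption: with no groups or no value, report zero similarity.
        return 0.0
    return fourth_acc / fifth_acc
```

For example, two groups `(1.0, 100.0, 100.0)` and `(0.5, 40.0, 80.0)` give a fourth accumulated value of 100 + 20 = 120 and a fifth accumulated value of 100 + 80 = 180, so the set similarity is 120 / 180, or about 0.667.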
Optionally, the first circulation object and the second circulation object satisfy any one or more of the following conditions: the position of the inflow attribute value in the inflow attribute values of all the first circulation objects is higher than a first preset position, or the position of the outflow attribute value in the outflow attribute values of all the second circulation objects is higher than a second preset position; the inflow attribute value and the outflow attribute value are both larger than a preset attribute value; the proportion of the additional attribute value to the inflow attribute value of the first circulation object set exceeds a first preset proportion, or the proportion of the additional attribute value to the outflow attribute value of the second circulation object set exceeds a second preset proportion.
Optionally, the processor may further execute the program code of the following steps: comparing the set similarity with a preset similarity threshold; and determining that the main body to be tested is abnormal under the condition that the set similarity is smaller than a preset similarity threshold.
Optionally, the processor may further execute the program code of the following steps: and if the to-be-detected main body is determined to be abnormal, generating alarm information.
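The threshold comparison and alarm generation can be sketched together as below. The function name, the alarm message format, and the use of `None` to signal a normal subject are hypothetical; the text does not specify a threshold value, so it is passed in by the caller.

```python
from typing import Optional


def check_subject(subject_id: str, set_similarity: float, threshold: float) -> Optional[str]:
    """Return alarm information if the subject is abnormal, otherwise None."""
    # The subject is abnormal when the set similarity is smaller than the
    # preset similarity threshold.
    if set_similarity < threshold:
        return (
            f"ALERT: subject {subject_id} flagged as abnormal "
            f"(set similarity {set_similarity:.3f} < threshold {threshold:.3f})"
        )
    return None
```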
Optionally, the processor may further execute the program code of the following steps: acquiring a record of a received circulation object; if the circulation objects in the record meet the preset conditions, adding the received circulation objects into the circulation objects for determining the set similarity of the main body to be tested, and re-determining the set similarity of the main body to be tested; and if the main body to be detected in the record does not meet the preset condition, ignoring the circulating object in the record.
Optionally, the processor may further execute the program code of the following steps: acquiring the received circulation objects according to a preset period; and adding the received circulation objects to the circulation objects used for determining the set similarity of the main body to be tested, and re-determining the set similarity of the main body to be tested.
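The two update modes (re-computing when a qualifying record arrives, and re-computing once per preset period) might be organized as in the sketch below. The class name `SimilarityTracker`, the dictionary representation of a circulation object, and the two injected callables are assumptions; the preset condition and the similarity computation are whatever the deployment defines.

```python
from typing import Callable, List, Optional


class SimilarityTracker:
    """Keeps the circulation objects of one subject and its set similarity."""

    def __init__(
        self,
        compute_similarity: Callable[[List[dict]], float],
        meets_condition: Callable[[dict], bool],
    ) -> None:
        self.objects: List[dict] = []
        self.compute_similarity = compute_similarity  # hypothetical callable
        self.meets_condition = meets_condition        # hypothetical preset condition
        self.similarity: Optional[float] = None

    def on_record(self, obj: dict) -> bool:
        """Record-driven update; returns True if the similarity was re-determined."""
        if not self.meets_condition(obj):
            return False  # the object in the record is ignored
        self.objects.append(obj)
        self.similarity = self.compute_similarity(self.objects)
        return True

    def on_period(self, batch: List[dict]) -> None:
        """Periodic update: add everything received in the period, then re-determine."""
        self.objects.extend(batch)
        self.similarity = self.compute_similarity(self.objects)
```

The record-driven mode keeps the similarity current at the cost of more frequent re-computation, while the periodic mode batches the work.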
By adopting the embodiment of the invention, the similarity of the enterprise entry set and the sales item set is automatically calculated according to the records of the enterprise entry and the sales item, and the abnormal enterprise is quickly searched in a large amount of data according to the similarity of the enterprise entry set and the sales item set, so that the detection efficiency and accuracy are improved, and the technical problem that the detection result is inaccurate because the abnormal state of the enterprise is detected through the manual rule translated by expert business knowledge in the prior art is solved.
The scheme does not depend on the type of the enterprise to be detected, does not depend on the professional knowledge of professionals, and has no limit on the data volume of the enterprise to be detected, so that the requirement of detecting the enterprise can be met.
It should be understood by those skilled in the art that the structure shown in fig. 7 is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 7 is a diagram illustrating a structure of the electronic device. For example, the computer terminal 700 may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 7, or have a different configuration than shown in fig. 7.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 5
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the abnormality detection method for a to-be-detected subject provided in the first embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring set similarity between a first circulation object set of a to-be-detected main body and a second circulation object set of the to-be-detected main body, wherein a first circulation object in the first circulation object set has inflow attributes, and a second circulation object in the second circulation object set has outflow attributes; and determining whether the main body to be detected is abnormal or not according to the set similarity.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk, and various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and amendments can be made without departing from the principle of the present invention, and these modifications and amendments should also be considered as the protection scope of the present invention.

Claims (14)

1. A method for detecting an abnormality of a subject to be measured, comprising:
acquiring set similarity between a first circulation object set of a to-be-detected main body and a second circulation object set of the to-be-detected main body, wherein a first circulation object in the first circulation object set has inflow attributes, and a second circulation object in the second circulation object set has outflow attributes;
determining whether the main body to be tested is abnormal or not according to the set similarity;
wherein the main body to be tested is an enterprise to be tested, and determining whether the main body to be tested is abnormal according to the set similarity comprises: comparing the set similarity with a preset similarity threshold; and determining that the main body to be tested is abnormal in the case that the set similarity is smaller than the preset similarity threshold;
acquiring the set similarity between a first circulation object set of a main body to be tested and a second circulation object set of the main body to be tested, wherein the acquiring comprises the following steps: acquiring a plurality of circulation object groups of the main body to be detected, wherein each circulation object group comprises: a first flow object and a second flow object; acquiring the intra-group similarity of the first circulation object and the second circulation object in each circulation object group; and determining the set similarity of the body to be tested according to the intra-group similarity of the plurality of circulation object groups, the inflow attribute value of the first circulation object and the outflow attribute value of the second circulation object.
2. The method of claim 1, wherein obtaining a plurality of sets of flow through objects for the subject comprises:
acquiring the similarity between each first circulation object and each second circulation object;
searching a second circulation object with the maximum similarity to each first circulation object in the second circulation object set respectively, wherein each searched second circulation object and each corresponding first circulation object form the circulation object group respectively; and
and searching a first circulation object with the maximum similarity to each second circulation object in the first circulation object set respectively, wherein each searched first circulation object and each corresponding second circulation object form the circulation object group respectively.
3. The method of claim 2, wherein obtaining intra-group similarity of the first flow through object and the second flow through object in each of the sets of flow through objects comprises:
sequencing according to the attribute values of the circulating objects to obtain the sequence of the main body to be tested, wherein the attribute values comprise: the ingress attribute value and the egress attribute value;
obtaining vectors corresponding to the plurality of circulation objects respectively according to the sequence of the main body to be detected through a preset language analysis model;
and determining the intra-group similarity of each circulation object group according to a first vector corresponding to the first circulation object and a second vector corresponding to the second circulation object in each circulation object group.
4. The method of claim 3, wherein determining the intra-group similarity of each of the groups of flow through objects based on the vector corresponding to the first flow through object and the vector corresponding to the second flow through object in each of the groups of flow through objects comprises:
obtaining an ith first product by multiplying the ith value in the first vector by the ith value in the second vector, and accumulating the n first products to obtain a first accumulated value, wherein 1 ≤ i ≤ n and n is the dimension of the vectors;
obtaining an ith second product by squaring the ith value in the first vector, and accumulating the n second products to obtain a second accumulated value;
obtaining an ith third product by squaring the ith value in the second vector, and accumulating the n third products to obtain a third accumulated value;
and dividing the first accumulated value by a multiplication result to obtain the intra-group similarity, wherein the multiplication result is the product of the square root of the second accumulated value and the square root of the third accumulated value.
5. The method of claim 1, wherein after determining the intra-group similarity in each of the groups of flow through objects, the method further comprises:
setting a similarity threshold value in a group;
ignoring the intra-group similarity between the circulation objects if the intra-group similarity is less than the intra-group similarity threshold.
6. The method of claim 1, wherein determining the set similarity of the subject to be tested according to the intra-group similarity of the plurality of circulation object groups, the inflow attribute value of the first circulation object, and the outflow attribute value of the second circulation object comprises:
comparing the inflow attribute value of the first circulation object with the outflow attribute value of the second circulation object in each group of circulation object groups to obtain the maximum value and the minimum value of each group;
obtaining a fourth product through the similarity in the group and the minimum value of the circulation object group;
accumulating the fourth products of the plurality of circulation object groups to obtain a fourth accumulated value, and accumulating the maximum value of the plurality of circulation object groups to obtain a fifth accumulated value;
and determining the ratio of the fourth accumulated value to the fifth accumulated value as the set similarity of the main body to be detected.
7. The method according to any one of claims 1 to 6, wherein the first and second flow-through objects satisfy any one or more of the following conditions:
the position of the inflow attribute value in the inflow attribute values of all the first circulation objects is higher than a first preset position, or the position of the outflow attribute value in the outflow attribute values of all the second circulation objects is higher than a second preset position;
the inflow attribute value and the outflow attribute value are both greater than a preset attribute value;
the proportion of the additional attribute value to the inflow attribute value of the first flow-through object set exceeds a first preset proportion, or the proportion of the additional attribute value to the outflow attribute value of the second flow-through object set exceeds a second preset proportion.
8. The method according to any one of claims 1 to 6, wherein after determining whether the subject to be tested is abnormal according to the set similarity, the method further comprises: and if the to-be-detected main body is determined to be abnormal, generating alarm information.
9. The method according to any one of claims 1 to 6, further comprising:
acquiring a record of receiving the circulation object;
if the circulation objects in the record meet preset conditions, adding the received circulation objects into circulation objects used for determining the set similarity of the main body to be tested, and re-determining the set similarity of the main body to be tested;
and if the main body to be tested in the record does not meet the preset condition, ignoring the circulation object in the record.
10. The method according to any one of claims 1 to 6, further comprising:
acquiring the received circulating object according to a preset period;
and adding the received circulating object into the circulating object for determining the set similarity of the main body to be detected, and re-determining the set similarity of the main body to be detected.
11. An abnormality detection device for a subject to be tested, comprising:
a first obtaining module, configured to obtain set similarity between a first circulation object set of a subject to be tested and a second circulation object set of the subject to be tested, where a first circulation object in the first circulation object set has an inflow attribute, and a second circulation object in the second circulation object set has an outflow attribute;
the determining module is used for determining whether the main body to be detected is abnormal or not according to the set similarity;
wherein the main body to be tested is an enterprise to be tested, and the determining module is further configured to: compare the set similarity with a preset similarity threshold; and determine that the main body to be tested is abnormal in the case that the set similarity is smaller than the preset similarity threshold;
the first obtaining module is further configured to obtain a plurality of circulation object groups of the main body to be tested, where each circulation object group includes: a first flow object and a second flow object; acquiring the intra-group similarity of the first circulation object and the second circulation object in each circulation object group; and determining the set similarity of the body to be tested according to the intra-group similarity of the plurality of circulation object groups, the inflow attribute value of the first circulation object and the outflow attribute value of the second circulation object.
12. A storage medium, characterized in that the storage medium includes a stored program, and wherein, when the program runs, the apparatus where the storage medium is located is controlled to execute the abnormality detection method for a subject to be detected according to any one of claims 1 to 10.
13. A processor, characterized in that the processor is configured to run a program, wherein the program executes the method for detecting an abnormality of a subject to be detected according to any one of claims 1 to 10.
14. An enterprise anomaly detection method, comprising:
acquiring the set similarity between an entry set of an enterprise and a sales item set of the enterprise, wherein commodities in the entry set are entries of the enterprise, and commodities in the sales item set are sales items of the enterprise;
determining whether the enterprise is abnormal according to the set similarity;
wherein the main body to be tested is an enterprise to be tested, and determining whether the main body to be tested is abnormal according to the set similarity comprises: comparing the set similarity with a preset similarity threshold; and determining that the main body to be tested is abnormal in the case that the set similarity is smaller than the preset similarity threshold;
acquiring the set similarity between a first circulation object set of a main body to be tested and a second circulation object set of the main body to be tested, wherein the acquiring comprises the following steps: acquiring a plurality of circulation object groups of the main body to be detected, wherein each circulation object group comprises: a first flow object and a second flow object; acquiring the intra-group similarity of the first circulation object and the second circulation object in each circulation object group; and determining the set similarity of the body to be tested according to the intra-group similarity of the plurality of circulation object groups, the inflow attribute value of the first circulation object and the outflow attribute value of the second circulation object.
CN201710726608.6A 2017-08-22 2017-08-22 Abnormity detection method and device for main body to be detected and enterprise abnormity detection method Active CN109426968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710726608.6A CN109426968B (en) 2017-08-22 2017-08-22 Abnormity detection method and device for main body to be detected and enterprise abnormity detection method

Publications (2)

Publication Number Publication Date
CN109426968A CN109426968A (en) 2019-03-05
CN109426968B true CN109426968B (en) 2022-08-30

Family

ID=65498585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710726608.6A Active CN109426968B (en) 2017-08-22 2017-08-22 Abnormity detection method and device for main body to be detected and enterprise abnormity detection method

Country Status (1)

Country Link
CN (1) CN109426968B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424613A (en) * 2013-09-04 2015-03-18 航天信息股份有限公司 Value added tax invoice monitoring method and system thereof
CN104636970A (en) * 2013-11-06 2015-05-20 航天信息股份有限公司 Method of monitoring enterprise tax evasion through commodity differences and system thereof
CN104636971A (en) * 2013-11-06 2015-05-20 航天信息股份有限公司 Method of detecting one number for multiple names of value added tax invoice and system thereof
CN104636972A (en) * 2013-11-06 2015-05-20 航天信息股份有限公司 Method of monitoring enterprise false deduction invoice through commodity composition and system thereof
CN104636973A (en) * 2013-11-06 2015-05-20 航天信息股份有限公司 Method of monitoring enterprise false invoice through commodity composition and system thereof
CN106934705A (en) * 2015-12-28 2017-07-07 航天信息股份有限公司 A kind of special ticket doubtful point taxpayer's monitoring method of value-added tax based on SVMs
CN106933814A (en) * 2015-12-28 2017-07-07 航天信息股份有限公司 Tax data exception analysis method and system

Also Published As

Publication number Publication date
CN109426968A (en) 2019-03-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant