CN110866163A

CN110866163A - Information data auditing method, device and medium

Info

Publication number: CN110866163A
Application number: CN201911090231.5A
Authority: CN
Inventors: 刘洋; 王新然; 杨文鲜; 傅景楠; 李云飞
Original assignee: Yunmu Future Technology Beijing Co Ltd
Current assignee: Yunmu Future Technology Beijing Co Ltd
Priority date: 2019-11-08
Filing date: 2019-11-08
Publication date: 2020-03-06

Abstract

The application discloses a method, a device and a storage medium for auditing information data, wherein the method comprises the following steps: acquiring information data to be audited, judging whether the information data to be audited is deviation information data needing manual deviation rectification according to a constructed data labeling estimation model, wherein the data labeling estimation model is obtained through supervised machine learning model training, and determining the deviation rectification priority of the deviation information data according to a preset rule. Through the embodiment, the working efficiency and the accuracy of the information data during auditing are improved.

Description

Information data auditing method, device and medium

Technical Field

The present application relates to the field of communications, and in particular, to a method, an apparatus, and a medium for auditing information data.

Background

With the rapid development of Internet communication technology, social platforms and information push platforms such as micro-blogs, Toutiao, WeChat and the like have become increasingly popular, and these platforms promote the exchange of all kinds of information among people. When information of a particular type, or of a type specified by a user, has to be screened and audited, the huge data volume creates a heavy workload. For example, a network supervision department may analyze and monitor public opinion information on the network in order to supervise the network platform environment and keep online language clean. One existing approach to monitoring network public opinion relies on purely manual auditing to judge whether a piece of information is public opinion information; the workload is huge and the working efficiency is extremely low. The existing computer auditing approach relies only on an artificial intelligence algorithm, which automatically screens mass information for public opinion at a coarse granularity and outputs in real time whether each item is public opinion information.

At present, an effective method for improving the accuracy and the working efficiency of information data auditing is unavailable.

The embodiment of the disclosure provides an information data auditing method, device and medium, so as to improve the work efficiency and accuracy of information data during auditing.

Disclosure of Invention

The embodiment of the disclosure provides an information data auditing method, an information data auditing device and a storage medium, which can improve the work efficiency and accuracy of information data during auditing.

To solve the above technical problem, the embodiment of the present invention is implemented as follows:

in a first aspect, an embodiment of the present disclosure provides an information data auditing method, including:

acquiring information data to be audited;

judging whether the information data to be audited is deviation information data needing manual deviation correction or not according to a constructed data labeling estimation model, wherein the data labeling estimation model is obtained through training of a supervised machine learning model; and

determining the deviation rectifying priority of the deviation information data according to a preset rule.

In a second aspect, the disclosed embodiments further provide a storage medium, where the storage medium includes a stored program, and the processor executes the auditing method for information data according to the first aspect when the program runs.

In a third aspect, an apparatus for auditing information data is further provided according to an embodiment of the present disclosure, including:

the information data acquisition module is used for acquiring information data to be audited;

the deviation data judgment module is used for judging whether the information data to be audited is deviation information data needing manual deviation correction or not according to the constructed data label estimation model, wherein the data label estimation model is obtained through supervised machine learning model training; and

the data sequence confirmation module is used for determining the deviation rectifying priority of the deviation information data according to a preset rule.

In a fourth aspect, an embodiment of the present disclosure further provides an apparatus for auditing information data, including:

a processor; and

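a memory coupled to the processor for providing the processor with instructions for the following processing steps:

acquiring information data to be audited;

judging whether the information data to be audited is deviation information data needing manual deviation correction according to a constructed data labeling estimation model, wherein the data labeling estimation model is obtained through training of a supervised machine learning model; and

determining the deviation rectifying priority of the deviation information data according to a preset rule.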

In the embodiment of the invention, information data to be audited is obtained, whether the information data to be audited is deviation information data needing manual deviation rectification is judged according to a constructed data labeling estimation model, wherein the data labeling estimation model is obtained through training of a supervised machine learning model, and the deviation rectification priority of the deviation information data is determined according to a preset rule. With this method, the deviation information data needing manual deviation correction is identified by the constructed data labeling estimation model, its deviation correction priority is determined according to preset rules, and the deviation information data is corrected in priority order, which improves the working efficiency of information data auditing. Because the deviation information data is identified by a data labeling estimation model obtained through supervised machine learning training, the accuracy of the data auditing is improved as well.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure. In the drawings:

fig. 1 is a block diagram of a hardware structure of a computing device for implementing an auditing method for information data according to an embodiment of the present disclosure;

fig. 2 is a schematic flowchart of an auditing method for information data according to an embodiment of the present disclosure;

fig. 3 is a schematic diagram of an auditing apparatus for information data according to an embodiment of the present disclosure;

fig. 4 is a schematic diagram of an auditing apparatus for information data according to another embodiment of the present disclosure.

Detailed Description

In order to make those skilled in the art better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure. It is to be understood that the described embodiments are merely some, and not all, of the embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

According to the present embodiment, an embodiment of an auditing method for information data is provided. It should be noted that the steps shown in the flowcharts of the figures may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.

The method embodiments provided by the present embodiment may be executed in a mobile terminal, a computer terminal, a server or a similar computing device. Fig. 1 shows a hardware block diagram of a computing device for implementing an auditing method for information data. As shown in fig. 1, the computing device may include one or more processors (which may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory for storing data, and a transmission device for communication functions. In addition, the computing device may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computing device may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.

It should be noted that the one or more processors and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuitry may be a single, stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computing device. As referred to in the disclosed embodiments, the data processing circuit acts as a processor control (e.g., selection of a variable resistance termination path connected to the interface).

The memory may be configured to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the method for auditing information data in the embodiments of the present disclosure, and the processor executes various functional applications and data processing by operating the software programs and modules stored in the memory, that is, implements the method for auditing information data of an application program. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory may further include memory located remotely from the processor, which may be connected to the computing device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device is used for receiving or transmitting data via a network. Specific examples of such networks may include wireless networks provided by the communication provider of the computing device. In one example, the transmission device includes a network interface controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device may be a Radio Frequency (RF) module, which is used for communicating with the internet wirelessly.

The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computing device.

It should be noted here that in some alternative embodiments, the computing device shown in fig. 1 described above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that fig. 1 is only one specific example and is intended to illustrate the types of components that may be present in a computing device as described above.

In the foregoing operating environment, the present embodiment provides an auditing method for information data. Fig. 2 is a schematic flow chart of an auditing method for information data according to an embodiment of the present disclosure, and referring to fig. 2, the method includes:

S202: acquiring information data to be audited;

S204: judging whether the information data to be audited is deviation information data needing manual deviation correction or not according to the constructed data label estimation model, wherein the data label estimation model is obtained through training of a supervised machine learning model;

S206: determining the deviation rectifying priority of the deviation information data according to a preset rule.

In step S202, to-be-audited information data is obtained. The information data to be audited is data of a specific type that the user needs to audit, and includes public sentiment data, illegal information, positive information and other data of various categories, and is not limited specifically here.

In step S204, whether the information data to be audited is deviation information data requiring manual deviation correction is judged according to the constructed data labeling estimation model, which is obtained through supervised machine learning model training. The information data to be audited is substituted into the constructed model, and the result determines whether it is deviation information data requiring manual deviation correction. For example, data A, B and C are substituted into the constructed data labeling estimation model; according to the model's result, data A is judged to be deviation information data requiring manual deviation correction, while data B and data C are non-deviation information data requiring no manual deviation correction.

The data labeling estimation model in this step is obtained through supervised machine learning model training: sample data of the various types to be audited is first acquired, the sample data is substituted into the supervised machine learning model, preset parameters are set, and the data labeling estimation model is obtained through training on the samples.
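The training step can be illustrated with a minimal sketch. The patent does not name a specific supervised model, so the multinomial naive Bayes text classifier below is purely an assumption chosen for brevity; the function names `train` and `predict`, the whitespace tokenization and the sample texts are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Fit a tiny multinomial naive Bayes model on (text, label) pairs.

    This is an illustrative stand-in for the supervised model; the patent
    does not specify which learning algorithm is used.
    """
    word_counts = defaultdict(Counter)  # label -> token frequencies
    label_counts = Counter()            # label -> number of samples
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return {"word_counts": dict(word_counts),
            "label_counts": label_counts,
            "vocab": vocab}


def predict(model, text):
    """Return (estimated label, confidence) for one piece of text."""
    tokens = text.lower().split()
    total = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    log_scores = {}
    for label, n in model["label_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + vocab_size  # add-one smoothing
        score = math.log(n / total)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        log_scores[label] = score
    # Normalize the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / norm


samples = [
    ("stock price falls sharply", "stock"),
    ("stock market rally", "stock"),
    ("bond yield rises", "bond"),
    ("government bond auction", "bond"),
]
model = train(samples)
label, confidence = predict(model, "stock rally")
```

The returned confidence is the normalized posterior of the winning label, which is one plausible reading of the "confidence of the tag type" used later in the method.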

In step S206, the deviation rectifying priority of the deviation information data is determined according to the preset rule. The preset rule is set according to the priority order that the user requires for the types of information data to be audited. The deviation rectifying priority of the deviation information data is then determined according to this rule, and deviation information data with high priority is audited first, which can improve the working efficiency of information data auditing.
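Steps S202 to S206 can be sketched end to end as follows. This is only an outline under stated assumptions: the `estimate` callable stands in for the data labeling estimation model, and the names `Flagged`, `audit`, `priority_rule` and `default_priority` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Flagged:
    text: str
    label: str
    confidence: float
    priority: int  # smaller number = reviewed earlier (assumption)


def audit(items: Iterable[str],
          estimate: Callable[[str], Tuple[str, float]],
          threshold: float,
          priority_rule: Dict[str, int],
          default_priority: int = 99) -> List[Flagged]:
    """Return the items needing manual correction, highest priority first."""
    flagged = []
    for text in items:                      # S202: data to be audited
        label, confidence = estimate(text)  # S204: model estimate
        if confidence < threshold:          # low confidence -> deviation data
            priority = priority_rule.get(label, default_priority)
            flagged.append(Flagged(text, label, confidence, priority))
    flagged.sort(key=lambda f: f.priority)  # S206: order by preset rule
    return flagged


# Stand-in model: a lookup table instead of a trained estimator.
fake_estimates = {
    "a": ("stock", 0.60),
    "b": ("bond", 0.99),
    "c": ("annual income rate", 0.70),
}
rule = {"annual income rate": 1, "bond": 2, "stock": 3}
result = audit(["a", "b", "c"], fake_estimates.__getitem__, 0.90, rule)
```

Here item "b" passes without manual review, while "c" is queued ahead of "a" because its type ranks higher in the hypothetical rule.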

Further, judging whether the information data to be audited is deviation information data needing manual deviation correction according to the constructed data labeling estimation model includes:

(a1) substituting the information data to be audited into the data label estimation model to obtain the confidence of the label type to which the information data to be audited belongs;

(a2) judging whether the information data to be audited is deviation information data or not according to the first threshold and the confidence of the label type.

In action (a1), the information data to be audited is substituted into the data labeling estimation model to obtain the confidence of the tag type to which it belongs. The data labeling estimation model estimates the type of the information data: the data to be audited is fed into the model and processed, the estimated type is attached as a label, and the confidence of that estimate is obtained. For example, a group of 5 pieces of information data to be audited is substituted into the data labeling estimation model, and the results are, respectively: (class A data, confidence 50%), (class B data, confidence 80%), (class C data, confidence 70%), (class A data, confidence 80%), (class D data, confidence 97%).

In action (a2), whether the information data to be audited is deviation information data is judged according to the first threshold and the confidence of the tag type. The first threshold may be 95%, 80%, or another value, and is not specifically limited here; the judgment is made by comparing the confidence of the tag type of the data to be audited with the first threshold.

Further, judging whether the information data to be audited is deviation information data according to the first threshold and the confidence coefficient of the tag type, including:

(b1) if the confidence of the information data to be audited is smaller than a first threshold value, determining the information data to be audited as deviation information data;

(b2) if the confidence of the information data to be audited is greater than or equal to the first threshold value, determining the information data to be audited as non-deviation information data which does not need to be rectified.

In the above-mentioned act (b1), if the confidence of the information data to be audited is smaller than the first threshold, the information data to be audited is determined as the deviation information data. For example, if the first threshold is 90% and the confidence level of the information data to be audited is 80%, the information data to be audited is determined to be the deviation information data.

In the above action (b2), if the confidence of the information data to be audited is greater than or equal to the first threshold, the information data to be audited is determined to be non-deviation information data that does not need to be rectified. For example, if the first threshold is 90% and the confidence of the information data to be audited is 95%, the information data to be audited is determined to be non-deviation information data which does not need to be rectified.
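The threshold rule of (b1) and (b2) can be sketched as below, reusing the five example results given for action (a1). The function names `is_deviation` and `split` are hypothetical; the threshold values 95%, 90% and 80% are the ones mentioned in the text.

```python
def is_deviation(confidence: float, first_threshold: float) -> bool:
    """(b1): below the threshold -> deviation data; (b2): otherwise not."""
    return confidence < first_threshold


def split(estimates, first_threshold):
    """Partition (label, confidence) pairs into deviation / non-deviation."""
    deviation = [e for e in estimates if is_deviation(e[1], first_threshold)]
    non_deviation = [e for e in estimates
                     if not is_deviation(e[1], first_threshold)]
    return deviation, non_deviation


# The five example results from action (a1).
estimates = [("A", 0.50), ("B", 0.80), ("C", 0.70), ("A", 0.80), ("D", 0.97)]

for threshold in (0.95, 0.90, 0.80):
    dev, ok = split(estimates, threshold)
    print(f"threshold={threshold:.2f}: {len(dev)} flagged, {len(ok)} passed")
```

A confidence exactly equal to the threshold counts as non-deviation, matching the "greater than or equal to" wording of (b2). Lowering the threshold reduces the manual workload at the cost of letting more uncertain estimates through.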

Further, determining the deviation rectifying priority of the deviation information data according to a preset rule, comprising:

(c1) acquiring a tag type and tag information of the deviation information data, wherein the tag type is used for indicating the type of the deviation information data, and the tag information is used for indicating the attention degree of the deviation information data;

(c2) determining the deviation rectifying priority of the deviation information data according to the data type information and the label information.

In action (c1), the tag type and tag information of the deviation information data are acquired. The tag type indicates the type of the deviation information data; for example, deviation information data about the financial industry may fall into 5 types: annual income rate, market profit rate, private fund, bond and stock. The tag information indicates the degree of attention paid to the deviation information data. In the financial example above, the 5 types are ranked by attention as follows: the tag information of the annual income rate type and the market profit rate type is first-level priority, that of the private fund type and the bond type is second-level priority, and that of the stock type is third-level priority.

In action (c2), the deviation rectifying priority of the deviation information data is determined according to the data type information and the tag information. In the example of (c1), the tag information of the annual income rate type and the market profit rate type is first-level priority, that of the private fund type and the bond type is second-level priority, and that of the stock type is third-level priority, where priority decreases from the first level to the third level. The deviation rectifying priority of the 5 types of deviation information data is therefore: annual income rate type and market profit rate type > private fund type and bond type > stock type.
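The financial example can be expressed as a small lookup table plus a stable sort. This is a sketch only: the dictionary name `PRIORITY_BY_TYPE`, the record layout and the fallback level for unknown types are assumptions, while the three priority levels follow the example in the text.

```python
# Priority level per tag type, taken from the financial example:
# level 1 is corrected first, level 3 last.
PRIORITY_BY_TYPE = {
    "annual income rate": 1,
    "market profit rate": 1,
    "private fund": 2,
    "bond": 2,
    "stock": 3,
}

LOWEST_PRIORITY = 99  # assumption: unknown tag types are handled last


def order_for_correction(records):
    """Sort deviation records by priority level.

    sorted() is stable, so records on the same level keep their
    original relative order.
    """
    return sorted(
        records,
        key=lambda r: PRIORITY_BY_TYPE.get(r["tag_type"], LOWEST_PRIORITY),
    )


records = [
    {"id": 1, "tag_type": "stock"},
    {"id": 2, "tag_type": "bond"},
    {"id": 3, "tag_type": "market profit rate"},
    {"id": 4, "tag_type": "private fund"},
    {"id": 5, "tag_type": "annual income rate"},
]
ordered_ids = [r["id"] for r in order_for_correction(records)]
```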

Further, the tag information is divided into first tag information, second tag information and third tag information according to the attention degree from high to low, and the deviation rectifying priority of the deviation information data is determined according to the data type information and the tag information, and the method comprises the following steps:

(e1) determining the deviation rectifying priority of the deviation information data according to the attention degree sequence of the label information corresponding to the data type information.

In action (e1), the tag information is divided into first tag information, second tag information and third tag information in descending order of attention. The deviation rectifying priority of the deviation information data follows the attention order of the tag information corresponding to its data type information, so the priority order is: first tag information > second tag information > third tag information.

Further, this embodiment further includes:

(f1) sending the deviation information data to deviation rectifying personnel for deviation rectifying marking to obtain audited information data containing the deviation rectifying information;

(f2) taking the audited information data as a part of training samples in the training sample set, and performing iterative training on the data label estimation model to obtain the optimized data label estimation model.

In action (f1), the deviation information data is sent to deviation rectifying personnel for deviation rectifying labeling, yielding audited information data that contains the deviation rectifying information. The audited information data also retains the label assigned before deviation rectification. For example, if a piece of deviation information data labeled as illegal information is corrected to legal information by the deviation rectifying personnel, the audited information data contains both the "illegal information" label from before the correction and the "legal information" label from after it.

In action (f2), the audited information data is used as part of the training samples in the training sample set, and the data labeling estimation model is iteratively trained to obtain an optimized data labeling estimation model. The optimized model is used for the next round of information data auditing, which can improve auditing accuracy. In a preferred embodiment, audited information data of preset high-attention types is used as the training sample set for optimizing the data labeling estimation model, so as to improve the accuracy of data auditing as far as possible.
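The feedback loop of (f1) and (f2) can be sketched as follows. The record layout, the function names and the `high_attention_types` filter are assumptions made for illustration; `majority_label_trainer` is a deliberately trivial stand-in for retraining the real model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Sample = Tuple[str, str]  # (text, label)


@dataclass(frozen=True)
class AuditedRecord:
    text: str
    label_before: str  # label assigned before manual correction
    label_after: str   # label assigned by the deviation rectifying personnel


def extend_training_set(training_set: List[Sample],
                        audited: List[AuditedRecord],
                        high_attention_types: Optional[Set[str]] = None
                        ) -> List[Sample]:
    """Append manually corrected samples to the training set.

    If high_attention_types is given, only corrected samples of those
    types are added (the preferred embodiment in the text).
    """
    new_samples = [
        (r.text, r.label_after)
        for r in audited
        if high_attention_types is None or r.label_after in high_attention_types
    ]
    return list(training_set) + new_samples


def majority_label_trainer(samples: List[Sample]) -> str:
    """Stand-in 'model': simply the most frequent label in the samples."""
    return Counter(label for _, label in samples).most_common(1)[0][0]


training_set = [("post 1", "legal"), ("post 2", "illegal")]
audited = [AuditedRecord("post 3", label_before="illegal", label_after="legal")]

updated = extend_training_set(training_set, audited)
model = majority_label_trainer(updated)  # retrain on the enlarged set
```

Keeping `label_before` alongside `label_after` mirrors the statement that the audited data retains the pre-correction label, which also makes it possible to measure how often the model was wrong.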

Further, the deviation information data is sent to the corresponding deviation rectifying personnel according to its deviation rectifying priority. Deviation information data with high priority is manually corrected first, which can improve the working efficiency of data auditing.
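One way to hand out deviation information data in priority order is a heap-based queue, sketched below. The text does not say how items are matched to personnel, so the round-robin assignment, the class name `CorrectionQueue` and the worker identifiers are all assumptions.

```python
import heapq
from itertools import count


class CorrectionQueue:
    """Priority queue of deviation items; a lower number is served first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps insertion order per level

    def push(self, item, priority):
        heapq.heappush(self._heap, (priority, next(self._seq), item))

    def pop(self):
        _priority, _seq, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)


def dispatch(queue, workers):
    """Assign queued items to workers round-robin, highest priority first."""
    assignments = {worker: [] for worker in workers}
    turn = 0
    while len(queue):
        worker = workers[turn % len(workers)]
        assignments[worker].append(queue.pop())
        turn += 1
    return assignments


queue = CorrectionQueue()
queue.push("stock item", 3)
queue.push("yield item", 1)
queue.push("bond item", 2)
assignments = dispatch(queue, ["w1", "w2"])
```

With two workers, the first-level item goes out first and the third-level item last, regardless of the order in which they were queued.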

Further, referring to fig. 1, according to a second aspect of the present embodiment, there is provided a storage medium. The storage medium includes a stored program, wherein the auditing method for information data described in any one of the above is executed by a processor when the program is executed.

The storage medium provided by the embodiment of the present application can implement the processes in the foregoing method embodiments, and achieve the same functions and effects, which are not repeated here.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

Fig. 3 is a schematic diagram of an apparatus for auditing information data according to an embodiment of the present disclosure, where the apparatus 300 corresponds to an auditing method for information data according to embodiment 1. Referring to fig. 3, the apparatus 300 includes:

an information data obtaining module 301, configured to obtain information data to be audited;

the deviation data judgment module 302 is configured to judge whether the information data to be audited is deviation information data requiring manual deviation correction according to a constructed data labeling estimation model, where the data labeling estimation model is obtained through supervised machine learning model training; and

the data sequence confirming module 303 is configured to determine a deviation rectifying priority of the deviation information data according to a preset rule.

Optionally, the deviation data determining module 302 is specifically configured to:

substitute the information data to be audited into the data label estimation model to obtain the confidence of the label type to which the information data to be audited belongs; and

judge whether the information data to be audited is the deviation information data according to a first threshold value and the confidence of the label type.

Optionally, the deviation data determining module 302 is further specifically configured to:

if the confidence of the information data to be audited is smaller than the first threshold, determine the information data to be audited as the deviation information data; and

if the confidence of the information data to be audited is greater than or equal to the first threshold, determine the information data to be audited as non-deviation information data which does not need to be rectified.

Optionally, the data sequence confirmation module 303 is specifically configured to:

acquire the label type of the deviation information data and label information used for indicating the attention degree of the deviation information data; and

determine the deviation rectifying priority of the deviation information data according to the data type information and the label information.

Optionally, the data sequence confirmation module 303 is further specifically configured to:

the label information is divided into first label information, second label information and third label information according to the sequence of the attention degree from high to low, and the deviation rectifying priority of the deviation information data is determined according to the sequence of the attention degree of the label information corresponding to the data type information.

Optionally, the apparatus further comprises a predictive model optimization module configured to:

send the deviation information data to deviation rectifying personnel for deviation rectifying marking to obtain audited information data containing the deviation rectifying information; and

take the audited information data as a part of training samples in a training sample set, and perform iterative training on the data label estimation model to obtain an optimized data label estimation model.

Optionally, the apparatus further includes a deviation data sending module, configured to:

send the deviation information data to corresponding deviation rectifying workers according to the deviation rectifying priority of the deviation information data.

The auditing method and device for the information data provided by the embodiment of the application can realize each process in the method embodiment and achieve the same function and effect, and are not repeated here.

Example 3

Fig. 4 is a schematic diagram of an apparatus for auditing information data according to another embodiment of the present disclosure, where the apparatus 400 corresponds to the method according to the first aspect of embodiment 1. Referring to fig. 4, the apparatus 400 includes: a processor 410; and a memory 420 coupled to the processor 410 for providing instructions to the processor 410 to process the following process steps:

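acquiring information data to be audited;

judging whether the information data to be audited is deviation information data needing manual deviation correction according to a constructed data labeling estimation model, wherein the data labeling estimation model is obtained through training of a supervised machine learning model; and

determining the deviation rectifying priority of the deviation information data according to a preset rule.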

Optionally, judging whether the information data to be audited is deviation information data which needs to be corrected manually according to the constructed data label estimation model, including:

Optionally, judging whether the information data to be audited is the deviation information data according to a first threshold and the confidence of the tag type, including:

Optionally, determining the deviation rectifying priority of the deviation information data according to a preset rule includes:

Optionally, the tag information is divided into first tag information, second tag information and third tag information in descending order of attention degree, and determining the deviation rectifying priority of the deviation information data according to the data type information and the tag information includes:

determining the deviation rectifying priority of the deviation information data according to the attention-degree order of the tag information corresponding to the data type information.

Optionally, the apparatus further comprises:

The apparatus for auditing information data provided by this embodiment of the application can implement each process of the method embodiment and achieve the same functions and effects, which are not repeated here.

The serial numbers of the above embodiments of the present invention are merely for description and do not indicate the relative merits of the embodiments.

In the above embodiments of the present invention, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

It should be understood that the technology disclosed in the embodiments provided in the present application can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, units or modules, and may be electrical or take another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for auditing information data, characterized by comprising the following steps:

acquiring information data to be audited;

2. The method according to claim 1, wherein judging, according to the constructed data labeling estimation model, whether the information data to be audited is deviation information data requiring manual deviation rectification comprises:

3. The method of claim 2, wherein judging whether the information data to be audited is the deviation information data according to a first threshold and the confidence of the tag type comprises:

4. The method of claim 1, wherein determining the deviation rectifying priority of the deviation information data according to a preset rule comprises:

5. The method of claim 4, wherein the tag information is divided into first tag information, second tag information and third tag information in descending order of attention degree, and determining the deviation rectifying priority of the deviation information data according to the data type information and the tag information comprises:

6. The method of claim 1, further comprising:

and taking the audited information data as augmentation training samples in a training sample set, and performing iterative training on the data labeling estimation model to obtain an optimized data labeling estimation model.

7. The method of claim 1, further comprising:

8. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program is run, a processor executes the method for auditing information data according to any one of claims 1 to 7.

9. An information data auditing apparatus, comprising:

10. An information data auditing apparatus, comprising:

a processor; and

acquiring information data to be audited;