CN113095064A - Code field identification method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN113095064A
Authority
CN
China
Prior art keywords
field
dictionary
target
value
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110292981.1A
Other languages
Chinese (zh)
Inventor
李云锋
李鹏飞
王倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dt Dream Technology Co Ltd
Priority to CN202110292981.1A
Publication of CN113095064A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a code field identification method and device, electronic equipment and a storage medium. The values of a target field of a data table are counted to obtain statistical data; if the statistical data meet a preset condition, a characteristic value of a preset field characteristic corresponding to the target field is obtained according to the statistical data; the characteristic value is input into a trained code field identification model, and whether the target field is a code field is determined according to the output result of the code field identification model. This reduces manual effort and improves the efficiency of code field identification.

Description

Code field identification method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for identifying a code field, an electronic device, and a storage medium.
Background
In the process of building big data platforms for government affairs and enterprises, a large number of business system tables need to be extracted from multiple business systems into a data warehouse, and the business data in those tables are then cleaned and processed into reusable data resources. In data management, identifying the code fields in a business system table is an important task: only after a code field has been identified can the data quality of the field be judged and the field be processed into key, standardized data resources.
In the related art, code fields are identified manually, which is inefficient.
Disclosure of Invention
In order to overcome the problems in the related art, the invention provides a code field identification method, a code field identification device, electronic equipment and a storage medium, and the efficiency of code field identification is improved.
According to a first aspect of the embodiments of the present invention, there is provided a code field identification method, the method including:
counting the values of the target fields of the data table to obtain statistical data;
if the statistical data meet a preset condition, obtaining a characteristic value of a preset field characteristic corresponding to the target field according to the statistical data;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
According to a second aspect of the embodiments of the present invention, there is provided a code field identifying apparatus, the apparatus including:
the statistical module is used for carrying out statistics on the value of the target field of the data table to obtain statistical data;
a characteristic value obtaining module, configured to obtain, according to the statistical data, a characteristic value of a preset field characteristic corresponding to the target field if the statistical data meets a preset condition;
and the determining module is used for inputting the characteristic value into the trained code field recognition model and determining whether the target field is a code field according to the output result of the code field recognition model.
According to a third aspect of embodiments of the present invention, there is provided an electronic device comprising a processor and a memory for storing executable instructions of the processor;
the processor is configured to:
counting the values of the target fields of the data table to obtain statistical data;
if the statistical data meet a preset condition, obtaining a characteristic value of a preset field characteristic corresponding to the target field according to the statistical data;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
According to a fourth aspect of the embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a number of computer instructions which, when executed, perform the following:
counting the values of the target fields of the data table to obtain statistical data;
if the statistical data meet a preset condition, obtaining a characteristic value of a preset field characteristic corresponding to the target field according to the statistical data;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
according to the embodiment of the invention, the values of the target field of the data table are counted to obtain statistical data; if the statistical data meet the preset condition, the characteristic value of the preset field characteristic corresponding to the target field is obtained according to the statistical data; the characteristic value is input into the trained code field identification model, and whether the target field is a code field is determined according to the output result of the code field identification model, so that manual effort is reduced and the efficiency of code field identification is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.
Fig. 1 is a flowchart illustrating a method for identifying a code field according to an embodiment of the present invention.
Fig. 2 is a functional block diagram of a code field identification apparatus according to an embodiment of the present invention.
Fig. 3 is a hardware structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of embodiments of the invention, as detailed in the following claims.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of embodiments of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used to describe various information in embodiments of the present invention, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of embodiments of the present invention. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
In a service system, if the values stored in a service field are drawn from a finite range of values or codes, the field is referred to as a code field. For example, a field recording ethnic group in China has at most 56 valid values; any other value can be regarded as abnormal data.
In the data management process, identifying code fields is essential for standardized cleaning of the data. In the related art, code fields are identified manually.
One of the ways is: the data architect confirms the code field by communicating with the business system's builder.
The other mode is as follows: the data architect himself analyzes the records in the database and determines the code fields from his own experience.
Both approaches show the obvious drawback of identifying code fields manually: a great deal of effort and time is required. The labor cost of code field identification is therefore very high, and the process is slow and inefficient.
The following describes the code field identification method provided by the present invention in detail by way of embodiments.
Fig. 1 is a flowchart illustrating a method for identifying a code field according to an embodiment of the present invention. As shown in fig. 1, the code field identification method may include:
S101, counting the values of the target fields of the data table to obtain statistical data.
S102, if the statistical data meet a preset condition, obtaining a characteristic value of a preset field characteristic corresponding to the target field according to the statistical data.
S103, inputting the characteristic value into the trained code field recognition model, and determining whether the target field is a code field according to the output result of the code field recognition model.
In this embodiment, the target field is a field on which code field identification needs to be performed. For example, when code fields need to be identified in data table 1, each field in data table 1 may be examined with the code field identification method provided in this embodiment; in that case, every field in the data table is a target field.
In application, the fields in the plurality of data tables can be respectively identified in batches by using the code field identification method provided by the embodiment.
In one example, the statistical data may include statistical record count, vacancy rate, code distribution.
The statistical record number refers to the number of records sampled in statistics.
The number of records in a data table is usually very large, so to improve efficiency the records of a field may be sampled before counting. For example, assuming that data table 1 has 1,000,000 records, each field in data table 1 has 1,000,000 records, and 10,000 records are sampled from the 1,000,000 records of the field for statistics.
The vacancy rate is the ratio of the number of records whose field value is null to the number of statistical records.
The code distribution refers to the number of occurrences of each code.
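The three statistics defined above can be illustrated with a minimal Python sketch. The function name `field_statistics`, the dictionary keys, and the choice to treat `None` and blank strings as vacant values are assumptions made for illustration; they are not taken from the embodiment.

```python
from collections import Counter

def field_statistics(values):
    """Compute record count, vacancy rate and code distribution for one field.

    `values` is the (possibly sampled) list of raw values of the field.
    None and blank strings are treated as vacant (an assumption).
    """
    record_num = len(values)
    present = [str(v).strip() for v in values
               if v is not None and str(v).strip() != ""]
    vacant = record_num - len(present)
    return {
        "record_num": record_num,
        # A field with no records at all is treated as fully vacant.
        "vacancy_rate": vacant / record_num if record_num else 1.0,
        # Number of occurrences of each distinct code.
        "code_distribution": dict(Counter(present)),
    }

stats = field_statistics(["10", "20", "10", None, "", "25", "10", "20"])
print(stats)
```

In this example two of the eight sampled values are vacant, so the vacancy rate is 0.25.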
The statistics for a certain field may be as shown in table 1.
TABLE 1
(Table 1 is provided as an image in the original publication.)
The record number (record_num) in Table 1 is the statistical record number. The code distribution in Table 1 indicates that the 2092 records counted for the field "xzbg" contain 100 dictionary values in total (that is, the keys of the key-value pairs); among them, 61 records have the dictionary value 15, 260 records have the dictionary value 20, 500 records have the dictionary value 10, and 471 records have the dictionary value 25.
In one example, the preset condition may be that the vacancy rate of the values of the target field in the statistical result is less than a preset vacancy rate threshold, and that the code distribution data of the target field in the statistical result is not null.
If the vacancy rate of the statistical data of a certain field is greater than or equal to the vacancy rate threshold value, or the code distribution data of the field is null, at this time, because the available information of the statistical data of the field is little or effective information cannot be provided, the statistical data of the field can be deleted, and the code field is not identified for the field.
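The preset condition can be sketched as a simple filter. The function name, the dictionary keys and the default threshold of 0.9 are illustrative assumptions; the embodiment only requires that some vacancy rate threshold be preset.

```python
def meets_preset_condition(stats, vacancy_threshold=0.9):
    """Return True when a field's statistics are informative enough to use.

    `stats` is assumed to hold "vacancy_rate" and "code_distribution" keys.
    """
    distribution = stats.get("code_distribution")
    if not distribution:
        # A null or empty code distribution provides no usable information.
        return False
    return stats["vacancy_rate"] < vacancy_threshold

usable = {"vacancy_rate": 0.25, "code_distribution": {"10": 3, "20": 2}}
too_sparse = {"vacancy_rate": 0.95, "code_distribution": {"10": 1}}
no_codes = {"vacancy_rate": 0.10, "code_distribution": None}

print([meets_preset_condition(s) for s in (usable, too_sparse, no_codes)])
```

Fields that fail the check are simply skipped, which corresponds to deleting their statistical data.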
In one example, the preset field characteristics may include a statistical record number of the field, a code type number, a maximum value and a minimum value of the occurrence number of each code, and an actual record number.
Whether a field is a code field is positively correlated with how concentrated its code distribution is, and the concentration of the code distribution is in turn positively correlated with the ratio of the number of records to the number of code types. The number of code types of each field can therefore be counted as a preset field characteristic.
The number of occurrences of each code, together with the maximum and minimum of these occurrence counts within the field, reflects the concentration of the code distribution to a certain extent, so the maximum and minimum occurrence counts in the field are also used as preset field characteristics.
Because the table limits the number of characters that can be stored, the code distribution of some fields contains incomplete key-value statistics, so the statistical record number is not the number of records in the actual table; the actual record number (real_record_num) can be computed from the code distribution.
The actual record number equals the statistical record number multiplied by a first ratio, where the first ratio is the actual total number of records in the field divided by the number of records sampled for the statistics. For example, if the actual total number of records of a field is 1,000,000, the number of records sampled is 10,000, and the statistical record number is 2,000, the actual record number is 2,000 × (1,000,000 ÷ 10,000) = 200,000.
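A minimal sketch of how the five preset field characteristics might be derived from the statistical data follows. The function name, the key names and the code distribution values are hypothetical; only the arithmetic for the actual record number follows the worked example above.

```python
def field_features(stats, total_records, sampled_records):
    """Derive the preset field characteristics from a field's statistics."""
    counts = list(stats["code_distribution"].values())
    first_ratio = total_records / sampled_records
    return {
        "record_num": stats["record_num"],      # statistical record number
        "code_type_num": len(counts),           # number of distinct codes
        "max_count": max(counts),               # most frequent code
        "min_count": min(counts),               # least frequent code
        "real_record_num": stats["record_num"] * first_ratio,
    }

# Illustrative statistics: 2,000 statistical records over three codes.
stats = {"record_num": 2000,
         "code_distribution": {"10": 900, "20": 700, "30": 400}}
features = field_features(stats, total_records=1_000_000, sampled_records=10_000)
print(features)
```

With these inputs the first ratio is 100, so `real_record_num` is 200,000, matching the example in the text.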
In an exemplary implementation process, obtaining a feature value of a preset field feature corresponding to the target field according to the statistical data may include:
cleaning the statistical data to obtain target data;
and determining a characteristic value of a preset field characteristic corresponding to the target field according to the target data.
The code distribution may contain noise introduced by manual operation, such as inconsistent letter case, wrongly written Chinese characters, or a name and its code appearing together. The keys in the code distribution are therefore used to clean the data: long-tail data and irregular data are removed, and normalized records are formed.
Through cleaning, more standard target data can be obtained, so that the characteristic value of the preset field characteristic is determined based on the target data, and the accuracy of code field identification can be improved.
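One possible cleaning step is sketched below. It covers only two of the noise types mentioned (inconsistent case and long-tail entries); the function name, the upper-casing rule and the 1% share threshold `min_share` are assumptions chosen for illustration, not rules stated in the embodiment.

```python
from collections import Counter

def clean_distribution(distribution, min_share=0.01):
    """Normalize keys and drop long-tail entries from a code distribution."""
    merged = Counter()
    for key, count in distribution.items():
        normalized = str(key).strip().upper()   # unify case and whitespace
        if normalized:
            merged[normalized] += count
    total = sum(merged.values())
    # Keep only codes whose share of all records reaches min_share.
    return {k: c for k, c in merged.items() if c / total >= min_share}

raw = {"m": 480, "M": 20, "F": 495, "unknown?": 5}
target = clean_distribution(raw)
print(target)
```

Here "m" and "M" are merged into one code, and the rare entry "unknown?" (0.5% of the records) is removed as long-tail data.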
In one example, the code field identification model acquisition process may include:
setting a machine learning model;
acquiring sample data, wherein the sample data comprises a characteristic value of a preset field characteristic corresponding to a sample field and a label value corresponding to the sample field, and the label value is used for indicating whether the sample field is a code field;
and training the machine learning model by using the sample data to obtain a trained machine learning model, and taking the trained machine learning model as a code field identification model.
The characteristic value of each preset field characteristic corresponding to the sample field can be obtained from the statistical data of the sample field. The label value corresponding to the sample field can be labeled manually. For example, a label value of 1 indicates that the sample field is a code field, and a label value of 0 indicates that it is not. If sample field 1 is a code field and sample field 2 is not a code field, the label value corresponding to sample field 1 is 1 and the label value corresponding to sample field 2 is 0.
In one example, the machine learning model may be a binary classification model, such as an XGBoost (eXtreme Gradient Boosting) model. In other examples, the machine learning model may also be another classifier model, which is not limited in this embodiment.
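The training procedure can be illustrated with a small sketch. XGBoost is a third-party package, so this sketch deliberately swaps in a minimal logistic regression written in plain Python as a stand-in binary classifier; it is not the model named in the embodiment. The two features (code types per record, and the share of the most frequent code) and all sample values are hypothetical.

```python
import math

def train_logistic(samples, labels, epochs=2000, lr=0.5):
    """Fit a logistic-regression classifier by batch gradient descent."""
    n = len(samples[0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(samples, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y   # prediction minus label
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias

def predict(model, x):
    """Return 1 (code field) or 0 (not a code field)."""
    weights, bias = model
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0

# Hypothetical sample data: [code types / records, max count / records].
samples = [[0.002, 0.45], [0.005, 0.30], [0.001, 0.60], [0.01, 0.25],
           [0.95, 0.001], [0.80, 0.002], [1.0, 0.001], [0.60, 0.005]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]       # manually labeled: 1 = code field
model = train_logistic(samples, labels)
print(predict(model, [0.003, 0.5]), predict(model, [0.9, 0.001]))
```

In practice a gradient-boosted classifier would take the place of `train_logistic`, with the same sample layout of feature values plus a 0/1 label.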
In one example, the method may further comprise:
acquiring a target word vector corresponding to the target field;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model, wherein the method comprises the following steps:
and inputting the characteristic value and the target word vector into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
In one example, obtaining the target word vector corresponding to the target field may include:
and determining a target word vector corresponding to the target field according to the field name of the target field.
Wherein the field name of the target field can be obtained from the statistical data of the target field (as shown in table 1 above).
In one example, obtaining the target word vector corresponding to the target field may include:
and determining a target word vector corresponding to the target field according to the field description information of the target field.
The field description information of the target field may be obtained from the statistical data of the target field, for example, the value of the "field description" field in table 1 is the field description information.
In this embodiment, a word vector database may be preset in the system, and a target word vector corresponding to the target field is obtained by querying the word vector database, for example, the query condition may be a field name or field description information of the target field.
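The lookup can be sketched as follows, assuming the word vector database is an in-memory mapping. The entries in `WORD_VECTORS`, the three-dimensional vectors and the zero-vector fallback for unknown names are all hypothetical.

```python
# Hypothetical preset word vector database keyed by lower-cased text.
WORD_VECTORS = {
    "gender": [0.9, 0.1, 0.0],
    "title": [0.7, 0.2, 0.1],
    "address": [0.1, 0.8, 0.3],
}
ZERO_VECTOR = [0.0, 0.0, 0.0]   # fallback when nothing matches (assumption)

def lookup_word_vector(field_name, field_description=None):
    """Query the database by field name first, then by field description."""
    for key in (field_name, field_description):
        if key is not None and key.lower() in WORD_VECTORS:
            return WORD_VECTORS[key.lower()]
    return ZERO_VECTOR

def model_input(feature_values, field_name, field_description=None):
    """Concatenate the field characteristics with the target word vector."""
    return list(feature_values) + lookup_word_vector(field_name, field_description)

print(model_input([2000, 4, 900, 100], "xzbg", "Title"))
```

The field name "xzbg" is not in the database, so the lookup falls back to the field description "Title".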
According to the application requirement, different word vector databases can be set in different scenes. In the same scene, the word vector database can be updated according to the increase of accumulated experience data.
In the embodiment, the word vector is also used as the input of the code field identification model, so that the input characteristics of the code field identification model are enriched, and the performance of the code field identification model can be improved.
When the code field recognition model of this embodiment is trained, the sample data includes a word vector corresponding to the sample field in addition to the feature value of the preset field feature corresponding to the sample field and the tag value corresponding to the sample field.
In one example, the obtaining process of the code field recognition model including the word vector in the input data may include:
setting a machine learning model;
acquiring sample data, wherein the sample data comprises a characteristic value of a preset field characteristic corresponding to a sample field, a word vector corresponding to the sample field and a label value corresponding to the sample field, and the label value is used for indicating whether the sample field is a code field;
and training the machine learning model by using the sample data to obtain a trained machine learning model, and taking the trained machine learning model as a code field identification model.
The characteristic value of each preset field characteristic corresponding to the sample field can be obtained from the statistical data of the sample field. The word vector corresponding to the sample field can be obtained by querying a word vector database according to the field name or the field description information of the sample field. The label value corresponding to the sample field can be manually labeled.
The code field identification method provided by the embodiment of the invention is further described in detail by way of example.
Assuming that there are 100 fields, respectively, field 1 to field 100, the code field identification process for field 1 to field 100 may be as follows:
(1) Counting the values of field 1 to field 100 respectively to obtain the corresponding statistical data 1 to statistical data 100.
(2) Assuming that the vacancy rate threshold is 90%, wherein the vacancy rates of the fields 1 to 5 are greater than 90%, deleting the statistical data 1 to 5 corresponding to the fields 1 to 5, and leaving the statistical data 6 to 100.
(3) And respectively cleaning the statistical data 6-100, and deleting long tail data and non-standard data in the statistical data 6-100 to obtain target data 6-100.
(4) According to the target data 6 to the target data 100, the number of statistical records, the number of code types, the maximum value and the minimum value in the occurrence frequency of each code, the actual number of records and the word vector corresponding to the fields 6 to 100 are obtained respectively.
(5) And respectively determining whether the fields 6 to 100 are code fields according to the statistical record number, the code type number, the maximum value and the minimum value in the occurrence times of each code, the actual record number and the word vector corresponding to the fields 6 to 100. Taking the field 6 as an example, inputting the statistical record number, the code type number, the maximum value and the minimum value in the occurrence times of each code, the actual record number and the word vector corresponding to the field 6 into the trained code field recognition model, if the output result of the code field recognition model is 1, determining that the field 6 is a code field, and if the output result of the code field recognition model is 0, determining that the field 6 is not a code field.
Therefore, the embodiment of the invention can automatically recognize the code fields in batches, and greatly improves the efficiency of recognizing the code fields.
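The batch flow of steps (1) to (5) can be sketched end to end as below. This is a simplified illustration: cleaning is reduced to case normalization, the actual record number and the word vector are omitted, and `stub_model` is a hypothetical rule standing in for the trained code field recognition model.

```python
from collections import Counter

def identify_code_fields(table, model, vacancy_threshold=0.9):
    """Return {field_name: is_code_field} for fields that pass the filter."""
    results = {}
    for name, values in table.items():
        present = [str(v).strip().upper() for v in values
                   if v is not None and str(v).strip()]
        vacancy_rate = 1 - len(present) / len(values) if values else 1.0
        if vacancy_rate >= vacancy_threshold or not present:
            continue                       # steps (1)-(2): drop sparse fields
        counts = Counter(present)          # step (3): normalized distribution
        features = [len(values), len(counts),
                    max(counts.values()), min(counts.values())]   # step (4)
        results[name] = model(features) == 1                      # step (5)
    return results

def stub_model(features):
    """Hypothetical stand-in: few code types per record means code field."""
    record_num, code_type_num, _max_count, _min_count = features
    return 1 if code_type_num / record_num <= 0.5 else 0

table = {
    "gender": ["M", "F", "m", "F", "M", "F"],
    "id_no": ["a1", "b2", "c3", "d4", "e5", "f6"],
    "remark": [None, None, None, None, None, None],
}
print(identify_code_fields(table, stub_model))
```

The fully vacant field "remark" is skipped, "gender" is reported as a code field, and "id_no" is not.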
In one example, after step S103, the method may further include:
and if the target field is determined to be the code field, generating a dictionary table according to the dictionary name and the corresponding dictionary value contained in the target field.
When there is no existing dictionary table, it can be based on
The dictionary table records key (dictionary name) value (dictionary value) pairs in the code field, and can be used for cleaning the data table to be cleaned as a standard.
Each record in the dictionary table may include information such as a dictionary table name, a dictionary table ID, a standard dictionary name, a standard dictionary value, and the like.
In one example, generating a dictionary table from dictionary names and corresponding dictionary values contained in the target field may include:
determining a dictionary table name according to the field description information of the target field;
and adding a standard dictionary name and a corresponding standard dictionary value in a dictionary table according to the dictionary name and the corresponding dictionary value of the target field.
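Generating a dictionary table from an identified code field might look like the following sketch. The record keys, the use of a random UUID as the dictionary table ID, and the sample dictionary names and values are assumptions for illustration.

```python
import uuid

def generate_dictionary_table(field_description, code_pairs):
    """Build dictionary table records from a code field's name/value pairs.

    The dictionary table name is taken from the field description, and all
    records of one table share a single generated table ID.
    """
    table_id = str(uuid.uuid4())
    return [
        {"table_name": field_description, "table_id": table_id,
         "standard_name": name, "standard_value": value}
        for name, value in code_pairs.items()
    ]

records = generate_dictionary_table("Title", {"Engineer": 20, "Senior engineer": 30})
for record in records:
    print(record)
```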
In one example, the method may further comprise:
and if the target field is determined to be a code field, modifying the existing dictionary table according to the dictionary name and the corresponding dictionary value contained in the target field.
In the case where a dictionary table already exists, there is no need to generate the dictionary table, and dictionary names contained in the target field can be added to the existing dictionary table by adding dictionary table records.
In one example, modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field may include:
and if the existing dictionary table contains the dictionary table name corresponding to the target field and the target field contains the target dictionary name which is not contained in the existing dictionary table, adding the target dictionary name and the corresponding dictionary value into the existing dictionary table.
For example, assume that the code fields are as shown in Table 2 and the existing dictionary tables are as shown in Table 3.
TABLE 2
(Table 2 is provided as an image in the original publication.)
TABLE 3
Category  Dictionary table name  Dictionary table ID                   Standard dictionary name  Standard dictionary value
Public    Job scale              aca10ef8-beb3-4068-8e71-bd0b4d973baf  Assistant engineer        10
Public    Job scale              aca10ef8-beb3-4068-8e71-bd0b4d973baf  Researcher                40
Public    Job scale              aca10ef8-beb3-4068-8e71-bd0b4d973baf  Senior engineer           30
Public    Job scale              aca10ef8-beb3-4068-8e71-bd0b4d973baf  Engineer                  20
As can be seen from tables 2 and 3, the field of "title" contains 13 dictionary names, wherein 4 dictionary names "assistant engineer", "researcher", "senior engineer" and "engineer" are also contained in the existing dictionary table shown in table 3, and 9 dictionary names (for example, "assistant professor", "assistant researcher" and the like) are not contained in the existing dictionary table shown in table 3, and at this time, the 9 dictionary names and their corresponding dictionary values are added to the existing dictionary table shown in table 3.
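Adding the missing dictionary names can be sketched as follows. The existing rows reproduce Table 3; the dictionary values 50 and 60 for the two new names are hypothetical, since the contents of Table 2 are not available in text form.

```python
def add_missing_names(existing_table, table_name, table_id, field_pairs):
    """Append dictionary names of the target field that the table lacks."""
    known = {row["standard_name"] for row in existing_table
             if row["table_name"] == table_name}
    added = []
    for name, value in field_pairs.items():
        if name not in known:
            existing_table.append({
                "table_name": table_name, "table_id": table_id,
                "standard_name": name, "standard_value": value,
            })
            added.append(name)
    return added

TABLE_ID = "aca10ef8-beb3-4068-8e71-bd0b4d973baf"
existing = [
    {"table_name": "Job scale", "table_id": TABLE_ID,
     "standard_name": n, "standard_value": v}
    for n, v in [("Assistant engineer", 10), ("Researcher", 40),
                 ("Senior engineer", 30), ("Engineer", 20)]
]
# Hypothetical excerpt of the code field: two known names, two new ones.
field_pairs = {"Engineer": 20, "Researcher": 40,
               "Professor": 50, "Assistant researcher": 60}
added = add_missing_names(existing, "Job scale", TABLE_ID, field_pairs)
print(added)
```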
In some cases, the code field may contain dictionary names and dictionary values, and the dictionary values corresponding to the dictionary names may be extracted from the code field. When the code field only contains dictionary names and does not contain dictionary values, the dictionary values can be set for the dictionary names in an artificial mode, and the setting needs to be ensured: the dictionary values for different dictionary names are also different.
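When dictionary values are set manually, the uniqueness requirement can be checked mechanically. The helper below is a hypothetical sketch of such a check; its name and return format are not part of the embodiment.

```python
def find_value_conflicts(pairs):
    """Return groups of dictionary names that share one dictionary value."""
    by_value = {}
    for name, value in pairs.items():
        by_value.setdefault(value, []).append(name)
    return {value: names for value, names in by_value.items() if len(names) > 1}

valid = {"Professor": 50, "Associate professor": 60}
invalid = {"Professor": 50, "Associate professor": 50}
print(find_value_conflicts(valid), find_value_conflicts(invalid))
```

An empty result means every dictionary name has its own distinct dictionary value.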
In one example, modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field may include:
and if the existing dictionary table contains the dictionary table name corresponding to the target field and the target field contains the target dictionary name which is not contained in the existing dictionary table, adding the target dictionary name and the corresponding dictionary value into the existing dictionary table if the distribution quantity of the target dictionary name in the target field meets a preset centralized distribution condition.
Satisfying the centralized distribution condition indicates that the corresponding dictionary name occurs relatively frequently in the target field.
In one example, the centralized distribution condition may be: the ratio of the distribution quantity of the target dictionary names in the target field to the sum of the distribution quantities of all dictionary names in the target field is greater than or equal to a preset percentage.
In another example, the centralized distribution condition may be: the distribution quantity of the target dictionary names in the target field is larger than or equal to a preset quantity threshold value.
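Both forms of the centralized distribution condition can be expressed in one small helper. The parameter names `min_ratio` and `min_count`, the thresholds used in the example, and the sample distribution are illustrative assumptions.

```python
def is_concentrated(name, distribution, min_ratio=None, min_count=None):
    """Check the centralized distribution condition for one dictionary name.

    Pass `min_ratio` for the percentage form of the condition, or
    `min_count` for the absolute-quantity form.
    """
    count = distribution.get(name, 0)
    if min_ratio is not None:
        total = sum(distribution.values())
        return total > 0 and count / total >= min_ratio
    if min_count is not None:
        return count >= min_count
    raise ValueError("specify min_ratio or min_count")

distribution = {"Professor": 300, "Associate professor": 250,
                "Technician": 3, "Engineer": 447}
print(is_concentrated("Professor", distribution, min_ratio=0.05))
print(is_concentrated("Technician", distribution, min_count=100))
```

Only names that pass the check would be added to the existing dictionary table, which keeps rare and possibly erroneous names out of it.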
For example, as can be seen from Table 2, in the code distribution of the field "title" the dictionary names are mainly concentrated on "researcher", "associate researcher", "professor", "associate professor", "assistant engineer", "senior engineer" and "engineer". Four of them, namely "assistant engineer", "researcher", "senior engineer" and "engineer", are already included in the dictionary table shown in Table 3, so according to this embodiment only the three dictionary names "associate researcher", "professor" and "associate professor" and their corresponding dictionary values need to be added to the dictionary table shown in Table 3.
In one example, modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field may include:
and if the existing dictionary table does not contain the dictionary table name corresponding to the target field, adding all dictionary names and corresponding dictionary values contained in the target field into the existing dictionary table.
When the existing dictionary table does not contain the dictionary table name corresponding to the target field, the dictionary name in the target field can be directly determined not to be contained in the existing dictionary table, and at the moment, each dictionary name in the target field does not need to be judged, so that the processing time can be saved.
In one example, modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field may include:
and if the first dictionary name contained in the target field is the same as a second dictionary name in the existing dictionary table, but a first dictionary value corresponding to the first dictionary name in the target field is different from a second dictionary value corresponding to the second dictionary name in the existing dictionary table, modifying the second dictionary value in the existing dictionary table into the first dictionary value.
In the present embodiment, the first dictionary name and the second dictionary name are the same dictionary name; the two terms are used only to distinguish where the name is located (in the target field or in the existing dictionary table).
In some cases, the dictionary values corresponding to certain dictionary names are updated under a new data standard, i.e., the new dictionary values differ from the old ones. When the dictionary table was established before the data table containing the target field, the target field uses the new dictionary value under the new data standard, while the existing dictionary table still holds the old dictionary value. In this case, the old dictionary value in the dictionary table needs to be changed to the new dictionary value, so that the dictionary table conforms to the new data standard and can continue to be used in data cleaning scenarios.
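The modification rules described above can be sketched as a single merge step. This is only an illustration under stated assumptions: the nested-mapping layout (dictionary table name, then dictionary name, then dictionary value), the function name `merge_field_into_tables`, and the sample values are all hypothetical.

```python
from typing import Dict

# Hypothetical layout: dictionary table name -> {dictionary name: dictionary value}
DictionaryTables = Dict[str, Dict[str, str]]


def merge_field_into_tables(
    tables: DictionaryTables,
    table_name: str,
    field_entries: Dict[str, str],
) -> DictionaryTables:
    """Return a copy of tables updated with the entries found in the target field."""
    updated = {name: dict(entries) for name, entries in tables.items()}
    if table_name not in updated:
        # No table for this field: add every entry without per-name checks.
        updated[table_name] = dict(field_entries)
        return updated
    target = updated[table_name]
    for dict_name, field_value in field_entries.items():
        if dict_name not in target:
            target[dict_name] = field_value   # name missing: add it
        elif target[dict_name] != field_value:
            target[dict_name] = field_value   # same name, new value: field wins
    return updated


tables = {"title": {"professor": "01", "engineer": "02"}}
print(merge_field_into_tables(tables, "title", {"professor": "11"}))
# {'title': {'professor': '11', 'engineer': '02'}}
```

Returning a copy rather than mutating the input is a design choice of this sketch; it makes it easy to compare the table before and after the change.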
In one example, before modifying the second dictionary value in the existing dictionary table to the first dictionary value, the method may further include:
displaying prompt information for modifying the dictionary value and information for confirming whether to modify the dictionary value;
modifying the second dictionary value in the existing dictionary table to the first dictionary value, comprising:
and if the information for confirming the modification is received, modifying the second dictionary value in the existing dictionary table into the first dictionary value.
The displayed information for confirming whether to modify may be clickable virtual buttons such as "ok" (indicating that the modification is approved) and "cancel" (indicating that the modification is rejected), or "yes" (approved) and "no" (rejected). By operating on this information, the user indicates whether he or she agrees to the modification. For example, the user may approve the modification by clicking the "ok" virtual button, or reject it by clicking the "cancel" virtual button.
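A confirmation-gated update might look like the following sketch. The `confirm` callback stands in for the displayed prompt and buttons; its signature, the function name, and the prompt wording are assumptions made for illustration.

```python
from typing import Callable, Dict


def update_value_with_confirmation(
    table: Dict[str, str],
    dict_name: str,
    new_value: str,
    confirm: Callable[[str], bool],
) -> bool:
    """Overwrite table[dict_name] with new_value only if the user confirms.

    Returns True if the table was modified.
    """
    old_value = table.get(dict_name)
    if old_value is None or old_value == new_value:
        return False  # nothing to reconcile, so no prompt is shown
    prompt = (
        f'Dictionary value for "{dict_name}" differs: '
        f'existing "{old_value}", target field "{new_value}". Modify?'
    )
    if confirm(prompt):  # True corresponds to clicking "ok" / "yes"
        table[dict_name] = new_value
        return True
    return False


title_table = {"professor": "01"}
# A console stand-in for the virtual buttons could be:
#   confirm=lambda prompt: input(prompt + " [y/n] ").strip().lower() == "y"
modified = update_value_with_confirmation(
    title_table, "professor", "11", confirm=lambda prompt: True
)
print(modified, title_table)  # True {'professor': '11'}
```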
According to the code field identification method provided by the embodiment of the invention, the values of the target field of the data table are counted to obtain statistical data. If the statistical data meets the preset condition, the characteristic value of the preset field characteristic corresponding to the target field is obtained according to the statistical data, the characteristic value is input into the trained code field identification model, and whether the target field is a code field is determined according to the output result of the model. This reduces manual effort and improves the efficiency of code field identification. The reduced manual effort in turn greatly lowers the service cost of code field identification.
Based on the above method embodiment, the embodiment of the present invention further provides corresponding apparatus, device, and storage medium embodiments. For detailed implementation of the embodiments of the apparatus, device and storage medium of the embodiments of the present invention, please refer to the corresponding descriptions in the foregoing method embodiments.
Fig. 2 is a functional block diagram of a code field identification apparatus according to an embodiment of the present invention. As shown in fig. 2, in this embodiment, the code field identifying device may include:
the statistical module 210 is configured to perform statistics on values of target fields of the data table to obtain statistical data;
a feature value obtaining module 220, configured to, if the statistical data meets a preset condition, obtain, according to the statistical data, a feature value of a preset field feature corresponding to the target field;
and the determining module 230 is configured to input the feature value into the trained code field recognition model, and determine whether the target field is a code field according to an output result of the code field recognition model.
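One way to picture how modules 210, 220 and 230 cooperate is the sketch below, which wires them together as plain callables. The class name, the field names, and the stub lambdas are hypothetical; returning `None` when the preset condition is not met is also an assumption, since this passage does not state what happens in that case.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Sequence

Statistics = Dict[str, Any]


@dataclass
class CodeFieldIdentifier:
    """Chains the three modules of fig. 2, each represented by a callable."""

    collect_statistics: Callable[[Sequence[Any]], Statistics]  # statistical module 210
    meets_condition: Callable[[Statistics], bool]              # check made by module 220
    extract_features: Callable[[Statistics], List[float]]      # module 220
    model: Callable[[List[float]], bool]                       # model used by module 230

    def is_code_field(self, values: Sequence[Any]) -> Optional[bool]:
        stats = self.collect_statistics(values)
        if not self.meets_condition(stats):
            return None  # assumption: the field is not passed to the model
        return self.model(self.extract_features(stats))


# Stub callables for illustration only; a real system would plug in its own.
identifier = CodeFieldIdentifier(
    collect_statistics=lambda values: {"count": len(values), "distinct": len(set(values))},
    meets_condition=lambda stats: stats["count"] > 0,
    extract_features=lambda stats: [float(stats["count"]), float(stats["distinct"])],
    model=lambda features: features[1] <= 10,  # stand-in for a trained model
)
print(identifier.is_code_field(["01", "02", "01", "03"]))  # True
```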
In one example, the feature value obtaining module 220 may be specifically configured to:
cleaning the statistical data to obtain target data;
and determining a characteristic value of a preset field characteristic corresponding to the target field according to the target data.
In one example, the process of obtaining the code field identification model comprises:
setting a machine learning model;
acquiring sample data, wherein the sample data comprises a characteristic value of a preset field characteristic corresponding to a sample field and a label value corresponding to the sample field, and the label value is used for indicating whether the sample field is a code field;
and training the machine learning model by using the sample data to obtain a trained machine learning model, and taking the trained machine learning model as a code field identification model.
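The three training steps can be illustrated with a small, dependency-free sketch. The text does not say which machine learning model is used, so logistic regression trained by stochastic gradient descent is purely an assumption here, as are the function name, the single illustrative feature (ratio of code types to record count), and the sample values.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (feature values of a sample field, label value: 1 = code field, 0 = not)
Sample = Tuple[Sequence[float], int]


def train_code_field_model(
    samples: Sequence[Sample], epochs: int = 2000, learning_rate: float = 0.1
) -> Callable[[Sequence[float]], bool]:
    """Fit a logistic regression on the sample data and return a predictor."""
    # Step 1: set the machine learning model (weights and bias start at zero).
    weights: List[float] = [0.0] * len(samples[0][0])
    bias = 0.0
    # Step 3: train on the sample data with stochastic gradient descent.
    for _ in range(epochs):
        for features, label in samples:
            z = bias + sum(w * x for w, x in zip(weights, features))
            predicted = 1.0 / (1.0 + math.exp(-z))
            error = predicted - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error

    def predict(features: Sequence[float]) -> bool:
        z = bias + sum(w * x for w, x in zip(weights, features))
        return z >= 0.0  # True means "code field"

    return predict


# Step 2: acquire sample data (hypothetical, already scaled to the range 0..1).
samples: List[Sample] = [
    ([0.01], 1), ([0.02], 1),   # few distinct values: labelled as code fields
    ([0.90], 0), ([0.95], 0),   # nearly unique values: labelled as non-code fields
]
model = train_code_field_model(samples)
print(model([0.05]), model([0.85]))
```

Features with very large magnitudes, such as raw record counts, would need scaling before being fed to a sketch like this one.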
In one example, the method further comprises:
acquiring a target word vector corresponding to the target field;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model, comprises:
and inputting the characteristic value and the target word vector into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
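A simple way to feed both inputs to one model is to concatenate them into a single vector, as sketched below. Concatenation itself is an assumption, since the text only says that both are input to the model; the `word_vectors` lookup table and all numbers are hypothetical stand-ins for a trained word-embedding model.

```python
from typing import Dict, List, Sequence


def build_model_input(
    feature_values: Sequence[float], word_vector: Sequence[float]
) -> List[float]:
    """Concatenate the statistical feature values with the field's word vector."""
    return list(feature_values) + list(word_vector)


# Hypothetical lookup table standing in for a trained word-embedding model.
word_vectors: Dict[str, List[float]] = {"title": [0.12, -0.40, 0.33]}

features = [950.0, 3.0, 400.0, 200.0, 47500.0]
model_input = build_model_input(features, word_vectors["title"])
print(len(model_input))  # 8
```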
In one example, the preset field characteristics comprise the statistical record number, the code type number, the maximum value and the minimum value in the occurrence times of each code, and the actual record number of the field; the actual number of records is equal to the product of the number of statistical records and a first ratio, and the first ratio is the ratio of the actual total number of records in the target field to the number of records sampled in the statistics.
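As a worked illustration of these features, the sketch below computes them from a sampled code distribution. The function name, the parameter names, and the numbers are hypothetical; only the formula for the actual record count (statistical record count multiplied by the first ratio) comes from the text.

```python
from typing import Dict, List


def field_features(
    code_counts: Dict[str, int],
    statistical_records: int,
    actual_total: int,
    sampled_total: int,
) -> List[float]:
    """Compute the preset field features from a sampled code distribution.

    code_counts maps each code to its number of occurrences in the sample.
    """
    if not code_counts or sampled_total <= 0:
        raise ValueError("need a non-empty code distribution and a positive sample size")
    first_ratio = actual_total / sampled_total
    return [
        float(statistical_records),          # statistical record count
        float(len(code_counts)),             # number of code types
        float(max(code_counts.values())),    # maximum number of occurrences of a code
        float(min(code_counts.values())),    # minimum number of occurrences of a code
        statistical_records * first_ratio,   # actual record count
    ]


# Hypothetical sample: 1,000 of 50,000 records were sampled, so the first ratio is 50.
counts = {"01": 400, "02": 350, "03": 200}
print(field_features(counts, statistical_records=950, actual_total=50000, sampled_total=1000))
# [950.0, 3.0, 400.0, 200.0, 47500.0]
```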
In one example, the statistical data includes statistical record count, vacancy rate, code distribution.
In one example, the preset condition is that a vacancy rate of the value of the target field in the statistical result is smaller than a preset vacancy rate threshold, and the code distribution data of the target field in the statistical result is not null.
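The statistical data and the preset condition can be sketched together as follows. The function names, the dictionary keys, and the rule that `None` and blank strings count as vacant are assumptions made for illustration.

```python
from collections import Counter
from typing import Any, Dict, Optional, Sequence


def collect_statistics(values: Sequence[Optional[Any]]) -> Dict[str, Any]:
    """Count the records, the share of vacant values, and the occurrences per code."""
    total = len(values)
    non_empty = [v for v in values if v is not None and str(v).strip() != ""]
    return {
        "record_count": total,
        "vacancy_rate": (total - len(non_empty)) / total if total else 1.0,
        "code_distribution": dict(Counter(non_empty)),
    }


def meets_preset_condition(stats: Dict[str, Any], vacancy_threshold: float) -> bool:
    """True when the vacancy rate is below the threshold and the distribution is not empty."""
    return stats["vacancy_rate"] < vacancy_threshold and bool(stats["code_distribution"])


stats = collect_statistics(["01", "02", "01", None, ""])
print(stats["vacancy_rate"], stats["code_distribution"])  # 0.4 {'01': 2, '02': 1}
print(meets_preset_condition(stats, vacancy_threshold=0.5))  # True
```

Checking the condition before feature extraction means that mostly empty fields never reach the model.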
In one example, further comprising:
and the dictionary table generating module is used for generating a dictionary table according to the dictionary name and the corresponding dictionary value contained in the target field if the target field is determined to be the code field.
In one example, further comprising:
and the dictionary table modifying module is used for modifying the existing dictionary table according to the dictionary name and the corresponding dictionary value contained in the target field if the target field is determined to be the code field.
In one example, the dictionary table modification module may be specifically configured to:
and if the existing dictionary table contains the dictionary table name corresponding to the target field and the target field contains the target dictionary name which is not contained in the existing dictionary table, adding the target dictionary name and the corresponding dictionary value into the existing dictionary table.
In one example, the dictionary table modification module may be specifically configured to:
and if the existing dictionary table does not contain the dictionary table name corresponding to the target field, adding all dictionary names and corresponding dictionary values contained in the target field into the existing dictionary table.
In one example, the dictionary table modification module may be specifically configured to:
and if the first dictionary name contained in the target field is the same as a second dictionary name in the existing dictionary table, but a first dictionary value corresponding to the first dictionary name in the target field is different from a second dictionary value corresponding to the second dictionary name in the existing dictionary table, modifying the second dictionary value in the existing dictionary table into the first dictionary value.
In one example, further comprising:
the display module is used for displaying prompt information for modifying the dictionary value and information for confirming whether the second dictionary value is modified or not before the second dictionary value in the existing dictionary table is modified into the first dictionary value;
the dictionary table modifying module, when configured to modify the second dictionary value in the existing dictionary table into the first dictionary value, may be specifically configured to:
and if the information for confirming the modification is received, modifying the second dictionary value in the existing dictionary table into the first dictionary value.
The embodiment of the invention also provides the electronic equipment. Fig. 3 is a hardware structure diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 3, the electronic apparatus includes: an internal bus 301, and a memory 302, a processor 303, and an external interface 304 connected through the internal bus.
When the electronic device is used as a working node, the processor 303 is configured to read the machine-readable instructions in the memory 302 and execute the instructions to implement the following operations:
counting the values of the target fields of the data table to obtain statistical data;
if the statistical data meet a preset condition, obtaining a characteristic value of a preset field characteristic corresponding to the target field according to the statistical data;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
In one example, obtaining a feature value of a preset field feature corresponding to the target field according to the statistical data includes:
cleaning the statistical data to obtain target data;
and determining a characteristic value of a preset field characteristic corresponding to the target field according to the target data.
In one example, the process of obtaining the code field identification model comprises:
setting a machine learning model;
acquiring sample data, wherein the sample data comprises a characteristic value of a preset field characteristic corresponding to a sample field and a label value corresponding to the sample field, and the label value is used for indicating whether the sample field is a code field;
and training the machine learning model by using the sample data to obtain a trained machine learning model, and taking the trained machine learning model as a code field identification model.
In one example, further comprising:
acquiring a target word vector corresponding to the target field;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model, comprises:
and inputting the characteristic value and the target word vector into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
In one example, the preset field characteristics comprise the statistical record number, the code type number, the maximum value and the minimum value in the occurrence times of each code, and the actual record number of the field; the actual number of records is equal to the product of the number of statistical records and a first ratio, and the first ratio is the ratio of the actual total number of records in the target field to the number of records sampled in the statistics.
In one example, the statistical data includes statistical record count, vacancy rate, code distribution.
In one example, the preset condition is that a vacancy rate of the value of the target field in the statistical result is smaller than a preset vacancy rate threshold, and the code distribution data of the target field in the statistical result is not null.
In one example, further comprising:
and if the target field is determined to be the code field, generating a dictionary table according to the dictionary name and the corresponding dictionary value contained in the target field.
In one example, further comprising:
and if the target field is determined to be a code field, modifying the existing dictionary table according to the dictionary name and the corresponding dictionary value contained in the target field.
In one example, modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field includes:
and if the existing dictionary table contains the dictionary table name corresponding to the target field and the target field contains the target dictionary name which is not contained in the existing dictionary table, adding the target dictionary name and the corresponding dictionary value into the existing dictionary table.
In one example, modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field includes:
and if the existing dictionary table does not contain the dictionary table name corresponding to the target field, adding all dictionary names and corresponding dictionary values contained in the target field into the existing dictionary table.
In one example, modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field includes:
and if the first dictionary name contained in the target field is the same as a second dictionary name in the existing dictionary table, but a first dictionary value corresponding to the first dictionary name in the target field is different from a second dictionary value corresponding to the second dictionary name in the existing dictionary table, modifying the second dictionary value in the existing dictionary table into the first dictionary value.
In one example, before modifying the second dictionary value in the existing dictionary table to the first dictionary value, the method further includes:
displaying prompt information for modifying the dictionary value and information for confirming whether to modify the dictionary value;
modifying the second dictionary value in the existing dictionary table to the first dictionary value, comprising:
and if the information for confirming the modification is received, modifying the second dictionary value in the existing dictionary table into the first dictionary value.
An embodiment of the present invention further provides a computer-readable storage medium, where a plurality of computer instructions are stored on the computer-readable storage medium, and when executed, the computer instructions perform the following processing:
counting the values of the target fields of the data table to obtain statistical data;
if the statistical data meet a preset condition, obtaining a characteristic value of a preset field characteristic corresponding to the target field according to the statistical data;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
In one example, obtaining a feature value of a preset field feature corresponding to the target field according to the statistical data includes:
cleaning the statistical data to obtain target data;
and determining a characteristic value of a preset field characteristic corresponding to the target field according to the target data.
In one example, the process of obtaining the code field identification model comprises:
setting a machine learning model;
acquiring sample data, wherein the sample data comprises a characteristic value of a preset field characteristic corresponding to a sample field and a label value corresponding to the sample field, and the label value is used for indicating whether the sample field is a code field;
and training the machine learning model by using the sample data to obtain a trained machine learning model, and taking the trained machine learning model as a code field identification model.
In one example, further comprising:
acquiring a target word vector corresponding to the target field;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model, comprises:
and inputting the characteristic value and the target word vector into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
In one example, the preset field characteristics comprise the statistical record number, the code type number, the maximum value and the minimum value in the occurrence times of each code, and the actual record number of the field; the actual number of records is equal to the product of the number of statistical records and a first ratio, and the first ratio is the ratio of the actual total number of records in the target field to the number of records sampled in the statistics.
In one example, the statistical data includes statistical record count, vacancy rate, code distribution.
In one example, the preset condition is that a vacancy rate of the value of the target field in the statistical result is smaller than a preset vacancy rate threshold, and the code distribution data of the target field in the statistical result is not null.
In one example, further comprising:
and if the target field is determined to be the code field, generating a dictionary table according to the dictionary name and the corresponding dictionary value contained in the target field.
In one example, further comprising:
and if the target field is determined to be a code field, modifying the existing dictionary table according to the dictionary name and the corresponding dictionary value contained in the target field.
In one example, modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field includes:
and if the existing dictionary table contains the dictionary table name corresponding to the target field and the target field contains the target dictionary name which is not contained in the existing dictionary table, adding the target dictionary name and the corresponding dictionary value into the existing dictionary table.
In one example, modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field includes:
and if the existing dictionary table does not contain the dictionary table name corresponding to the target field, adding all dictionary names and corresponding dictionary values contained in the target field into the existing dictionary table.
In one example, modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field includes:
and if the first dictionary name contained in the target field is the same as a second dictionary name in the existing dictionary table, but a first dictionary value corresponding to the first dictionary name in the target field is different from a second dictionary value corresponding to the second dictionary name in the existing dictionary table, modifying the second dictionary value in the existing dictionary table into the first dictionary value.
In one example, before modifying the second dictionary value in the existing dictionary table to the first dictionary value, the method further includes:
displaying prompt information for modifying the dictionary value and information for confirming whether to modify the dictionary value;
modifying the second dictionary value in the existing dictionary table to the first dictionary value, comprising:
and if the information for confirming the modification is received, modifying the second dictionary value in the existing dictionary table into the first dictionary value.
For the apparatus and device embodiments, since they substantially correspond to the method embodiments, reference may be made to the relevant parts of the description of the method embodiments. The above-described apparatus embodiments are merely illustrative: the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules; they may be located in one place or distributed across a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (16)

1. A method for code field identification, the method comprising:
counting the values of the target fields of the data table to obtain statistical data;
if the statistical data meet a preset condition, obtaining a characteristic value of a preset field characteristic corresponding to the target field according to the statistical data;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
2. The method according to claim 1, wherein obtaining the feature value of the preset field feature corresponding to the target field according to the statistical data comprises:
cleaning the statistical data to obtain target data;
and determining a characteristic value of a preset field characteristic corresponding to the target field according to the target data.
3. The method of claim 1, wherein the process of obtaining the code field identification model comprises:
setting a machine learning model;
acquiring sample data, wherein the sample data comprises a characteristic value of a preset field characteristic corresponding to a sample field and a label value corresponding to the sample field, and the label value is used for indicating whether the sample field is a code field;
and training the machine learning model by using the sample data to obtain a trained machine learning model, and taking the trained machine learning model as a code field identification model.
4. The method of claim 1, further comprising:
acquiring a target word vector corresponding to the target field;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model, comprises:
and inputting the characteristic value and the target word vector into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
5. The method according to claim 1, wherein the preset field characteristics comprise the number of statistical records, the number of code types, the maximum value and the minimum value of the occurrence times of each code, and the actual number of records of the field; the actual number of records is equal to the product of the number of statistical records and a first ratio, and the first ratio is the ratio of the actual total number of records in the target field to the number of records sampled in the statistics.
6. The method of claim 1, wherein the statistical data comprises statistical record count, vacancy rate, and code distribution.
7. The method according to claim 1, wherein the predetermined condition is that a vacancy rate of the value of the target field in the statistical result is smaller than a predetermined vacancy rate threshold, and the code distribution data of the target field in the statistical result is not null.
8. The method of claim 1, further comprising:
and if the target field is determined to be the code field, generating a dictionary table according to the dictionary name and the corresponding dictionary value contained in the target field.
9. The method of claim 1, further comprising:
and if the target field is determined to be a code field, modifying the existing dictionary table according to the dictionary name and the corresponding dictionary value contained in the target field.
10. The method of claim 1, wherein modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field comprises:
and if the existing dictionary table contains the dictionary table name corresponding to the target field and the target field contains the target dictionary name which is not contained in the existing dictionary table, adding the target dictionary name and the corresponding dictionary value into the existing dictionary table.
11. The method of claim 1, wherein modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field comprises:
and if the existing dictionary table does not contain the dictionary table name corresponding to the target field, adding all dictionary names and corresponding dictionary values contained in the target field into the existing dictionary table.
12. The method of claim 1, wherein modifying an existing dictionary table based on dictionary names and corresponding dictionary values contained in the target field comprises:
and if the first dictionary name contained in the target field is the same as a second dictionary name in the existing dictionary table, but a first dictionary value corresponding to the first dictionary name in the target field is different from a second dictionary value corresponding to the second dictionary name in the existing dictionary table, modifying the second dictionary value in the existing dictionary table into the first dictionary value.
13. The method of claim 12, wherein modifying the second dictionary value in the existing dictionary table to the first dictionary value further comprises:
displaying prompt information for modifying the dictionary value and information for confirming whether to modify the dictionary value;
modifying the second dictionary value in the existing dictionary table to the first dictionary value, comprising:
and if the information for confirming the modification is received, modifying the second dictionary value in the existing dictionary table into the first dictionary value.
14. An apparatus for code field identification, the apparatus comprising:
the statistical module is used for carrying out statistics on the value of the target field of the data table to obtain statistical data;
a characteristic value obtaining module, configured to obtain, according to the statistical data, a characteristic value of a preset field characteristic corresponding to the target field if the statistical data meets a preset condition;
and the determining module is used for inputting the characteristic value into the trained code field recognition model and determining whether the target field is a code field according to the output result of the code field recognition model.
15. An electronic device comprising a processor and a memory for storing executable instructions of the processor;
the processor is configured to:
counting the values of the target fields of the data table to obtain statistical data;
if the statistical data meet a preset condition, obtaining a characteristic value of a preset field characteristic corresponding to the target field according to the statistical data;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
16. A computer readable storage medium having stored thereon computer instructions that, when executed, perform the following:
counting the values of the target fields of the data table to obtain statistical data;
if the statistical data meet a preset condition, obtaining a characteristic value of a preset field characteristic corresponding to the target field according to the statistical data;
inputting the characteristic value into a trained code field recognition model, and determining whether the target field is a code field according to an output result of the code field recognition model.
CN202110292981.1A 2021-03-18 2021-03-18 Code field identification method and device, electronic equipment and storage medium Pending CN113095064A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110292981.1A CN113095064A (en) 2021-03-18 2021-03-18 Code field identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113095064A (en) 2021-07-09

Family

ID=76668661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110292981.1A Pending CN113095064A (en) 2021-03-18 2021-03-18 Code field identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113095064A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115840742A (en) * 2023-02-13 2023-03-24 Merit Interactive Co Ltd Data cleaning method, device, equipment and medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040181512A1 (en) * 2003-03-11 2004-09-16 Lockheed Martin Corporation System for dynamically building extended dictionaries for a data cleansing application
US20090177610A1 (en) * 2006-09-15 2009-07-09 Fujitsu Limited Information processing method and apparatus for business process analysis
DE102010035579A1 (en) * 2010-08-27 2012-03-01 Hartmut Degwert File administration system for e.g. patents, has database machine running on file server, field definition table accessed based on file type and code field and data table accessed based on file number and code field
US20180181555A1 (en) * 2016-12-27 2018-06-28 Ohio State Innovation Foundation Rewriting forms for constrained interaction
US20180260446A1 (en) * 2017-03-08 2018-09-13 Farmers Insurance Exchange System and method for building statistical predictive models using automated insights
US20180314711A1 (en) * 2015-10-30 2018-11-01 Acxiom Corporation Automated Interpretation for the Layout of Structured Multi-Field Files
CN108763952A (en) * 2018-05-03 2018-11-06 Alibaba Group Holding Ltd Data classification method, device and electronic equipment
CN109299094A (en) * 2018-09-18 2019-02-01 Shenzhen OneConnect Smart Technology Co Ltd Data table processing method, device, computer equipment and storage medium
CN110597816A (en) * 2019-09-17 2019-12-20 Shenzhen Zhuiyi Technology Co Ltd Data processing method, data processing device, computer equipment and computer readable storage medium
CN111125116A (en) * 2019-12-27 2020-05-08 Shanghai DataTom Information Technology Co Ltd Method and system for positioning code field in service table and corresponding code table
US10715570B1 (en) * 2018-06-25 2020-07-14 Intuit Inc. Generic event stream processing for machine learning
CN111506731A (en) * 2020-04-17 2020-08-07 Alipay (Hangzhou) Information Technology Co Ltd Method, device and equipment for training field classification model
US20200380212A1 (en) * 2019-05-31 2020-12-03 Ab Initio Technology Llc Discovering a semantic meaning of data fields from profile data of the data fields

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
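Chen Xiaodong: "Research on Project Quality Management of the X City Integrated Fiscal Data Management System", China Master's Theses Full-text Database, Economics and Management Sciences, pages 158 - 155 *
Gao Ke; Diao Xingchun; Cao Jianjun: "Detection and Repair of Problem Data with Missing Attribute Values", Computer Engineering and Design, no. 03, pages 643 - 649 *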

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination