
CN114861598A - Annotating method, annotating device, electronic equipment and storage medium

Info

Publication number: CN114861598A
Application number: CN202210582934.5A
Authority: CN
Inventors: 雷志勇
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2022-05-26
Filing date: 2022-05-26
Publication date: 2022-08-05

Abstract

The application provides an annotation method, an annotation device, an electronic device and a storage medium, and belongs to the technical field of artificial intelligence. The method comprises the following steps: reading a reference field and a reference annotation from a preset data warehouse; constructing an annotation dictionary according to the reference field and the reference annotation, wherein the annotation dictionary comprises key-value pairs, and each key-value pair comprises a reference field and a reference annotation in one-to-one correspondence; reading an original field from a local library, wherein the annotation content of the original field is null; loading the annotation dictionary by taking the original field as a database key to obtain a loading result; if the loading result is loading success, extracting the reference annotation, and performing annotation processing on the original field according to the reference annotation to obtain a target field, wherein the target field comprises field content and annotation content; and if the loading result is loading failure, performing format conversion processing on the original field to obtain an intermediate field, and performing annotation completion on the intermediate field to obtain a target field. The method can improve annotation efficiency.

Description

Annotating method, annotating device, electronic equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to an annotation method, an annotation apparatus, an electronic device, and a storage medium.

Background

At present, field annotation of a data table is usually done manually: script data has to be written by hand to supplement annotations for original fields that lack them. The workload is large and annotation efficiency suffers, so how to improve annotation efficiency has become a technical problem to be solved urgently.

Disclosure of Invention

The embodiment of the present application mainly aims to provide an annotation method, an annotation device, an electronic device, and a storage medium, which aim to improve annotation efficiency.

To achieve the above object, a first aspect of an embodiment of the present application provides an annotation method, including:

reading the reference field and the reference annotation from a preset data warehouse;

constructing an annotation dictionary according to the reference field and the reference annotation; wherein the annotation dictionary comprises at least one key-value pair comprising the reference field and the reference annotation in a one-to-one correspondence;

reading the original field from the local library; wherein the annotation content of the original field is null;

loading the annotation dictionary by taking the original field as a database key to obtain a loading result;

if the loading result is that the loading is successful, extracting the reference annotation, and performing annotation processing on the original field according to the reference annotation to obtain a target field; wherein the target field comprises field content and annotation content, the field content being derived from the original field, the annotation content being derived from the reference annotation;

and if the loading result is loading failure, performing format conversion processing on the original field to obtain an intermediate field, and performing annotation completion on the intermediate field to obtain a target field.

In some embodiments, the step of constructing an annotation dictionary according to the reference field and the reference annotation comprises:

acquiring a mapping relation between the reference field and the reference annotation;

combining the reference field and the reference annotation according to the mapping relation to obtain a key value pair;

and obtaining the annotation dictionary according to the key value pair.

In some embodiments, the loading result includes a loading success or a loading failure, and the step of loading the annotation dictionary with the original field as a database key to obtain the loading result includes:

performing similarity calculation between the database key and the key-value pairs of the annotation dictionary to obtain a field similarity;

if the field similarity is greater than or equal to a preset similarity threshold, the loading result is that the loading is successful;

and if the field similarity is smaller than a preset similarity threshold, the loading result is loading failure.

In some embodiments, the step of performing annotation processing on the original field according to the reference annotation to obtain a target field includes:

performing language conversion processing on the reference annotation to obtain a first annotation statement;

and performing annotation processing on the original field according to the first annotation statement to obtain the target field.

In some embodiments, the step of performing language conversion processing on the reference annotation to obtain a first annotation statement includes:

splicing the reference annotations according to a preset splicing sequence to obtain a first annotation text;

and performing language conversion processing on the first annotation text to obtain the first annotation statement, wherein the first annotation statement is a data definition language.

In some embodiments, if the loading result is a loading failure, performing format conversion processing on the original field to obtain an intermediate field, and performing annotation completion on the intermediate field to obtain a target field, includes:

if the loading result is loading failure, carrying out format conversion on the original field according to a preset field format to obtain the intermediate field;

performing semantic completion on the intermediate field according to a preset semantic logic condition to obtain a second annotation statement;

and performing annotation supplement on the original field according to the second annotation statement to obtain the target field.

In some embodiments, the step of performing semantic completion on the intermediate field according to a preset semantic logic condition to obtain a second annotation statement includes:

storing the intermediate field into a preset configuration table;

performing semantic completion on the intermediate field through the configuration table and the semantic logic condition to obtain a second annotation text;

and performing language conversion processing on the second annotation text to obtain the second annotation statement, wherein the second annotation statement is a data definition language.

To achieve the above object, a second aspect of embodiments of the present application provides an annotation apparatus, including:

the first reading module is used for reading the reference field and the reference annotation from a preset data warehouse;

the dictionary construction module is used for constructing an annotation dictionary according to the reference field and the reference annotation; wherein the annotation dictionary comprises at least one key-value pair comprising the reference field and the reference annotation in a one-to-one correspondence;

a second reading module for reading the original field from the local library; wherein the annotation content of the original field is null;

the loading module is used for loading the annotation dictionary by taking the original field as a database key to obtain a loading result;

the annotation processing module is used for extracting the reference annotation if the loading result is loading success, and performing annotation processing on the original field according to the reference annotation to obtain a target field; wherein the target field comprises field content and annotation content, the field content being derived from the original field, the annotation content being derived from the reference annotation;

and the annotation completion module is used for performing format conversion processing on the original field to obtain an intermediate field and performing annotation completion on the intermediate field to obtain a target field if the loading result is loading failure.

In order to achieve the above object, a third aspect of the embodiments of the present application provides an electronic device, which includes a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for implementing connection communication between the processor and the memory, wherein the program, when executed by the processor, implements the method of the first aspect.

To achieve the above object, a fourth aspect of the embodiments of the present application proposes a storage medium, which is a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the method of the first aspect.

According to the annotation method, the annotation device, the electronic device and the storage medium provided by the application, the reference field and the reference annotation are read from the preset data warehouse, and the annotation dictionary is constructed according to the reference field and the reference annotation. The annotation dictionary comprises at least one key-value pair, and each key-value pair comprises a reference field and a reference annotation in one-to-one correspondence, so that an annotation dictionary for annotation completion can be constructed conveniently, fields can be annotated according to it, and annotation accuracy is improved. Further, the original field, whose annotation content is null, is read from the local library, and the annotation dictionary is loaded by taking the original field as a database key to obtain a loading result; in this way it is easy to find out whether the annotation dictionary contains a reference annotation that matches the original field. Specifically, if the loading result is loading success, the reference annotation is extracted, and the original field is annotated according to the reference annotation to obtain a target field, wherein the target field comprises field content derived from the original field and annotation content derived from the reference annotation. If the loading result is loading failure, format conversion processing is performed on the original field to obtain an intermediate field, and annotation completion is performed on the intermediate field to obtain a target field. The method can thus annotate original fields according to the annotation dictionary and also complete annotations for original fields that have no reference annotation in the annotation dictionary, which improves both the comprehensiveness and the efficiency of annotation.

Drawings

Fig. 1 is a flowchart of an annotation method provided by an embodiment of the present application;

Fig. 2 is a flowchart of step S102 in Fig. 1;

Fig. 3 is a flowchart of step S104 in Fig. 1;

Fig. 4 is a flowchart of step S105 in Fig. 1;

Fig. 5 is a flowchart of step S401 in Fig. 4;

Fig. 6 is a flowchart of step S106 in Fig. 1;

Fig. 7 is a flowchart of step S602 in Fig. 6;

Fig. 8 is a schematic structural diagram of an annotation device provided in an embodiment of the present application;

Fig. 9 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

It should be noted that although functional blocks are partitioned in a schematic diagram of an apparatus and a logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the partitioning of blocks in the apparatus or the order in the flowchart. The terms first, second and the like in the description and in the claims, and the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing the embodiments of the present application only and is not intended to be limiting of the present application.

First, several terms referred to in the present application are explained:

Artificial Intelligence (AI): a new technical science that researches and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence; research in this field includes robotics, language recognition, image recognition, natural language processing, expert systems, and the like. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is also a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results.

Natural Language Processing (NLP): NLP uses computers to process, understand and use human languages (such as Chinese and English). It is a branch of artificial intelligence and an interdisciplinary field between computer science and linguistics, often called computational linguistics. Natural language processing includes syntactic parsing, semantic analysis, discourse understanding, and the like. It is commonly used in machine translation, handwritten and printed character recognition, speech recognition and text-to-speech conversion, intention recognition, information extraction and filtering, text classification and clustering, public opinion analysis and opinion mining, and it involves data mining, machine learning, knowledge acquisition, knowledge engineering, artificial intelligence research, and linguistic research related to language computation.

Information Extraction: a text processing technology that extracts factual information such as entities, relations and events of specified types from natural language text and outputs it as structured data. Information extraction is a technique for extracting specific information from text data. Text data is composed of specific units such as sentences, paragraphs and chapters, and text information is composed of smaller specific units such as words, phrases, sentences and paragraphs, or combinations of these units. Extracting noun phrases, names of people, names of places and the like from text data is text information extraction, and the information extracted by text information extraction technology can of course be of various types.

MySQL database: MySQL is a relational database management system, and the relational database stores data in different tables instead of putting all data in a large warehouse, so that the processing speed of the data can be increased, and the flexibility of data calling can be improved. The SQL language used by MySQL is the most common standardized language for accessing databases.

Index (database terminology): a data structure in the MySQL database that organizes data, also called a key. In a relational database, an index is a separate physical storage structure that sorts the values of one or more columns in a database table; it consists of the set of values of one or more columns in the table and a corresponding list of logical pointers to the data pages in which those values are physically stored. An index is like the table of contents of a book, through which the required content can be found quickly by page number. The index provides pointers to the data values stored in specified columns of the table and sorts these pointers according to a specified sort order. The database uses the index to find a particular value and then follows the pointer to the row containing that value. This allows SQL statements on the table to execute faster and specific information in the table to be accessed quickly.

Primary Key: one or more fields in a table whose values uniquely identify a record in the table. In a relationship between two tables, the primary key is used to reference a particular record in one table from the other table. The primary key is a unique key that is part of the table definition. The primary key of a table may be composed of multiple columns, and no primary key column may contain a null value. Because its value uniquely identifies each row in the table, the primary key enforces the entity integrity of the table. The primary key is mainly used for association with the foreign keys of other tables and for modifying and deleting records.

Data Definition Language (DDL): a language for describing the real-world entities to be stored in a database.

Web crawler: also called a web robot, a program or script that automatically captures web information according to certain rules. Other less commonly used names are ants, automatic indexers, emulators, or worms.

At present, field annotation of a data table is usually done manually: script data has to be written by hand to supplement annotations for original fields that lack them. Besides manually writing the supplementary annotation script, the process also requires manual field translation and manual writing of DDL statements. The workload is large and annotation efficiency suffers, so how to improve annotation efficiency has become a technical problem to be solved urgently.

Based on this, the embodiment of the application provides an annotation method, an annotation device, an electronic device and a storage medium, aiming at improving annotation efficiency.

The annotation method, the annotation device, the electronic device, and the storage medium provided in the embodiments of the present application are specifically described in the following embodiments, and first, the annotation method in the embodiments of the present application is described.

The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.

The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

The embodiment of the application provides an annotation method, and relates to the technical field of artificial intelligence. The annotation method provided by the embodiment of the application can be applied to a terminal, a server side and software running in the terminal or the server side. In some embodiments, the terminal may be a smartphone, tablet, laptop, desktop computer, or the like; the server side can be configured into an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, and cloud servers for providing basic cloud computing services such as cloud service, a cloud database, cloud computing, cloud functions, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN (content delivery network) and big data and artificial intelligence platforms; the software may be an application or the like that implements the annotation method, but is not limited to the above form.

The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

Fig. 1 is an alternative flowchart of an annotation method provided in an embodiment of the present application, and the method in fig. 1 may include, but is not limited to, steps S101 to S106.

Step S101, reading a reference field and a reference annotation from a preset data warehouse;

step S102, constructing an annotation dictionary according to the reference field and the reference annotation; the annotation dictionary comprises at least one key-value pair, and the key-value pair comprises a reference field and a reference annotation which are in one-to-one correspondence;

step S103, reading original fields from the local library; wherein, the annotation content of the original field is null;

step S104, loading the annotation dictionary by taking the original field as a database key to obtain a loading result;

step S105, if the loading result is that the loading is successful, extracting a reference annotation, and performing annotation processing on the original field according to the reference annotation to obtain a target field; the target field comprises field content and annotation content, wherein the field content is derived from the original field, and the annotation content is derived from the reference annotation;

and step S106, if the loading result is loading failure, carrying out format conversion processing on the original field to obtain an intermediate field, and carrying out annotation completion on the intermediate field to obtain a target field.

In steps S101 to S106 illustrated in the embodiment of the present application, the reference field and the reference annotation are read from the preset data warehouse, and the annotation dictionary is constructed according to the reference field and the reference annotation. The annotation dictionary comprises at least one key-value pair, and each key-value pair comprises a reference field and a reference annotation in one-to-one correspondence, so that an annotation dictionary for annotation completion can be constructed conveniently, fields can be annotated according to it, and annotation accuracy is improved. Further, the original field, whose annotation content is null, is read from the local library, and the annotation dictionary is loaded by taking the original field as a database key to obtain a loading result; in this way it is easy to find out whether the annotation dictionary contains a reference annotation that matches the original field. Specifically, if the loading result is loading success, the reference annotation is extracted, and the original field is annotated according to the reference annotation to obtain a target field, wherein the target field comprises field content derived from the original field and annotation content derived from the reference annotation. If the loading result is loading failure, format conversion processing is performed on the original field to obtain an intermediate field, and annotation completion is performed on the intermediate field to obtain a target field. The method can thus annotate original fields according to the annotation dictionary and also complete annotations for original fields that have no reference annotation in the annotation dictionary, which improves both the comprehensiveness and the efficiency of annotation.
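The branching between steps S104, S105 and S106 can be sketched as follows. This is only an illustrative outline, not the implementation of the embodiment: the function name `annotate_field`, the pluggable `lookup` and `complete` callables, and the lower-casing used as "format conversion" are all assumptions introduced here.

```python
from typing import Callable, Dict, Optional, Tuple


def annotate_field(
    original_field: str,
    annotation_dict: Dict[str, str],
    lookup: Callable[[str, Dict[str, str]], Optional[str]],
    complete: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (field content, annotation content) for one original field.

    `lookup` stands in for step S104 and returns None on loading failure.
    `complete` stands in for the annotation completion of step S106.
    Both are hypothetical helpers supplied by the caller.
    """
    reference_annotation = lookup(original_field, annotation_dict)  # S104
    if reference_annotation is not None:
        # S105: loading succeeded, reuse the reference annotation.
        return original_field, reference_annotation
    # S106: loading failed. Lower-casing is a placeholder for the
    # "preset field format" conversion, which the text does not specify.
    intermediate_field = original_field.strip().lower()
    return original_field, complete(intermediate_field)


if __name__ == "__main__":
    annotations = {"cust_name": "customer name"}
    exact = lambda field, mapping: mapping.get(field)
    fallback = lambda field: field.replace("_", " ")
    print(annotate_field("cust_name", annotations, exact, fallback))
    print(annotate_field("POLICY_NO", annotations, exact, fallback))
```

In both branches the field content of the target field comes from the original field; only the source of the annotation content differs.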

In step S101 of some embodiments, the preset data warehouse may be loaded by means of a written script, so as to read the reference field and the reference annotation from the data warehouse. The preset data warehouse may be a Hive library (Hive is a commonly used data warehouse tool), or it may be another database, without limitation. The reference field and the reference annotation are generally character strings; the reference field is a field that already has an annotation, and the reference annotation may be any combination of Chinese, English, numbers and the like, without limitation.

Referring to fig. 2, in some embodiments, step S102 may include, but is not limited to, step S201 to step S203:

step S201, obtaining the mapping relation between the reference field and the reference annotation;

step S202, combining the reference field and the reference annotation according to the mapping relation to obtain a key value pair;

and step S203, obtaining the annotation dictionary according to the key value pair.

In step S201 of some embodiments, field index information of the reference field and annotation index information of the reference annotation are obtained, where the field index information includes a field primary key that characterizes the location of the reference field, and the annotation index information includes an annotation primary key that characterizes the location of the reference annotation. The row and column characteristics of the annotation primary key and the field primary key are then compared. If, for a given reference field and a given reference annotation, the row characteristics of the annotation primary key are the same as those of the field primary key and the column characteristics are also the same, the reference field and the reference annotation are in the same position and a mapping relationship exists between them, i.e. the reference annotation is the annotation of the reference field.

In step S202 of some embodiments, the reference field and the reference annotation at the same position are combined and associated according to the mapping relationship between them: the reference field is taken as key data (i.e., the key) and the reference annotation as value data (i.e., the value), and each pair of key data and value data having a mapping relationship forms a key-value pair. Representing every reference field and its corresponding reference annotation in this way yields a plurality of key-value pairs, in each of which the reference field and the reference annotation are in one-to-one correspondence.

In step S203 of some embodiments, the obtained plurality of key-value pairs are stored in the database as a whole in the form of a map data structure, so as to obtain the annotation dictionary.
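Steps S201 to S203 can be illustrated with a short sketch. It assumes, purely for illustration, that each reference field and each reference annotation arrives tagged with a `(row, column)` position playing the role of the field primary key and the annotation primary key; the function name `build_annotation_dict` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Position = Tuple[int, int]  # (row, column), standing in for the primary key


def build_annotation_dict(
    fields: Iterable[Tuple[Position, str]],
    annotations: Iterable[Tuple[Position, str]],
) -> Dict[str, str]:
    """Pair each reference field with the annotation at the same position."""
    # S201: index annotations by position so the mapping can be looked up.
    annotation_at = {position: text for position, text in annotations}
    # S202/S203: the reference field becomes the key and the reference
    # annotation the value; fields without an annotation are skipped.
    return {
        name: annotation_at[position]
        for position, name in fields
        if position in annotation_at
    }


if __name__ == "__main__":
    fields = [((0, 0), "cust_name"), ((1, 0), "policy_no"), ((2, 0), "tmp_col")]
    annotations = [((0, 0), "customer name"), ((1, 0), "policy number")]
    print(build_annotation_dict(fields, annotations))
```

A plain Python `dict` is used here as the map structure; how the dictionary is persisted in the database is left open.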

In step S103 of some embodiments, a web crawler is written in advance, and the web crawler traverses all tables of the local library to automatically capture the original fields; the annotation content of an original field is empty, and the original field is generally represented as a character string. The local library may be an Sx-hx-safe library or another database, without limitation.
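As a rough sketch of the traversal in step S103, the snippet below walks an in-memory description of the local library and collects every column whose annotation is empty. The nested-dictionary layout and the name `find_unannotated_fields` are assumptions made for the example; against a real MySQL instance the same information could, for instance, be read from the `COLUMN_COMMENT` column of `information_schema.COLUMNS`.

```python
from typing import Dict, List, Optional, Tuple

# table name -> {column name -> annotation (None or "" when missing)}
TableMetadata = Dict[str, Dict[str, Optional[str]]]


def find_unannotated_fields(tables: TableMetadata) -> List[Tuple[str, str]]:
    """Return (table, column) pairs whose annotation content is empty."""
    missing: List[Tuple[str, str]] = []
    for table_name, columns in tables.items():
        for column_name, annotation in columns.items():
            if annotation is None or annotation.strip() == "":
                missing.append((table_name, column_name))
    return missing


if __name__ == "__main__":
    local_library = {
        "policy": {"id": "primary key", "cust_nm": "", "amt": None},
        "claim": {"claim_no": "claim number"},
    }
    print(find_unannotated_fields(local_library))
```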

Referring to fig. 3, in some embodiments, the loading result includes loading success or loading failure, and step S104 may include, but is not limited to, step S301 to step S303:

step S301, performing similarity calculation between the database key and the key-value pairs of the annotation dictionary to obtain a field similarity;

step S302, if the field similarity is greater than or equal to a preset similarity threshold, the loading result is that the loading is successful;

step S303, if the field similarity is smaller than the preset similarity threshold, the loading result is a loading failure.

In step S301 of some embodiments, the key data (i.e., the reference field) in a key-value pair is extracted, and the similarity between the database key (i.e., the original field) and the key data is calculated by a preset similarity algorithm to obtain the field similarity. Specifically, the database key is first mapped to a preset vector space to obtain an original key vector u, and the key data in the key-value pair is mapped to the same vector space to obtain a reference key vector v, where the feature dimension of the preset vector space can be set according to actual business requirements, without limitation. Further, the preset similarity algorithm may include a cosine similarity algorithm or the like; for example, the similarity between the reference key vector and the original key vector is calculated by the cosine similarity algorithm to obtain the field similarity. The calculation process can be expressed as shown in equation (1):

$$\mathrm{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert} = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^{2}} \, \sqrt{\sum_{i=1}^{n} v_i^{2}}} \tag{1}$$

where u_i and v_i are the i-th components of the original key vector u and the reference key vector v, and n is the feature dimension of the preset vector space.
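The cosine similarity of step S301 can be sketched in a few lines. The text does not say how a field name is mapped into the vector space, so the character-bigram count vector used below is only an assumption chosen to keep the example self-contained; `char_ngrams` and `field_similarity` are hypothetical names.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Map a field name to a sparse vector of character n-gram counts
    (an assumed stand-in for the 'preset vector space')."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot / (norm_u * norm_v)


def field_similarity(original_field: str, reference_field: str) -> float:
    """Field similarity between a database key and a dictionary key."""
    return cosine_similarity(char_ngrams(original_field), char_ngrams(reference_field))


if __name__ == "__main__":
    print(field_similarity("cust_nm", "cust_name"))
    print(field_similarity("abc", "xyz"))
```

Any other embedding (for example a learned one) could replace `char_ngrams` without changing `cosine_similarity`.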

In step S302 of some embodiments, the preset similarity threshold may be set according to actual service requirements, without limitation. For example, the preset similarity threshold is 0.7. If the field similarity is greater than or equal to the similarity threshold, the semantic information of the database key is close to that of the current key data, i.e. the original field and the reference field of the key-value pair are semantically similar, so the value data (the reference annotation) corresponding to the reference field can be used as the annotation of the original field. In other words, a reference annotation matching the original field exists in the annotation dictionary, and the original field can be annotated with that reference annotation; the loading result is therefore loading success.

In step S303 of some embodiments, the same preset similarity threshold as in step S302 is used (for example, 0.7). If the field similarity is smaller than the similarity threshold, the semantic information of the database key differs considerably from that of the current key data, that is, the semantic correlation between the original field and the reference field of the key-value pair is low. The value data (i.e., the reference annotation) corresponding to the reference field is therefore not suitable as the annotation of the original field. In other words, no reference annotation matching the original field exists in the annotation dictionary and the original field needs to be annotated in another way, so the loading result is a loading failure.

Through steps S301 to S303, it can be conveniently determined whether a reference annotation matching the original field exists in the annotation dictionary. When such a reference annotation exists, the original field is annotated according to the annotation dictionary; when it does not exist, annotation completion is performed in another way. The annotation efficiency and the annotation comprehensiveness can thereby be effectively improved.
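Steps S301 to S303 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the text leaves the mapping to the vector space open, so the character bigram count vector used here (`to_vector`) is an assumption, as are the function names and the choice to compare against the best-matching reference field. Only the cosine similarity and the example threshold of 0.7 come from the description.

```python
import math
from collections import Counter

SIMILARITY_THRESHOLD = 0.7  # example value given in the description


def to_vector(field_name, n=2):
    """Hypothetical mapping to a vector space: character n-gram counts."""
    text = field_name.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def load_annotation(original_field, annotation_dict,
                    threshold=SIMILARITY_THRESHOLD):
    """Return (True, reference annotation) on loading success,
    (False, None) on loading failure."""
    u = to_vector(original_field)
    best_key, best_sim = None, 0.0
    for reference_field in annotation_dict:
        sim = cosine_similarity(u, to_vector(reference_field))
        if sim > best_sim:
            best_key, best_sim = reference_field, sim
    if best_key is not None and best_sim >= threshold:
        return True, annotation_dict[best_key]
    return False, None
```

With a dictionary such as `{"user_name": "name of the user"}`, calling `load_annotation("user_name", ...)` yields a loading success, while a field that shares no bigram with any reference field yields a loading failure and would be passed on to the completion branch.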

Referring to fig. 4, in some embodiments, step S105 may include, but is not limited to, step S401 to step S402:

step S401, performing language conversion processing on the reference annotation to obtain a first annotation statement;

and step S402, performing annotation processing on the original field according to the first annotation statement to obtain a target field.

In step S401 of some embodiments, if a reference annotation matching the original field exists in the annotation dictionary, the semantic information of the database key is close to that of a certain reference field, and the loading result is a loading success. In this case, the annotation dictionary is traversed, the key-value pair corresponding to that reference field is extracted, and the reference annotation is extracted from the key-value pair. To improve annotation efficiency, language conversion processing is performed on the reference annotation, which is in the form of a character string, so that it is converted from a character string into a database language to obtain the first annotation statement.

In step S402 of some embodiments, a field definition script is preset, and an attribute is added to the original field through the preset field definition script, where the added attribute is generally a commt attribute. The first annotation statement is then added to the original field through the added commt attribute, thereby annotating the original field and obtaining the target field. The target field includes field content and annotation content, where the field content is derived from the original field and the annotation content is derived from the reference annotation.

Further, in order to make the annotation more reasonable, a labeling position corresponding to the original field also needs to be obtained. The labeling position is where the annotation content corresponding to the original field is written, and it can be obtained by analyzing the index information corresponding to the original field. Specifically, annotation feature extraction can be performed on the index information through a TF-IDF algorithm. Each piece of index information is processed into a plurality of character nodes, and the frequency of each character in the index information is calculated to obtain the term frequency (TF) of the character, where TF = (number of occurrences of the character w) / (number of characters in the index information). Next, the inverse document frequency (IDF) of each character is calculated, where IDF = log(total number of pieces of index information / (number of pieces of index information containing the character w + 1)). Finally, a comprehensive frequency value of each character is calculated from the term frequency and the inverse document frequency. Here, the character w is a word that can express the meaning of annotation, mark, or comment. The node that contains the character w and has the largest comprehensive frequency value is selected as the annotation feature of the index information, and the annotation feature reflects the labeling position corresponding to the original field. The first annotation statement is then added at the labeling position of the original field through the commt attribute to obtain the target field, so that target fields keep a consistent format and are better standardized.
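The TF and IDF formulas stated above can be written out directly. The sketch below is illustrative only: the description does not say how the two values are combined into the comprehensive frequency value, so the usual product TF x IDF is assumed, and the function names are hypothetical.

```python
import math
from typing import Sequence


def term_frequency(char: str, index_info: str) -> float:
    """TF = occurrences of the character / number of characters."""
    if not index_info:
        return 0.0
    return index_info.count(char) / len(index_info)


def inverse_document_frequency(char: str, all_index_info: Sequence[str]) -> float:
    """IDF = log(total pieces / (pieces containing the character + 1))."""
    if not all_index_info:
        raise ValueError("at least one piece of index information is required")
    containing = sum(1 for info in all_index_info if char in info)
    return math.log(len(all_index_info) / (containing + 1))


def comprehensive_frequency(char: str, index_info: str,
                            all_index_info: Sequence[str]) -> float:
    """Assumed combination: the standard TF * IDF product."""
    return (term_frequency(char, index_info)
            * inverse_document_frequency(char, all_index_info))
```

Note that, because of the "+ 1" in the denominator, the IDF as defined becomes negative for a character that occurs in every piece of index information; a real implementation may want to handle that case explicitly.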

Referring to fig. 5, in some embodiments, step S401 may include, but is not limited to, step S501 to step S502:

step S501, splicing the reference annotations according to a preset splicing sequence to obtain a first annotation text;

step S502, performing language conversion processing on the first annotation text to obtain a first annotation statement, where the first annotation statement is a data definition language.

In step S501 of some embodiments, the preset splicing sequence may be set according to actual business requirements, without limitation. For example, the preset splicing sequence may be the chronological order in which the reference annotations were obtained. Alternatively, the original fields may be arranged according to basic grammar rules (such as the common subject-predicate form) to obtain a sequence number for each original field; the reference annotations are then sorted by the sequence numbers of their corresponding original fields, and the sorted reference annotations are spliced to obtain the first annotation text.

In step S502 of some embodiments, in order to make the annotation more reasonable and enable the annotation form to meet business requirements, language conversion processing is performed on the first annotation text, so that the first annotation text is converted from a text string into a database language to obtain the first annotation statement. The first annotation statement is a DDL (data definition language) statement in SQL.

It should be noted that the language conversion of the first annotation text may be implemented with reference to the processing procedure of a conventional Text-to-SQL task. For example, a deep learning model learns the latent knowledge of the first annotation text and predicts the relationships between the sentences in the first annotation text, so as to generate the first annotation statement. The first annotation statement is a DDL statement, and DDL is the data definition language commonly used in MySQL databases. Annotating the original field in the form of a DDL statement makes it convenient to store the annotation content, thereby improving annotation efficiency and annotation accuracy.
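To make the output of steps S501 and S502 concrete, the following sketch splices reference annotations and wraps the result in a MySQL DDL statement. It deliberately swaps the deep-learning Text-to-SQL conversion mentioned above for a fixed string template, so it illustrates only the shape of the resulting statement. The function names, the separator, and the requirement to pass the column type are assumptions.

```python
def splice_annotations(annotations, separator="; "):
    """Join non-empty reference annotations in the given (preset) order."""
    return separator.join(a.strip() for a in annotations if a and a.strip())


def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_comment_ddl(table, column, column_type, annotation_text):
    """Build an ALTER TABLE ... MODIFY COLUMN ... COMMENT statement.

    Backslashes and single quotes are escaped for MySQL's default SQL mode.
    """
    escaped = annotation_text.replace("\\", "\\\\").replace("'", "''")
    return (
        f"ALTER TABLE {quote_identifier(table)} "
        f"MODIFY COLUMN {quote_identifier(column)} {column_type} "
        f"COMMENT '{escaped}';"
    )
```

In MySQL, `MODIFY COLUMN` restates the whole column definition, so a real implementation would need to carry over the existing type, nullability, and default of the column rather than only the type passed in here.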

Referring to fig. 6, in some embodiments, step S106 includes, but is not limited to, steps S601 to S603:

step S601, if the loading result is loading failure, format conversion is carried out on an original field according to a preset field format to obtain an intermediate field;

step S602, performing semantic completion on the intermediate field according to a preset semantic logic condition to obtain a second comment statement;

and step S603, performing annotation supplement on the original field according to the second annotation statement to obtain the target field.

In step S601 of some embodiments, if no reference annotation matching the original field exists in the annotation dictionary, the semantic information of the database key differs considerably from that of every reference field, the reference annotations in the annotation dictionary are not suitable as the annotation of the original field, and the loading result is a loading failure, so the original field needs to be annotated in another way. To process such original fields more efficiently, they can be marked. Specifically, the field name of the original field is format-converted according to a preset field format to obtain an intermediate field, where the field name of the intermediate field may take the form 'field name of the original field + random number + autocoommets'. The format of the intermediate field thus differs from that of the original field, so that original fields that have no matching reference annotation in the annotation dictionary can be screened out from the local library.
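The marking scheme of step S601 can be sketched as below. The suffix is copied as spelled in the description; the underscore separator, the four-digit random number, and the helper names are assumptions made only for this illustration.

```python
import random

MARKER_SUFFIX = "autocoommets"  # suffix as spelled in the description


def to_intermediate_field(original_field, rng=None):
    """Build 'original name + random number + suffix' (separator assumed)."""
    if rng is None:
        rng = random.Random()
    return f"{original_field}_{rng.randint(0, 9999):04d}_{MARKER_SUFFIX}"


def is_intermediate_field(field_name):
    """True for fields that were marked as lacking a reference annotation."""
    return field_name.endswith("_" + MARKER_SUFFIX)


def original_field_name(intermediate_field):
    """Recover the original field name from an intermediate field name."""
    if not is_intermediate_field(intermediate_field):
        raise ValueError("not an intermediate field name")
    without_suffix = intermediate_field[: -len("_" + MARKER_SUFFIX)]
    return without_suffix.rsplit("_", 1)[0]
```

Filtering the fields of the local library with `is_intermediate_field` then screens out exactly the fields that still need annotation completion.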

In step S602 in some embodiments, a configuration table may be constructed based on all fields of the local library and their corresponding annotation contents, and a query is performed by traversing the configuration table to determine whether there is a field that matches the semantic feature of the intermediate field, and if there is a field that matches the semantic feature, the annotation content corresponding to the field in the configuration table is extracted according to a preset semantic logic condition, and the intermediate field is semantically complemented according to the annotation content to obtain a second annotation statement.

In step S603 of some embodiments, a field definition script is first preset, an attribute is added to the original field through the preset field definition script, where the added attribute is generally a commt attribute, and then a second comment statement is added to the original field through the added commt attribute, so as to implement comment processing on the original field, and obtain the target field.

Referring to fig. 7, in some embodiments, step S602 may include, but is not limited to, steps S701 to S703:

step S701, storing the intermediate field into a preset configuration table;

step S702, performing semantic completion on the intermediate field through a configuration table and semantic logic conditions to obtain a second annotation text;

step S703, performing language conversion processing on the second annotation text to obtain a second annotation statement, where the second annotation statement is a data definition language.

In step S701 of some embodiments, the intermediate field is stored in a preset configuration table. The configuration table includes a plurality of comment fields, and the content of each comment field includes the table name of a data table of the local library, a field in that data table, and the annotation content corresponding to the field. That is, the configuration table includes all the fields of the local library and their corresponding annotation contents, and it can be understood as a dictionary constructed from the fields of the local library and their corresponding annotation contents.

In step S702 of some embodiments, the configuration table includes a configuration switch, which is typically a dynamic switch and can enable or disable the configuration table according to actual business requirements. For an original field that has no matching reference annotation in the annotation dictionary, after the original field is format-converted to obtain the intermediate field, the configuration switch is set to the enabled state so that the configuration table is enabled. The comment fields in the configuration table are then traversed and matched against the intermediate field, the comment field with the highest similarity to the intermediate field is found, and the annotation content in that comment field is extracted and used as the annotation of the intermediate field. In this way, the intermediate field can be annotated conveniently, and the data tables in the local library can complete one another: the fields of one data table are annotated according to the fields of another data table and their annotation contents, which improves annotation comprehensiveness.

For example, when field a of data table A of the local library has no corresponding reference annotation in the annotation dictionary, format conversion is performed on field a to obtain the intermediate field a1, the configuration switch of the configuration table is set to the enabled state, and comment field 1, which is similar to the intermediate field a1, is found in the configuration table. The content of comment field 1 is the table name of data table B of the local library, field b in data table B, and the annotation content corresponding to field b. This indicates that field a of data table A and field b of data table B may represent the same semantic information, so the annotation content corresponding to field b may be used as the annotation content of field a.

Further, since the annotation content extracted from the configuration table may be discrete rather than continuous annotation content that conforms to the semantic specification, the annotation content needs to be spliced according to the semantic logic condition (for example, in subject-predicate-object form) to obtain the second annotation text.
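The lookup in the configuration table described for step S702 might look like the following sketch. The data layout (`CommentField`, `ConfigTable`), the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.7 threshold are assumptions; the description only requires that the switch be enabled and that the most similar comment field be found.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class CommentField:
    table_name: str   # data table of the local library
    field_name: str   # field in that data table
    comment: str      # annotation content of the field


@dataclass
class ConfigTable:
    entries: List[CommentField]
    enabled: bool = False  # the "configuration switch"

    def find_comment(self, field_name: str,
                     threshold: float = 0.7) -> Optional[str]:
        """Return the comment of the most similar field, or None."""
        if not self.enabled:
            return None
        best, best_score = None, 0.0
        for entry in self.entries:
            score = SequenceMatcher(
                None, field_name.lower(), entry.field_name.lower()
            ).ratio()
            if score > best_score:
                best, best_score = entry, score
        if best is not None and best_score >= threshold:
            return best.comment
        return None
```

Here the similarity is computed on field names only; an implementation closer to the embodiment could reuse the vector-space similarity of step S301 instead of a string-matching ratio.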

In step S703 of some embodiments, in order to make the annotation more reasonable and enable the annotation form to meet business requirements, language conversion processing is performed on the second annotation text, so that the second annotation text is converted from a text string into a database language to obtain the second annotation statement. The second annotation statement is a DDL statement in SQL, and DDL is the data definition language commonly used in MySQL databases. Annotating the original field in the form of a DDL statement makes it convenient to store the annotation content, thereby improving annotation efficiency and annotation accuracy.

In addition, the annotation method of the embodiment of the application can also perform annotation correction on an original field in the local library that already has annotation content, to obtain the target field. Specifically, the validity of the annotation content of the original field is first checked, that is, whether the annotation content is garbled text or blank. If the annotation content is a normal character string, the annotation content of the original field is taken to be correct by default. If the annotation content of the original field is garbled text or blank, the original field is re-annotated according to the annotation process of steps S104, S105, and S106 above to obtain the target field.
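The validity check described above can be sketched as follows. The description does not define what counts as garbled text, so the heuristic used here (the Unicode replacement character, control characters, and unassigned or private-use code points) is an assumption, as is the function name.

```python
import unicodedata

# Unicode categories treated as signs of garbling (an assumed heuristic):
# control, surrogate, private-use, and unassigned code points.
_SUSPICIOUS_CATEGORIES = {"Cc", "Cs", "Co", "Cn"}


def is_valid_comment(comment):
    """Return False for blank or (heuristically) garbled annotation content."""
    if comment is None or not comment.strip():
        return False
    for ch in comment:
        if ch == "\ufffd":  # replacement character left by a failed decode
            return False
        if ch in "\t\n\r":
            continue
        if unicodedata.category(ch) in _SUSPICIOUS_CATEGORIES:
            return False
    return True
```

Fields for which `is_valid_comment` returns `False` would then be sent through the same annotation process as fields whose annotation content is null.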

According to the annotation method of the embodiment of the application, the reference field and the reference annotation are read from the preset data warehouse, and the annotation dictionary is constructed according to the reference field and the reference annotation. The annotation dictionary includes at least one key-value pair, and each key-value pair includes a reference field and a reference annotation in one-to-one correspondence, so that an annotation dictionary for annotation completion can be constructed conveniently, fields can be annotated according to the annotation dictionary, and annotation accuracy is improved. Further, the original field, whose annotation content is null, is read from the local library, and the annotation dictionary is loaded with the original field as the database key to obtain a loading result; in this way it is convenient to determine whether a reference annotation matching the original field exists in the annotation dictionary. Specifically, if the loading result is a loading success, the reference annotation is extracted and the original field is annotated according to the reference annotation to obtain the target field, where the target field includes field content derived from the original field and annotation content derived from the reference annotation. If the loading result is a loading failure, format conversion processing is performed on the original field to obtain an intermediate field, and annotation completion is performed on the intermediate field to obtain the target field. In this way, the original field can be annotated according to the annotation dictionary, and annotation completion can also be performed on an original field that has no reference annotation in the annotation dictionary, which improves both the comprehensiveness and the efficiency of annotation.

Referring to fig. 8, an embodiment of the present application further provides an annotation apparatus, which can implement the annotation method described above, where the apparatus includes:

a first reading module 801, configured to read a reference field and a reference comment from a preset data warehouse;

a dictionary construction module 802 for constructing a comment dictionary based on the reference field and the reference comment; the annotation dictionary comprises at least one key-value pair, and the key-value pair comprises a reference field and a reference annotation which are in one-to-one correspondence;

a second reading module 803, for reading the original field from the local library; wherein, the annotation content of the original field is null;

the loading module 804 is configured to load the annotation dictionary by using the original field as a database key to obtain a loading result;

the comment processing module 805 is configured to extract a reference comment if the loading result is that the loading is successful, and perform comment processing on the original field according to the reference comment to obtain a target field; the target field comprises field content and annotation content, wherein the field content is derived from the original field, and the annotation content is derived from the reference annotation;

and a comment completion module 806, configured to perform format conversion processing on the original field to obtain an intermediate field if the loading result is that the loading fails, and perform comment completion on the intermediate field to obtain a target field.

In some embodiments, dictionary construction module 802 includes:

the mapping unit is used for acquiring the mapping relation between the reference field and the reference annotation;

the combination unit is used for combining the reference field and the reference annotation according to the mapping relation to obtain a key value pair;

and the dictionary generating unit is used for obtaining the annotated dictionary according to the key value pairs.

In some embodiments, the load result includes a load success or a load failure, and the load module 804 includes:

the similarity calculation unit is used for calculating the similarity of the database key and the key value pair of the annotation dictionary to obtain the field similarity;

the first loading unit is used for judging that the loading result is successful if the field similarity is greater than or equal to a preset similarity threshold;

and the second loading unit is used for determining that the loading result is loading failure if the field similarity is smaller than a preset similarity threshold.

In some embodiments, annotation processing module 805 includes:

the language conversion unit is used for carrying out language conversion processing on the reference annotation to obtain a first annotation statement;

and the annotation unit is used for performing annotation processing on the original field according to the first annotation statement to obtain the target field.

In some embodiments, the language conversion unit includes:

the splicing subunit is used for splicing the reference annotations according to a preset splicing sequence to obtain a first annotation text;

and the first language conversion subunit is used for performing language conversion processing on the first annotation text to obtain a first annotation statement, wherein the first annotation statement is a DDL statement.

In some embodiments, the annotation completion module 806 includes:

the format conversion unit is used for carrying out format conversion on the original field according to a preset field format to obtain an intermediate field if the loading result is that the loading fails;

the semantic completion unit is used for performing semantic completion on the intermediate field according to a preset semantic logic condition to obtain a second comment statement;

and the comment supplementing unit is used for performing comment supplementation on the original field according to the second comment statement to obtain the target field.

In some embodiments, the semantic completion unit comprises:

the storage subunit is used for storing the intermediate field into a preset configuration table;

the completion subunit is used for performing semantic completion on the intermediate field through the configuration table and the semantic logic conditions to obtain a second annotation text;

and the second language conversion subunit is used for performing language conversion processing on the second annotation text to obtain a second annotation statement, wherein the second annotation statement is a DDL statement.

The specific implementation of the annotation device is substantially the same as the specific implementation of the annotation method, and is not described herein again.

An embodiment of the present application further provides an electronic device, where the electronic device includes: a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for enabling a connection communication between the processor and the memory, the program, when executed by the processor, implementing the above-mentioned annotation method. The electronic equipment can be any intelligent terminal including a tablet computer, a vehicle-mounted computer and the like.

Referring to fig. 9, fig. 9 illustrates a hardware structure of an electronic device according to another embodiment, where the electronic device includes:

the processor 901 may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute a relevant program to implement the technical solution provided in the embodiment of the present application;

the memory 902 may be implemented in the form of a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a Random Access Memory (RAM). The memory 902 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 902 and called by the processor 901 to execute the annotation method of the embodiments of the present application;

an input/output interface 903 for implementing information input and output;

a communication interface 904, configured to implement communication interaction between the device and another device, where communication may be implemented in a wired manner (e.g., USB, network cable, etc.), or in a wireless manner (e.g., mobile network, WIFI, bluetooth, etc.);

a bus 905 that transfers information between various components of the device (e.g., the processor 901, the memory 902, the input/output interface 903, and the communication interface 904);

wherein the processor 901, the memory 902, the input/output interface 903 and the communication interface 904 enable a communication connection within the device with each other through a bus 905.

Embodiments of the present application further provide a storage medium, which is a computer-readable storage medium. The storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the above annotation method.

The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

According to the annotation method, the annotation device, the electronic equipment and the storage medium of the embodiments of the application, the reference field and the reference annotation are read from the preset data warehouse, and the annotation dictionary is constructed according to the reference field and the reference annotation. The annotation dictionary includes at least one key-value pair, and each key-value pair includes a reference field and a reference annotation in one-to-one correspondence, so that an annotation dictionary for annotation completion can be constructed conveniently, fields can be annotated according to the annotation dictionary, and annotation accuracy is improved. Further, the original field, whose annotation content is null, is read from the local library, and the annotation dictionary is loaded with the original field as the database key to obtain a loading result; in this way it is convenient to determine whether a reference annotation matching the original field exists in the annotation dictionary. Specifically, if the loading result is a loading success, the reference annotation is extracted and the original field is annotated according to the reference annotation to obtain the target field, where the target field includes field content derived from the original field and annotation content derived from the reference annotation. If the loading result is a loading failure, format conversion processing is performed on the original field to obtain an intermediate field, and annotation completion is performed on the intermediate field to obtain the target field. In this way, the original field can be annotated according to the annotation dictionary, and annotation completion can also be performed on an original field that has no reference annotation in the annotation dictionary, which improves both the comprehensiveness and the efficiency of annotation.

The embodiments described in the present application are intended to illustrate the technical solutions of the embodiments of the present application more clearly and do not limit the technical solutions provided in the embodiments of the present application. Those skilled in the art will appreciate that, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.

It will be appreciated by those skilled in the art that the solutions shown in fig. 1-7 are not intended to limit the embodiments of the present application and may include more or fewer steps than those shown, or some of the steps may be combined, or different steps may be included.

The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

One of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.

The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate: only A is present, only B is present, or both A and B are present, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may be single or plural.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the above-described division of units is only a division by logical function, and other divisions are possible in practice; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing programs, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but the scope of the claims of the embodiments of the present application is not limited thereby. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims

1. A method of annotation, the method comprising:

2. The annotation method of claim 1, wherein said step of constructing an annotation dictionary based on said reference fields and said reference annotations comprises:

and obtaining the annotation dictionary according to the key value pair.

3. The annotation method according to claim 1, wherein the loading result includes a loading success or a loading failure, and the step of loading the annotation dictionary using the original field as a database key to obtain the loading result includes:

performing similarity calculation on the database key and the key value pair of the annotation dictionary to obtain field similarity;

4. The annotation method of claim 1, wherein said step of annotating said original field with said reference annotation to obtain a target field comprises:

5. The annotation method of claim 4, wherein said step of performing language conversion processing on said reference annotation to obtain a first annotation statement comprises:

6. The annotation method according to any one of claims 1 to 5, wherein the step of performing format conversion processing on the original field to obtain an intermediate field and performing annotation completion on the intermediate field to obtain a target field if the loading result is a loading failure includes:

7. The annotation method of claim 6, wherein said step of semantically completing said intermediate field according to a predetermined semantic logic condition to obtain a second annotation statement comprises:

storing the intermediate field into a preset configuration table;

8. An annotation apparatus, characterized in that the apparatus comprises:

9. An electronic device, characterized in that it comprises a memory, a processor, a program stored on said memory and executable on said processor, and a data bus for implementing a connection communication between said processor and said memory, said program, when executed by said processor, implementing the steps of the annotation method according to any one of claims 1 to 7.

10. A storage medium, which is a computer-readable storage medium, for computer-readable storage, characterized in that the storage medium stores one or more programs, which are executable by one or more processors, to implement the steps of the annotation method of any one of claims 1 to 7.
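
For illustration only, the flow recited in the method claims (building an annotation dictionary from key-value pairs, loading it with the original field as the key, computing a field similarity, and falling back to format conversion on a loading failure) might be sketched in Python as follows. Everything specific here is an assumption rather than the claimed implementation: the function names, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.8 threshold, and the underscore-to-space fallback, which merely stands in for the format conversion and annotation completion steps whose details are not reproduced in the claims above.

```python
from difflib import SequenceMatcher

# Assumed threshold; the claims above do not state a value.
SIMILARITY_THRESHOLD = 0.8


def build_annotation_dictionary(reference_fields, reference_annotations):
    """Pair each reference field with its reference annotation, one to one."""
    return dict(zip(reference_fields, reference_annotations))


def load_annotation(dictionary, original_field, threshold=SIMILARITY_THRESHOLD):
    """Use the original field as the key and return (success, annotation).

    Similarity is computed with difflib as a stand-in for the field
    similarity calculation; the actual measure is an assumption.
    """
    best_key, best_score = None, 0.0
    for key in dictionary:
        score = SequenceMatcher(None, original_field.lower(), key.lower()).ratio()
        if score > best_score:
            best_key, best_score = key, score
    if best_key is not None and best_score >= threshold:
        return True, dictionary[best_key]
    return False, None


def format_convert(original_field):
    """Hypothetical format conversion: turn snake_case into plain words."""
    return original_field.replace("_", " ").strip().lower()


def annotate(dictionary, original_field):
    """Return a target field holding both the field content and an annotation."""
    loaded, annotation = load_annotation(dictionary, original_field)
    if loaded:
        return {"field": original_field, "annotation": annotation}
    # Loading failed: fall back to the placeholder conversion path.
    return {"field": original_field, "annotation": format_convert(original_field)}


if __name__ == "__main__":
    dictionary = build_annotation_dictionary(
        ["user_id", "order_date"],
        ["user identifier", "date of the order"],
    )
    print(annotate(dictionary, "user_id"))
    print(annotate(dictionary, "premium_amount"))
```

In this sketch an exact or near-exact field name takes the dictionary path, while an unknown field such as `premium_amount` takes the fallback path. A real system would replace the fallback with whatever semantic completion logic the embodiment uses.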