CN113722206A - Data annotation method and device, electronic equipment and computer readable medium - Google Patents

Data annotation method and device, electronic equipment and computer readable medium

Info

Publication number
CN113722206A
Authority
CN
China
Prior art keywords
data
test case
target data
feature
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011261442.3A
Other languages
Chinese (zh)
Inventor
连玺
张江涛
户作鹏
刘诏
陶希
申作军
陈志良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Shangke Information Technology Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Shangke Information Technology Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Shangke Information Technology Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN202011261442.3A priority Critical patent/CN113722206A/en
Publication of CN113722206A publication Critical patent/CN113722206A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3684Test management for test design, e.g. generating new test cases

Abstract

The embodiments of the present disclosure disclose a data annotation method and device, an electronic device, and a computer readable medium. One specific implementation of the data annotation method comprises the following steps: determining feature information corresponding to target data; matching each feature expression in a test case set against the feature information by using each script in the test case set to obtain at least one test case, wherein each test case comprises a feature expression and a script, the script being used to match the feature expression against the feature information, and the feature expression characterizing the matching relationship between the test case and the target data; and performing case annotation on the target data according to the at least one test case. This implementation can accurately and effectively match the target data with the test cases in the test case set, thereby improving the stability and reliability of the labeled data.

Description

Data annotation method and device, electronic equipment and computer readable medium
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a data annotation method, a data annotation device, electronic equipment and a computer readable medium.
Background
A test case refers to a description of a testing task performed on a specific software product and embodies a test scheme, method, technique, and strategy. Its content includes a test target, a test environment, input data, test steps, expected results, test scripts, and the like. Currently, the test data for a test case is typically created manually by the technicians involved.
However, acquiring test data in this manner often raises the following technical problems:
First, the process of creating test data is cumbersome, and the stability and reliability of manually created test data cannot be guaranteed. In addition, as the business or peripheral dependent systems change, much of the test data may become invalid.
Second, the test data corresponding to a test case cannot be effectively updated and replaced.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose data annotation methods, apparatuses, devices and computer readable media to solve the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a data annotation method, including: determining feature information corresponding to target data; matching each feature expression in a test case set against the feature information by using each script in the test case set to obtain at least one test case, wherein each test case comprises a feature expression and a script, the script being used to match the feature expression against the feature information, and the feature expression characterizing the matching relationship between the test case and the target data; and performing case labeling on the target data according to the at least one test case.
Optionally, before determining the feature information corresponding to the target data, the method further includes: acquiring a data set of interfaces corresponding to each service in a service system within a preset period of time; and screening the data set to obtain a screened data set comprising the target data.
Optionally, the method further includes: performing data desensitization on the case-labeled target data to obtain desensitized data; and performing data persistence on the desensitized data.
Optionally, the acquiring of the data set of the interfaces corresponding to each service in the service system within the predetermined period of time includes: intercepting and monitoring the interfaces corresponding to each service in the service system within the predetermined period of time to acquire the data set.
Optionally, the data persistence processing on the desensitized data includes: determining the amount of data stored in each test case of the at least one test case; in response to the amount of data being greater than or equal to a predetermined threshold, determining whether a user-defined persistence method exists; and in response to the user-defined persistence method existing, storing the desensitized data into the test case using the user-defined persistence method.
Optionally, the data persistence processing on the desensitized data includes: in response to the user-defined persistence method not existing, storing the desensitized data into the test case according to the system's default persistence method.
Optionally, the test case set is generated through the following steps: determining a characteristic information set corresponding to each service in the service system; combining the characteristic information in the characteristic information set corresponding to each service to obtain each characteristic expression; determining each script corresponding to each characteristic expression according to each characteristic expression; and generating the test case set according to the characteristic expressions and the scripts.
In a second aspect, some embodiments of the present disclosure provide a data annotation apparatus, the apparatus comprising: a determining unit configured to determine feature information corresponding to target data; a matching unit configured to match, by using each script in a test case set, each feature expression in the test case set against the feature information to obtain at least one test case, wherein each test case comprises a feature expression and a script, the script being used to match the feature expression against the feature information, and the feature expression characterizing the matching relationship between the test case and the target data; and a case labeling unit configured to perform case labeling on the target data according to the at least one test case.
Optionally, the apparatus further comprises: an acquisition unit configured to acquire a data set of the interfaces corresponding to each service in a service system within a predetermined period of time; and a screening unit configured to screen the data set to obtain a screened data set comprising the target data.
Optionally, the apparatus further comprises: a data desensitization unit configured to perform data desensitization on the case-labeled target data to obtain desensitized data; and a data persistence unit configured to perform data persistence on the desensitized data.
Optionally, the acquisition unit is further configured to intercept and monitor the interfaces corresponding to each service in the service system within the predetermined period of time to acquire the data set.
Optionally, the data persistence unit is further configured to: determine the amount of data stored in each test case of the at least one test case; in response to the amount of data being greater than or equal to a predetermined threshold, determine whether a user-defined persistence method exists; and in response to the user-defined persistence method existing, store the desensitized data into the test case using the user-defined persistence method.
Optionally, the data persistence unit is further configured to, in response to the user-defined persistence method not existing, store the desensitized data into the test case according to the system's default persistence method.
Optionally, the test case set is generated through the following steps: determining a characteristic information set corresponding to each service in the service system; combining the characteristic information in the characteristic information set corresponding to each service to obtain each characteristic expression; determining each script corresponding to each characteristic expression according to each characteristic expression; and generating the test case set according to the characteristic expressions and the scripts.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the first aspects.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as in any one of the first aspect.
The above embodiments of the present disclosure have the following beneficial effects: the data labeling method of some embodiments of the present disclosure can accurately and effectively match the target data with the test cases in the test case set, thereby improving the stability and reliability of the labeled test data. In particular, the inventors found that the process of creating test data is cumbersome and that the stability and reliability of manually created test data cannot be guaranteed; moreover, much test data may become invalid as the business or peripheral dependent systems change. Based on this, the data annotation method of some embodiments of the present disclosure first determines the feature information corresponding to the target data for subsequent matching against each feature expression. Then, each feature expression is matched against the feature information by using each script in the test case set to obtain at least one test case. Comparing the feature expressions with the feature information greatly improves the accuracy of the labeled data, so that the labeled data is more stable and reliable, and the occurrence of invalid test data is reduced. Finally, case labeling is performed on the target data to obtain the association between the target data and the at least one test case.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
FIGS. 1-2 are schematic diagrams of an application scenario of the data annotation method of some embodiments of the present disclosure;
FIG. 3 is a flow diagram of some embodiments of a data annotation process according to the present disclosure;
FIG. 4 is a schematic illustration of characteristic information of a data annotation process according to some embodiments of the present disclosure;
FIG. 5 is a schematic illustration of feature expression generation in test cases of a data annotation process according to some embodiments of the present disclosure;
FIG. 6 is a flow diagram of further embodiments of a data annotation process according to the present disclosure;
FIG. 7 is a schematic block diagram of some embodiments of a data annotation device according to the present disclosure;
FIG. 8 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1-2 are schematic diagrams of application scenarios of the data annotation method according to some embodiments of the present disclosure.
As shown in fig. 1-2, the electronic device 101 may first determine characteristic information 106 corresponding to the target data 105. In this application scenario, the target data 105 may be: "Xiaohong places a 500 yuan vegetable order in the Yuyu fresh supermarket application." The characteristic information 106 may be: "name: Xiaohong; source: Yuyu fresh supermarket application; order type: vegetable order; price: 500 yuan." Then, each feature expression 103 in the test case set 102 is matched against the characteristic information 106 by using each script 104 in the test case set 102 to obtain the at least one test case, wherein each test case comprises a feature expression and a script, the script being used to match the feature expression against the characteristic information 106, and the feature expression characterizing the matching relationship between the test case and the target data 105. In this application scenario, the test case set 102 includes: test case 1021, test case 1022, test case 1023, test case 1024, test case 1025, test case 1026, test case 1027, and test case 1028. The feature expressions 103 include: feature expression 1031 in test case 1021, feature expression 1032 in test case 1022, feature expression 1033 in test case 1023, feature expression 1034 in test case 1024, feature expression 1035 in test case 1025, feature expression 1036 in test case 1026, feature expression 1037 in test case 1027, and feature expression 1038 in test case 1028. The scripts 104 include: script 1041 in test case 1021, script 1042 in test case 1022, script 1043 in test case 1023, script 1044 in test case 1024, script 1045 in test case 1025, script 1046 in test case 1026, script 1047 in test case 1027, and script 1048 in test case 1028. The at least one test case comprises: test case 1021, test case 1022, test case 1023, and test case 1024. Finally, case labeling is performed on the target data 105 according to the at least one test case.
The electronic device 101 may be hardware or software. When the electronic device is hardware, the electronic device may be implemented as a distributed cluster formed by a plurality of servers or terminal devices, or may be implemented as a single server or a single terminal device. When the electronic device is embodied as software, it may be installed in the above-listed hardware devices. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.
It should be understood that the number of electronic devices in fig. 1 is merely illustrative. There may be any number of electronic devices, as desired for implementation.
With continued reference to FIG. 3, a flow 300 of some embodiments of a data annotation process according to the present disclosure is shown. The data annotation method comprises the following steps:
step 301, determining characteristic information corresponding to the target data.
In some embodiments, an execution subject of the data annotation method (e.g., the electronic device 101 shown in fig. 1) may determine the feature information corresponding to the target data. The feature information may include a feature type and feature content associated with a service. As an example, if the service is an order-placing service, the feature types include an order type, and the feature content corresponding to the order type may be a vegetable order. As another example, if the service is a sales service, the feature types include a sales object, and the feature content corresponding to the sales object may be a worker.
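As a purely illustrative sketch (not part of the disclosure), the feature information of the order-placing example could be modeled as a feature-type and feature-content pair; the Java class and member names below are assumptions made for illustration only.

    import java.util.List;

    // Minimal sketch of feature information: a feature type paired with feature
    // content for a given service. Class and method names are illustrative
    // assumptions, not names used by the disclosure.
    public final class FeatureInfo {
        private final String featureType;     // e.g. "order type"
        private final String featureContent;  // e.g. "vegetable order"

        public FeatureInfo(String featureType, String featureContent) {
            this.featureType = featureType;
            this.featureContent = featureContent;
        }

        public String featureType()    { return featureType; }
        public String featureContent() { return featureContent; }

        public static void main(String[] args) {
            // Feature information for the order-placing service example above.
            List<FeatureInfo> orderFeatures = List.of(
                    new FeatureInfo("order type", "vegetable order"),
                    new FeatureInfo("price", "500 yuan"));
            orderFeatures.forEach(f ->
                    System.out.println(f.featureType() + ": " + f.featureContent()));
        }
    }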
It should be noted that a service may correspond to a plurality of feature types, and each feature type may correspond to a plurality of feature contents. One feature content may correspond to a plurality of feature types.
As an example, as shown in fig. 4, service A corresponds to a plurality of feature types: feature type a, feature type b, and feature type c. A feature type may in turn correspond to a plurality of feature contents: feature type a corresponds to feature content 1 and feature content 4; feature type b corresponds to feature content 1, feature content 2, and feature content 5; and feature type c corresponds to feature content 1, feature content 2, and feature content 3. As can also be seen from fig. 4, feature content 1 corresponds to feature type a, feature type b, and feature type c, so one feature content may correspond to a plurality of feature types.
In some optional implementations of some embodiments, before determining the feature information corresponding to the target data, the step further includes:
the first step is to obtain a data set of interfaces corresponding to each service in a service system within a preset period of time. The service system may include a plurality of different types of services.
And secondly, screening the data set to obtain a screened data set comprising the target data. As an example, the data set may be filtered through a filter built into the system or a user-defined filter to obtain the screened data set comprising the target data. The filters built into the system may include, but are not limited to, at least one of the following: a percentage filter, an interface method filter, a Remote Procedure Call (RPC) packet filter, and a traffic exception filter.
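For illustration only, the screening step might be sketched as follows, assuming a simple filter interface; the DataFilter, PercentageFilter, and FilterChain names are assumptions rather than the disclosure's API.

    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.stream.Collectors;

    // Hedged sketch of the screening step: built-in or user-defined filters are
    // applied to the captured data set. The percentage filter keeps roughly the
    // configured fraction of records; names are illustrative assumptions.
    interface DataFilter<T> {
        boolean accept(T record);
    }

    final class PercentageFilter<T> implements DataFilter<T> {
        private final double keepRatio; // e.g. 0.1 keeps about 10% of the traffic

        PercentageFilter(double keepRatio) { this.keepRatio = keepRatio; }

        @Override
        public boolean accept(T record) {
            return ThreadLocalRandom.current().nextDouble() < keepRatio;
        }
    }

    final class FilterChain {
        // Keep a record only if every configured filter accepts it.
        static <T> List<T> screen(List<T> dataSet, List<DataFilter<T>> filters) {
            return dataSet.stream()
                    .filter(r -> filters.stream().allMatch(f -> f.accept(r)))
                    .collect(Collectors.toList());
        }
    }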
Optionally, the interfaces corresponding to the services in the service system are intercepted and monitored within the predetermined period of time to acquire the data set. As an example, the interfaces corresponding to each service in the service system within the target time period may be intercepted and monitored through Aspect Oriented Programming (AOP) to acquire the data set. As another example, the data set may be acquired by intercepting and monitoring the interfaces corresponding to each service in the service system within the target time period through the Java Agent mechanism.
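As one assumed realization of such AOP-based interception, a Spring AOP aspect could wrap the service interfaces and record each invocation; the pointcut expression, package name, and in-memory record list below are illustrative assumptions, not details given by the disclosure.

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Aspect;
    import org.springframework.stereotype.Component;

    // Hedged sketch of interface interception via AOP (here Spring AOP). The
    // pointcut and the in-memory record list are assumptions; the disclosure
    // only states that service interfaces are intercepted and monitored.
    @Aspect
    @Component
    public class ServiceTrafficAspect {

        // Captured invocations; in practice these would be screened and persisted.
        private final List<String> capturedRecords = new CopyOnWriteArrayList<>();

        @Around("execution(* com.example.service..*(..))") // hypothetical service package
        public Object record(ProceedingJoinPoint pjp) throws Throwable {
            Object result = pjp.proceed(); // invoke the real service interface
            capturedRecords.add(pjp.getSignature().toShortString()
                    + " args=" + Arrays.toString(pjp.getArgs())
                    + " result=" + result);
            return result;
        }
    }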
In some optional implementations of some embodiments, the test case set is generated by:
firstly, determining a characteristic information set corresponding to each service in the service system.
And secondly, combining the characteristic information in the characteristic information sets corresponding to the services to obtain the characteristic expressions.
And thirdly, determining each script corresponding to each feature expression. As an example, a person skilled in the art may write the scripts corresponding to the respective feature expressions based on the set of feature expressions.
And fourthly, generating the test case set according to the characteristic expressions and the scripts. As an example, the feature expressions are in one-to-one correspondence with the scripts to obtain a test case set.
As an example, as shown in fig. 5, the feature information corresponding to service A includes: feature information 1, feature information 2, and feature information 3. Combining feature information 1 and feature information 2 yields feature expression a in test case a. Combining feature information 1, feature information 2, and feature information 3 yields feature expression c in test case c. Combining feature information 2 and feature information 3 yields feature expression b in test case b. Combining feature information 1 and feature information 3 yields feature expression d in test case d.
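The combination step of this fig. 5 example can be pictured with the sketch below, which enumerates every combination of two or more feature information items of a service; representing a feature expression as a joined string is an assumption made for illustration.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of building feature expressions by combining a service's feature
    // information items, as in the fig. 5 example. Joining items with "&&" to
    // form an expression is an illustrative assumption.
    public class FeatureExpressionBuilder {

        static List<String> combine(List<String> featureInfos) {
            List<String> expressions = new ArrayList<>();
            int n = featureInfos.size();
            for (int mask = 1; mask < (1 << n); mask++) {
                if (Integer.bitCount(mask) < 2) continue; // combinations of at least two items
                List<String> parts = new ArrayList<>();
                for (int i = 0; i < n; i++) {
                    if ((mask & (1 << i)) != 0) parts.add(featureInfos.get(i));
                }
                expressions.add(String.join(" && ", parts));
            }
            return expressions;
        }

        public static void main(String[] args) {
            // Feature information 1-3 of service A from the fig. 5 example.
            System.out.println(combine(List.of(
                    "feature information 1", "feature information 2", "feature information 3")));
        }
    }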
Step 302, matching each feature expression in the test case set with the feature information by using each script in the test case set to obtain at least one test case.
In some embodiments, the execution subject may match each feature expression in the test case set against the feature information by using each script in the test case set to obtain the at least one test case, wherein each test case comprises a feature expression and a script, the script being used to match the feature expression against the feature information, and the feature expression characterizing the matching relationship between the test case and the target data. The scripts may be written by a technician or generated by a machine.
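A minimal sketch of this matching step, under the assumption that a script can be modeled as a predicate over a feature-type-to-content map, might look as follows; the type and method names are illustrative, not those of the disclosure.

    import java.util.List;
    import java.util.Map;
    import java.util.function.Predicate;
    import java.util.stream.Collectors;

    // Sketch of step 302: each test case carries a feature expression and a
    // script, and the script decides whether the expression matches the target
    // data's feature information. Modeling the script as a Predicate is an
    // assumption for illustration.
    public class TestCaseMatcher {

        record TestCase(String id, String featureExpression,
                        Predicate<Map<String, String>> script) {}

        static List<TestCase> match(List<TestCase> testCaseSet,
                                    Map<String, String> featureInfo) {
            // Run each test case's script against the feature information and keep
            // the cases whose feature expression matches.
            return testCaseSet.stream()
                    .filter(tc -> tc.script().test(featureInfo))
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            Map<String, String> featureInfo = Map.of(
                    "order type", "vegetable order", "price", "500 yuan");
            TestCase vegetableOrderCase = new TestCase(
                    "case-1", "order type == vegetable order",
                    fi -> "vegetable order".equals(fi.get("order type")));
            System.out.println(match(List.of(vegetableOrderCase), featureInfo).size()
                    + " matching test case(s)");
        }
    }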
Step 303, performing case labeling on the target data according to the at least one test case.
In some embodiments, the execution subject may perform case labeling on the target data according to the at least one test case. Here, the data after case tagging may be used as test data for the at least one test case.
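As an illustration of the labeling step, the association could be kept in a structure such as the one below; the in-memory map is an assumption, since the disclosure only requires that the association between the target data and the matched test cases be recorded.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch of step 303: record the association between the target data and the
    // matched test cases so the labeled data can later serve as test data for
    // those cases. The in-memory map is an illustrative assumption.
    public class CaseAnnotator {

        private final Map<String, List<String>> testDataByCase = new HashMap<>();

        void annotate(String targetData, List<String> matchedTestCaseIds) {
            for (String caseId : matchedTestCaseIds) {
                testDataByCase.computeIfAbsent(caseId, id -> new ArrayList<>()).add(targetData);
            }
        }

        List<String> testDataFor(String testCaseId) {
            return testDataByCase.getOrDefault(testCaseId, List.of());
        }
    }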
The above embodiments of the present disclosure have the following beneficial effects: the data labeling method of some embodiments of the present disclosure can accurately and effectively match the target data with the test cases in the test case set, thereby improving the stability and reliability of the labeled test data. In particular, the inventors found that the process of creating test data is cumbersome and that the stability and reliability of manually created test data cannot be guaranteed; moreover, much test data may become invalid as the business or peripheral dependent systems change. Based on this, the data annotation method of some embodiments of the present disclosure first determines the feature information corresponding to the target data for subsequent matching against each feature expression. Then, each feature expression is matched against the feature information by using each script in the test case set to obtain the at least one test case. Comparing the feature expressions with the feature information greatly improves the accuracy of the labeled data, so that the labeled data is more stable and reliable, and the occurrence of invalid test data is reduced. Finally, case labeling is performed on the target data to obtain the association between the target data and the at least one test case.
With continued reference to FIG. 6, a flow 600 of further embodiments of a data annotation process according to the present disclosure is shown. The data annotation method comprises the following steps:
step 601, determining characteristic information corresponding to the target data.
And step 602, matching each feature expression in the test case set with the feature information by using each script in the test case set to obtain at least one test case.
Step 603, according to the at least one test case, performing case labeling on the target data.
In some embodiments, the specific implementation and technical effects of steps 601-603 may refer to steps 301-303 in those embodiments corresponding to fig. 3, which are not described herein again.
And step 604, performing data desensitization on the case-labeled target data to obtain desensitized data.
In some embodiments, an execution subject (e.g., the electronic device 101 shown in FIG. 1) may perform data desensitization on the case-labeled target data to obtain desensitized data. The desensitized information may be sensitive content related to the service.
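Purely as an illustration of the kind of masking this step might perform (the field and rule below are assumptions, not those of the disclosure):

    import java.util.regex.Pattern;

    // Sketch of the desensitization step: sensitive, service-related content in
    // the case-labeled target data is masked before persistence. The rule shown
    // (masking phone-number-like digits) is an illustrative assumption.
    public class Desensitizer {

        private static final Pattern PHONE = Pattern.compile("\\b(\\d{3})\\d{4}(\\d{4})\\b");

        static String desensitize(String targetData) {
            // Replace the middle digits of anything that looks like a phone number.
            return PHONE.matcher(targetData).replaceAll("$1****$2");
        }

        public static void main(String[] args) {
            System.out.println(desensitize(
                    "Xiaohong, phone 13812345678, vegetable order, 500 yuan"));
        }
    }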
Step 605, determining the quantity of data stored in each test case in the at least one test case.
In some embodiments, the execution subject may determine the amount of data stored in each of the at least one test case.
Step 606, in response to the amount of data being greater than or equal to a predetermined threshold, determining whether a user-defined persistence method exists.
In some embodiments, the execution subject may determine whether a user-defined persistence method exists in response to the amount of data being greater than or equal to the predetermined threshold. The user-defined persistence method may be a data storage method specified by the user. As an example, a person of ordinary skill in the relevant art can determine by lookup whether a user-defined persistence method exists.
Step 607, in response to the existence of the user-defined persistence method, storing the desensitized data into the test case by using the user-defined persistence method.
In some embodiments, in response to the presence of the user-defined persistence method, the execution agent may store the desensitized data in the test case using the user-defined persistence method.
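The decision described in steps 605 to 607, together with the default branch discussed below, could be sketched roughly as follows; the PersistenceMethod interface and the threshold value are assumptions made for illustration.

    import java.util.Optional;

    // Sketch of the persistence decision: once the amount of data stored for a
    // test case reaches the predetermined threshold, a user-defined persistence
    // method is used if one exists, otherwise the system default. The interface
    // and threshold are illustrative assumptions.
    public class DesensitizedDataStore {

        interface PersistenceMethod {
            void store(String testCaseId, String desensitizedData);
        }

        private static final int THRESHOLD = 100; // hypothetical predetermined threshold

        static void persist(String testCaseId, int storedCount, String desensitizedData,
                            Optional<PersistenceMethod> userDefined,
                            PersistenceMethod systemDefault) {
            if (storedCount >= THRESHOLD) {
                // Steps 606-607: prefer the user-defined method when it exists;
                // fall back to the system default otherwise.
                userDefined.orElse(systemDefault).store(testCaseId, desensitizedData);
            }
            // Behaviour below the threshold is not spelled out in the disclosure
            // and is therefore omitted here.
        }
    }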
One inventive aspect of the embodiments of the present disclosure solves the second technical problem mentioned in the background, namely that "the test data corresponding to the test case cannot be effectively updated and replaced". The embodiments of the present disclosure perform case labeling on the data collected within the predetermined period of time and persist it to the corresponding test cases. The reason for this operation is that the availability of test data corresponding to test cases is otherwise poor: most test data is not updated in time, so it becomes stale and lacks real-time validity, and the executability of existing test cases cannot be guaranteed before regression testing. Therefore, the data screened within the predetermined period of time can be persisted through the user-defined persistence method or the system's default persistence method, so as to ensure the executability of each test case.
With continued reference to fig. 7, as an implementation of the above-described method for the above-described figures, the present disclosure provides some embodiments of a data annotation device, which correspond to those of the method embodiments described above in fig. 3, and which can be applied to various electronic devices.
As shown in fig. 7, the data annotation device 700 of some embodiments includes: a determining unit 701, a matching unit 702, and a case labeling unit 703. The determining unit 701 is configured to determine the feature information corresponding to the target data. The matching unit 702 is configured to match, by using each script in a test case set, each feature expression in the test case set against the feature information to obtain at least one test case, wherein each test case comprises a feature expression and a script, the script being used to match the feature expression against the feature information, and the feature expression characterizing the matching relationship between the test case and the target data. The case labeling unit 703 is configured to perform case labeling on the target data according to the at least one test case.
In some optional implementations of some embodiments, the apparatus further includes: an acquisition unit and a screening unit (not shown in the figure). Wherein the obtaining unit may be further configured to: and acquiring a data set of each service corresponding interface in the service system within a preset period of time. The screening unit may be further configured to: and screening the data set to obtain a screened data set comprising the target data.
In some optional implementations of some embodiments, the apparatus further includes: a data desensitization unit and a data persistence unit (not shown in the figure). Wherein the data desensitization unit may be further configured to: and performing data desensitization on the target data marked by the use case to obtain desensitized data. The data persistence unit is further configured to: and carrying out data persistence on the desensitized data.
In some optional implementations of some embodiments, the data persistence unit may be further configured to: determining the quantity of data stored in each test case in the at least one test case; responding to the data quantity larger than or equal to a preset threshold value, and determining whether a user-defined persistence method exists or not; and responding to the existence of the user-defined persistence method, and storing the desensitized data into the test case by using the user-defined persistence method.
In some optional implementations of some embodiments, the data persistence unit may be further configured to: and in response to the absence of the user-defined persistence method, storing the desensitized data into the test case according to a persistence method set by default of a system.
In some optional implementations of some embodiments, the test case set is generated by: determining a characteristic information set corresponding to each service in the service system; combining the characteristic information in the characteristic information set corresponding to each service to obtain each characteristic expression; determining each script corresponding to each characteristic expression according to each characteristic expression; and generating the test case set according to the characteristic expressions and the scripts.
It will be understood that the elements described in the apparatus 700 correspond to various steps in the method described with reference to fig. 3. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 700 and the units included therein, and will not be described herein again.
Referring now to fig. 8, shown is a schematic diagram of an electronic device 800 suitable for use in implementing some embodiments of the present disclosure. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, an electronic device 800 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 801 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 802 or a program loaded from a storage apparatus 808 into a random access memory (RAM) 803. Various programs and data necessary for the operation of the electronic device 800 are also stored in the RAM 803. The processing apparatus 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage 808 including, for example, magnetic tape, hard disk, etc.; and a communication device 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 illustrates an electronic device 800 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 8 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through communications device 809, or installed from storage device 808, or installed from ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine feature information corresponding to target data; match each feature expression in a test case set against the feature information by using each script in the test case set to obtain at least one test case, wherein each test case comprises a feature expression and a script, the script being used to match the feature expression against the feature information, and the feature expression characterizing the matching relationship between the test case and the target data; and perform case labeling on the target data according to the at least one test case.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software, and may also be implemented by hardware. The described units may also be provided in a processor, and may be described as: a processor includes a determination unit, a matching unit, and a processing unit. Here, the names of these units do not constitute a limitation to the unit itself in some cases, and for example, the determination unit may also be described as a "unit that determines the feature information corresponding to the target data".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept as defined above. For example, the above features and (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure are mutually replaced to form the technical solution.

Claims (10)

1. A method of data annotation, comprising:
determining characteristic information corresponding to target data;
matching each feature expression in a test case set against the feature information by using each script in the test case set to obtain at least one test case, wherein each test case comprises a feature expression and a script, the script being used to match the feature expression against the feature information, and the feature expression characterizing the matching relation between the test case and the target data;
and carrying out case annotation on the target data according to the at least one test case.
2. The method of claim 1, wherein prior to the determining the characteristic information corresponding to the target data, the method further comprises:
acquiring a data set of interfaces corresponding to each service in a service system within a preset period of time;
and screening the data set to obtain a screened data set comprising the target data.
3. The method of claim 1, wherein the method further comprises:
performing data desensitization on the target data marked by the use case to obtain desensitized data;
and carrying out data persistence on the desensitized data.
4. The method according to claim 2, wherein the acquiring the data set of the interface corresponding to each service in the service system within the predetermined period of time includes:
intercepting and monitoring interfaces corresponding to all services in the service system within the preset period time to obtain the data set.
5. The method of claim 3, wherein the data persistence of the desensitized data comprises:
determining the quantity of data stored in each test case in the at least one test case;
in response to the amount of data being greater than or equal to a predetermined threshold, determining whether a user-defined persistence method exists;
and in response to the user-defined persistence method existing, storing the desensitized data into the test case by using the user-defined persistence method.
6. The method of claim 5, wherein the data persistence of the desensitized data comprises:
in response to the user-defined persistence method not existing, storing the desensitized data into the test case according to a persistence method set by system default.
7. The method of claim 1, wherein the set of test cases is generated by:
determining a characteristic information set corresponding to each service in the service system;
combining the characteristic information in the characteristic information set corresponding to each service to obtain each characteristic expression;
determining each script corresponding to each feature expression;
and generating the test case set according to the characteristic expressions and the scripts.
8. A data annotation device, comprising:
the determining unit is configured to determine characteristic information corresponding to the target data;
a matching unit configured to match, by using each script in a test case set, each feature expression in the test case set against the feature information to obtain at least one test case, wherein each test case comprises a feature expression and a script, the script being used to match the feature expression against the feature information, and the feature expression characterizing the matching relation between the test case and the target data;
and the use case labeling unit is configured to perform use case labeling on the target data according to the at least one test case.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-7.
CN202011261442.3A 2020-11-12 2020-11-12 Data annotation method and device, electronic equipment and computer readable medium Pending CN113722206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011261442.3A CN113722206A (en) 2020-11-12 2020-11-12 Data annotation method and device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011261442.3A CN113722206A (en) 2020-11-12 2020-11-12 Data annotation method and device, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN113722206A 2021-11-30

Family

ID=78672352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011261442.3A Pending CN113722206A (en) 2020-11-12 2020-11-12 Data annotation method and device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN113722206A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination