CN113342831A - Data processing method and related equipment - Google Patents

Publication number
CN113342831A
Authority
CN
China
Prior art keywords
target
sample
index
index table
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110878516.6A
Other languages
Chinese (zh)
Inventor
姚胜
闾凡兵
曾海文
牟三钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Hisense Intelligent System Research Institute Co ltd
Original Assignee
Changsha Hisense Intelligent System Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Hisense Intelligent System Research Institute Co ltd
Priority to CN202110878516.6A
Publication of CN113342831A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2264 Multidimensional index structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The application discloses a data processing method and related equipment. The data processing method comprises the following steps: receiving a sample call request sent by an external device for a training task, wherein the sample call request comprises a target attribute tag; searching a first index table for a first target index table entry based on the target attribute tag, wherein the first target index table entry comprises a semantic attribute tag corresponding to the target attribute tag; determining a target tag index based on the first target index table entry; acquiring a target sample corresponding to the target tag index; and sending the target sample to the external device. With this method, the external device can quickly acquire the samples required by different model training tasks, which improves the availability and usability of the samples.

Description

Data processing method and related equipment
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a data processing method and related equipment.
Background
Samples are the basis of artificial intelligence, and massive volumes of high-quality samples underpin the high precision and recognition rate of algorithm models. Currently, samples are primarily stored on multimedia media, so some sample data cannot be accessed directly by the training device. Moreover, for each different model training task, the training device must search massive sample data for matching samples, so sample acquisition takes a long time. As a result, the availability and ease of use of samples are low.
Disclosure of Invention
In view of this, the data processing method, the data processing device, the sample service management platform, the computer device, and the computer storage medium provided in the embodiments of the present application enable an external device to quickly acquire samples required for different model training tasks, thereby improving availability and usability of the samples.
In a first aspect, an embodiment of the present application provides a data processing method, where the method includes:
receiving a sample call request sent by an external device for a training task, wherein the sample call request comprises a target attribute tag;
searching a first index table for a first target index table entry based on the target attribute tag, wherein the first target index table entry comprises a semantic attribute tag corresponding to the target attribute tag;
determining a target tag index based on the first target index table entry;
acquiring a target sample corresponding to the target tag index;
sending the target sample to the external device.
In a second aspect, an embodiment of the present application provides a data processing apparatus, where the apparatus includes:
the interface module is used for receiving a sample call request sent by an external device, wherein the sample call request comprises a target attribute tag, and for sending a target sample to the external device;
the sample service module is used for searching a first index table for a first target index table entry based on the target attribute tag, wherein the first target index table entry comprises a semantic attribute tag corresponding to the target attribute tag; determining a target tag index based on the first target index table entry; and acquiring a target sample corresponding to the target tag index.
In a third aspect, an embodiment of the present application provides a sample service management platform, where the sample service management platform includes the data processing apparatus according to the second aspect.
In a fourth aspect, an embodiment of the present application provides a computer device, where the computer device includes: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a data processing method as described in the first aspect.
In a fifth aspect, the present application provides a computer storage medium having computer program instructions stored thereon, where the computer program instructions, when executed by a processor, implement the data processing method according to the first aspect.
The data processing method provided by the embodiments of the present application can receive a sample call request sent by an external device for a training task, search a first index table according to the target attribute tag in the sample call request to determine a first target index table entry, determine a target tag index based on the first target index table entry, acquire the target sample corresponding to the target tag index, and send the target sample to the external device. By responding to the sample call request and providing a sample service to the external device, cross-component calling of samples can be realized. Because index access is performed through the target attribute tag, the target sample can be acquired accurately and quickly, which improves the availability and usability of the samples.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of another data processing method provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a sample service management platform according to an embodiment of the present application;
Fig. 5 is a schematic hardware structure diagram of a computer device according to an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
Before describing in detail a data processing method applied to a sample service management platform provided in an embodiment of the present application, a brief description is first given of technologies related to the present application.
Samples are the basis of artificial intelligence, and massive volumes of high-quality samples underpin the high accuracy and recognition rate of algorithms. With the rapid development of artificial intelligence, the number of samples required for model training keeps increasing. However, existing ways of using samples have the following problems:
1. Samples are not fully mined or reused. Artificial intelligence currently addresses single tasks at the algorithm level, so samples are collected and labeled for a single task. However, in many cases the same samples could provide training material for different tasks, so the existing way of using samples leads to a low sample reuse rate.
2. Samples can be accessed in only one way. At present, samples are mainly stored on multimedia media, and the training device cannot directly access the samples it needs; it can access them only after a user has moved, screened, and preprocessed them. There is therefore currently no way to distribute and use samples as a standard cross-component service, which results in poor availability and ease of use.
In addition, current artificial intelligence service platforms focus mainly on algorithm services. For a service caller, most of them only expose interfaces that the platform has already packaged, so extensibility is poor. Consequently, research institutions such as colleges and universities, as well as small and medium-sized enterprises that lack samples, find it difficult to complete the algorithm training tasks they require.
In view of this, embodiments of the present application provide a data processing method, an apparatus, a computer device, and a computer storage medium, which can provide a sample service to an external device by responding to a sample call request, implement cross-component call of a sample, and perform index access through a target attribute tag, so that a target sample can be accurately and quickly obtained, and availability and usability of the sample are improved. First, a data processing method provided in an embodiment of the present application is described below.
It should be noted that the "semantic attribute tag" referred to in the embodiments of the present application may be a tag in a natural semantic format, such as car, person, or running. The "data attribute tag" referred to in the embodiments of the present application may be a tag in a data format, such as 8:00.
The first index table and the second index table referred to in the embodiments of the present application are tables that indicate the correspondence between logical records and physical records, in which the index items are arranged in key order.
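As an illustration of an index table whose items are arranged in key order, the following minimal sketch maps a key (the logical record) to a storage location (the physical record) with a binary search. The keys, paths, and function name are hypothetical and are not taken from the application.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical index items: (key, physical record location), kept sorted by key.
INDEX_ITEMS: List[Tuple[str, str]] = [
    ("001", "node-a/disk1/001.jpg"),
    ("002", "node-b/disk0/002.jpg"),
    ("005", "node-a/disk2/005.jpg"),
]
KEYS = [key for key, _ in INDEX_ITEMS]


def lookup(key: str) -> Optional[str]:
    """Binary-search the key-ordered index; return the location or None."""
    pos = bisect_left(KEYS, key)
    if pos < len(KEYS) and KEYS[pos] == key:
        return INDEX_ITEMS[pos][1]
    return None
```

Keeping the items in key order is what allows the lookup to run in logarithmic time rather than scanning every item.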
The data processing method provided by the embodiment of the application can be executed by a sample service management platform. The sample service management platform is used for managing samples and providing sample services. And the sample service management platform may provide a standard external interface for external devices to access the sample service management platform. Further, the sample service management platform may be deployed in a server.
In some embodiments, the sample service management platform may be a containerized deployment platform.
That is, the sample service management platform is deployed in the service cluster in a containerized manner. Here, containerized deployment is a virtualization technology in which the system, development software packages, dependent environment, and so on are packaged together into a container, and the entire container is deployed on a server.
In addition, the sample service management platform may be managed using a container orchestration engine, such as Kubernetes. In Kubernetes, a container may be created and an application instance run in the container, and management, discovery, and access for the set of application instances are then implemented through a built-in load balancing policy.
In the above embodiment, the sample service management platform is deployed in containers, so the platform can be migrated as a whole and deployed on different servers or across clusters. In addition, managing samples through the containerized sample service management platform allows sample management to be separated from other systems as its own service cluster, which modularizes the sample functions and reduces coupling between systems.
It should be noted that the sample service management platform may be deployed on a cloud platform that supports containers. Cloud platforms include, but are not limited to, Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure, and OpenStack.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application. As shown in fig. 1, the data processing method includes:
Step S11, receiving a sample call request sent by the external device for the training task, where the sample call request includes the target attribute tag.
Step S12, looking up the first target index table entry in the first index table based on the target attribute tag.
Here, the first target index table entry includes a semantic attribute tag corresponding to the target attribute tag.
Step S13, determining a target tag index based on the first target index table entry.
Step S14, acquiring a target sample corresponding to the target tag index.
Step S15, sending the target sample to the external device.
In the embodiment, the sample service management platform responds to the sample calling request to provide the sample service for the external equipment, so that cross-component calling of the sample can be realized, index access is performed through the target attribute tag, the target sample can be accurately and quickly acquired, and the availability and the usability of the sample are improved.
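Steps S11 to S15 can be sketched end to end as follows. This is a minimal in-memory illustration under stated assumptions: the table layout, the field names such as `target_attribute_tag`, and the sample store are hypothetical, and the first index table entries are assumed to carry the tag index directly.

```python
from typing import Dict, List

# Hypothetical first index table: semantic attribute tags plus a tag index.
FIRST_INDEX_TABLE: List[Dict] = [
    {"tag_index": "001",
     "tags": {"vehicle", "Beijing", "morning", "rear-end collision"}},
    {"tag_index": "002",
     "tags": {"animal", "Changsha", "evening", "running"}},
]
# Hypothetical sample store keyed by tag index.
SAMPLE_STORE: Dict[str, bytes] = {
    "001": b"<sample 001>",
    "002": b"<sample 002>",
}


def handle_sample_call_request(request: Dict[str, str]) -> List[bytes]:
    # S11: read the target attribute tag from the received request.
    target_tag = request["target_attribute_tag"]
    # S12: find first target index table entries whose semantic tags match.
    entries = [e for e in FIRST_INDEX_TABLE if target_tag in e["tags"]]
    # S13: determine the target tag indexes from those entries.
    tag_indexes = [e["tag_index"] for e in entries]
    # S14: acquire the target samples for the tag indexes.
    samples = [SAMPLE_STORE[i] for i in tag_indexes]
    # S15: return the samples to the caller (the external device).
    return samples
```

A request with no matching entry simply yields an empty list in this sketch; how the platform reports that case is not specified in the text.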
Specific implementations of the above steps will be described in detail below.
The samples referred to in the embodiments of the present application may be multimedia resource information, such as image information, audio information, video information, and various kinds of document information. The embodiments of the present application do not limit the specific content of a sample.
The "external device" referred to in the embodiments of the present application may be a device for model training, such as an Artificial Intelligence (AI) center. The external device can build AI algorithm models centered on artificial intelligence technologies such as neural networks and deep learning, and provide users with technical support such as natural language processing and image recognition.
The "sample call request" referred to in the present application may be a call request initiated by an external device for the samples required by different training tasks, that is, a request to use the sample call service. The sample call request may be sent by the external device to the sample service management platform.
The sample call request may be a remote call request, such as a Remote Procedure Call (RPC) request or a Hypertext Transfer Protocol (HTTP) request. Thus, after the sample service management platform responds to the sample call request, the external device can call remote samples as if it were calling local files.
In some embodiments, the sample service management platform may include message middleware. Sample call requests are sent to the message middleware, and request results are returned through the message middleware. With the message middleware acting as a buffer, the sample service management platform can receive sample call requests from multiple external devices, which helps make the platform highly available.
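The buffering role of the message middleware can be sketched with a simple in-process queue. This is only an illustration: the application does not name a middleware product, and the device identifiers, function names, and result format below are hypothetical.

```python
import queue
from typing import Dict

# Stand-in for the message middleware: a thread-safe FIFO buffer.
request_buffer = queue.Queue()
# Results keyed by the requesting device (hypothetical return channel).
results: Dict[str, str] = {}


def submit(device_id: str, target_attribute_tag: str) -> None:
    """An external device places its sample call request in the buffer."""
    request_buffer.put(
        {"device_id": device_id, "target_attribute_tag": target_attribute_tag}
    )


def drain() -> int:
    """The platform consumes buffered requests; returns how many it handled."""
    handled = 0
    while True:
        try:
            request = request_buffer.get_nowait()
        except queue.Empty:
            return handled
        results[request["device_id"]] = (
            "samples tagged " + request["target_attribute_tag"]
        )
        handled += 1
```

Because devices only write to the buffer, several of them can submit requests at once without waiting for the platform to finish earlier ones.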
In the embodiment of the application, the sample calling request includes a target attribute tag, where the target attribute tag may be an attribute tag of a sample required by the external device for the training task, and may be determined according to the training task of the external device.
Alternatively, the target attribute tag may be a semantic attribute tag, which needs to conform to the interface specification of the sample service management platform.
For example, if an external device needs to train an animal recognition model, it sends a request for animal samples to the sample service management platform, and the target attribute tag of the request is "animal".
Here, the target attribute tag may belong to a multi-dimensional attribute tag, including but not limited to one of the following attribute tags: target domain tags, time domain tags, space domain tags, subject domain tags.
In this embodiment of the application, the target domain label may be a label describing target information such as a target category, a target number, and a main feature of the sample. The time domain label may be a label describing time information such as a standard time, a time period, etc. of the sample. The spatial domain tag may be a tag that describes spatial information such as geographic location, device, angle, lighting, etc. of the source of the sample. The theme zone label can be a label describing key theme information of the sample content, such as a status label, an event label, a behavior label and the like.
In addition, the target attribute tags may also include extension tags, such as age tags and gender tags.
It should be noted that the target attribute tag may be a semantic attribute tag, which is convenient for the user to identify.
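One possible in-memory representation of a multi-dimensional attribute tag is sketched below, with one field per domain and an open dictionary for extension tags. The class name, field names, and matching rule are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MultiDimensionalTag:
    target: str    # target domain tag, e.g. category of the target
    time: str      # time domain tag
    space: str     # space domain tag
    subject: str   # subject domain tag, e.g. event or behavior
    extensions: Dict[str, str] = field(default_factory=dict)  # e.g. age, gender

    def matches(self, target_attribute_tag: str) -> bool:
        """True if the requested tag equals the tag in any dimension."""
        values = [self.target, self.time, self.space, self.subject]
        values.extend(self.extensions.values())
        return target_attribute_tag in values


tag = MultiDimensionalTag(
    "vehicle", "morning", "Beijing", "rear-end collision", {"age": "adult"}
)
```

Here `tag.matches("Beijing")` and `tag.matches("adult")` are true, while `tag.matches("animal")` is false.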
In step S11, the sample service management platform may receive, through the external interface, a sample call request sent by the external device, and parse the sample call request to obtain the target attribute tag of the samples that the external device needs to call, so that the sample service management platform can quickly retrieve the target sample.
The "first index table" referred to in the embodiments of the present application may be an index table that follows the unified external specification of the sample service management platform. The first index table may be built from first index table entries, where a first index table entry may include the multidimensional attribute tags of a sample. The multi-dimensional attribute tags may include a target domain tag, a time domain tag, a space domain tag, and a subject domain tag. The multidimensional attribute tags in the first index table are semantic attribute tags. A semantic attribute tag may be natural semantic information recognizable to the user, such as car, Beijing, morning, or rear-end collision. Optionally, the multi-dimensional attribute tags further include an extension tag, which is also a semantic attribute tag.
In addition, the first index table may manage the first index table entries by category, according to a classified index method and the semantic attribute tags. For example, in the first index table, a first index table entry tagged "car" may be classified under "vehicle". Managing sample tags through the index table in this way lets the sample service management platform support classified index management, makes hierarchical retrieval of samples convenient, and allows samples of interest to be obtained on demand.
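Classified index management can be sketched as a two-level grouping: category first, then semantic tag. The category mapping, the entries, and the function names below are hypothetical; the application does not specify how categories are assigned.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical mapping from a semantic target tag to its category.
CATEGORY_OF = {"car": "vehicle", "truck": "vehicle", "cat": "animal"}

ENTRIES = [
    {"tag_index": "001", "target": "car"},
    {"tag_index": "002", "target": "truck"},
    {"tag_index": "003", "target": "cat"},
]


def build_classified_index(entries: List[Dict[str, str]]):
    """Group tag indexes by category, then by semantic tag."""
    classified = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        category = CATEGORY_OF.get(entry["target"], "uncategorized")
        classified[category][entry["target"]].append(entry["tag_index"])
    return classified


def search(classified, term: str) -> List[str]:
    """Return tag indexes for a category, or for a single tag within one."""
    if term in classified:
        return sorted(i for ids in classified[term].values() for i in ids)
    for by_tag in classified.values():
        if term in by_tag:
            return list(by_tag[term])
    return []
```

With this layout, a request for the category "vehicle" retrieves both the car and the truck entries, while a request for "cat" retrieves only that tag's entries.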
The "first target index entry" related to the embodiment of the present application may include a semantic attribute tag corresponding to the target attribute tag. That is, the first target index entry may be the first index entry that matches the target attribute tag.
In step S12, the sample service management platform searches the semantic attribute tag matching the target attribute tag from the first index table according to the target attribute tag, and determines a corresponding first target index table entry according to the semantic attribute tag.
In some embodiments, to convert the sample attribute tag into an index table compliant with the external interface specification, before step S12, the data processing method further comprises:
establishing a second index table corresponding to the samples according to the tag indexes and the multidimensional attribute tags of the samples;
and performing escape parsing on the second index table based on an escape table to obtain the first index table.
Here, the multi-dimensional attribute tags may include a target domain tag, a time domain tag, a space domain tag, and a subject domain tag. The multidimensional attribute tags in the second index table include both data attribute tags and semantic attribute tags. Optionally, the target domain tag and the subject domain tag may be semantic attribute tags, and the time domain tag and the space domain tag may be data attribute tags.
Further, the multi-dimensional attribute tags may also include extension tags, such as age tags and gender tags.
The "second index table" related to the embodiment of the present application may be an original index table of an internal unified specification established by the label index of the sample and the multidimensional attribute label of the sample.
It should be noted that the second index table includes a plurality of second index table entries, where one sample corresponds to one second index table entry in the second index table.
The second index table includes a label index of the sample and a target domain label, a time domain label, a space domain label, a subject domain label. The second index table may be as shown in table 1:
Table 1: Second index table
[Table 1 is an image in the original publication and is not reproduced here.]
The 'escape table' related to the embodiment of the application comprises a mapping relation between a data attribute label and a semantic attribute label. The escape table may be pre-stored in the sample service management platform.
It should be noted that the escape table may be generated based on an escape algorithm model, or may be input by a user. In the embodiments of the present application, the sample service management platform may obtain the escape table in many ways, including but not limited to user input and generation by an escape algorithm model.
In this embodiment of the application, based on the escape table, the sample service management platform may escape the data attribute tags in the multidimensional attribute tags into semantic attribute tags. That is, the data attribute tags in a second index table entry are escaped into semantic attribute tags according to the escape table, while the semantic attribute tags in the second index table entry are retained; the first index table entry corresponding to the second index table entry is then generated from the escaped semantic attribute tags and the retained semantic attribute tags. In this way, the sample service management platform performs escape parsing on each second index table entry in the second index table based on the escape table to obtain the first index table corresponding to the second index table.
In one example, a second index table entry is "005; turning; x: 842453.1789, y: 2623346.320; 0800-1000". Based on the escape table, the sample service management platform performs escape parsing on this second index table entry to obtain the first index table entry "vehicle; Beijing; morning; rear-end collision".
In the above embodiment, the sample service management platform establishes the second index table entries from the multidimensional attribute tags and tag indexes of the samples, which helps the platform manage samples and search for them quickly internally. Escape parsing the second index table into the first index table by means of the escape table converts the multidimensional attribute tags of the samples from the internal specification to the external specification, which makes it easier for external devices to retrieve samples hierarchically using target attribute tags.
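The escape parsing described above can be sketched as a per-dimension dictionary lookup: data attribute tags found in the escape table are replaced by semantic attribute tags, and everything else is retained. The escape table contents, the field names, and the exact-match rule are assumptions for illustration; a real escape table would presumably map coordinate regions and time ranges rather than single literal values.

```python
from typing import Dict

# Hypothetical escape table: per dimension, data attribute tag -> semantic tag.
ESCAPE_TABLE: Dict[str, Dict[str, str]] = {
    "space": {"x: 842453.1789, y: 2623346.320": "Beijing"},
    "time": {"0800-1000": "morning"},
}


def escape_entry(second_entry: Dict[str, str]) -> Dict[str, str]:
    """Escape data attribute tags; keep tags that have no escape mapping."""
    return {
        dim: ESCAPE_TABLE.get(dim, {}).get(value, value)
        for dim, value in second_entry.items()
    }


second_entry = {
    "tag_index": "005",
    "target": "vehicle",
    "space": "x: 842453.1789, y: 2623346.320",
    "time": "0800-1000",
    "subject": "rear-end collision",
}
first_entry = escape_entry(second_entry)
```

In this sketch the tag index is carried through unchanged, which corresponds to the case where the first index table entry includes the tag index of the sample.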
In order to improve the utilization rate of the samples, the samples can be mined by multidimensional attributes. In some embodiments, before the creating the second index table corresponding to the sample according to the label index of the sample and the multidimensional attribute label, the data processing method further includes:
extracting multi-dimensional attribute information of a plurality of samples;
generating a multi-dimensional attribute label corresponding to each sample based on the multi-dimensional attribute information of each sample;
and associating and storing the label index of each sample with the multidimensional attribute label.
Here, the samples in the sample service management platform may be annotated with multi-dimensional attribute information. The sample service management platform extracts the multi-dimensional attribute information of each sample and generates an attribute tag for each item of attribute information; together, these attribute tags form the multi-dimensional attribute tag of the sample.
The sample service management platform sets a unique tag index for each sample, associates the tag index of each sample with its multidimensional attribute tag, and stores the associated data in a second index table entry. The tag index of each sample corresponds one-to-one with its multi-dimensional attribute tag.
In the embodiment, the sample service management platform performs multi-dimensional attribute mining on the sample to generate the multi-dimensional attribute label, so that the value of sample diversity is embodied, the sample can be reused in a plurality of model training tasks, the waste of the sample in the model training is reduced, the sample reuse rate is improved, and the dependence of the model training on the sample is reduced. In addition, the sample service management platform stores the label index of each sample and the multi-dimensional attribute label in an associated manner, so that the corresponding sample can be quickly found according to the attribute label.
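The extraction and association steps can be sketched as follows, assuming each sample arrives with a dictionary of already-annotated attribute information. The dimension names, the three-digit tag index format, and the function name are hypothetical.

```python
from itertools import count
from typing import Dict, Iterable

DIMENSIONS = ("target", "time", "space", "subject")


def build_tag_associations(
    samples: Iterable[Dict[str, str]]
) -> Dict[str, Dict[str, str]]:
    """Assign a unique tag index per sample and associate it with its tags."""
    next_id = count(1)
    associations: Dict[str, Dict[str, str]] = {}
    for attribute_info in samples:
        tag_index = f"{next(next_id):03d}"  # unique per sample
        # Keep only the known dimensions as the multi-dimensional tag.
        associations[tag_index] = {
            dim: attribute_info[dim] for dim in DIMENSIONS if dim in attribute_info
        }
    return associations
```

Each key in the returned dictionary maps to exactly one multi-dimensional attribute tag, mirroring the one-to-one correspondence described in the text.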
The "tag index" referred to in the embodiments of the present application may be used to indicate the location where the sample is stored. That is, the sample service management platform can find the corresponding sample according to the tag index.
In some embodiments, the multidimensional attribute information may include, but is not limited to, target domain information, temporal domain information, spatial domain information, and subject domain information.
In the embodiment of the present application, the target domain information may be information describing a target category, a number, a main feature, and the like of the sample. The time domain information may be information describing a standard time, a time period, etc. of the sample. The spatial domain information may be information describing the geographic location, device, angle, lighting, etc. from which the sample originated. The topic domain information may be key topic information describing the sample content, such as status information, event information, and behavior information.
In order to reduce the storage load of the sample service management platform, in some embodiments, before the extracting the multi-dimensional attribute information of the plurality of samples, the data processing method further includes: a plurality of samples are stored in a distributed manner.
Here, the sample service management platform stores multiple samples in a distributed manner, thereby spreading the samples across disk space on different devices. Therefore, the storage load is shared by a plurality of devices, and the reliability, the availability and the storage efficiency of the sample service management platform are improved.
Optionally, the sample service management platform stores samples on different hard disks of different devices through the MinIO service, so that these hard disks together form an object storage service and a distributed sample cluster is established, which decouples sample storage from other services. Because the hard disks are distributed across different nodes, single points of failure are avoided. In addition, because distributed MinIO is highly available and can be scaled horizontally, the sample cluster can be expanded to increase sample storage capacity.
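The general idea of spreading samples across several nodes can be sketched with a simple hash-based placement. This is not MinIO's placement algorithm and does not use the MinIO API; the node names and functions are hypothetical and only illustrate how a storage load may be shared among devices.

```python
import hashlib
from typing import Dict, List

NODES = ["node-a", "node-b", "node-c"]  # hypothetical storage nodes


def place(sample_id: str) -> str:
    """Pick a node deterministically from a hash of the sample identifier."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def distribute(sample_ids: List[str]) -> Dict[str, List[str]]:
    """Return the resulting layout: node -> sample identifiers stored there."""
    layout: Dict[str, List[str]] = {node: [] for node in NODES}
    for sample_id in sample_ids:
        layout[place(sample_id)].append(sample_id)
    return layout
```

Because placement depends only on the identifier, any component can recompute where a sample lives without consulting a central directory; a production system would also need replication or erasure coding to tolerate node failures.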
In step S13, the target label index may be the label index of the target sample, and may be determined according to the first target index entry.
Here, in a case where the first index entry includes a tag index of the sample, the sample service management platform may determine the target tag index directly from the first target index entry. Therefore, the label index can be quickly inquired, and the speed of obtaining the sample is improved.
In order to find the target label index corresponding to the target attribute label more fully, in some embodiments, step S13 may be performed as:
parsing the first target index table entry to obtain a second target index table entry;
and determining a target label index based on the second target index table entry.
Here, the second target index table entry belongs to the second index table, and the first target index table entry corresponds to the second target index table entry. The second target index table entry includes a data attribute tag corresponding to the semantic attribute tag in the first target index table entry. The semantic attribute tag corresponds to the target attribute tag.
Specifically, the sample service management platform parses the first target index table entry, converts the semantic attribute tag corresponding to the target attribute tag into the data attribute tag corresponding to the target attribute tag, and searches the second index table according to the data attribute tag to obtain a second target index table entry.
Moreover, since the second index table entry includes the tag index of the sample, the sample service management platform can directly find the target tag index corresponding to the target sample according to the second target index table entry.
It should be noted that, regardless of whether the first index entry includes the sample tag index, the sample service management platform may obtain the target tag index through the above steps. In addition, there are many escape methods, including but not limited to escape according to an escape algorithm and escape according to a preset escape table.
In the above embodiment, the second index table entry is the original index table entry of the sample. By parsing the first target index table entry and converting the target attribute tag into the data attribute tag, the sample service management platform can search the original index table entries for the second target index table entry corresponding to the target attribute tag. A more comprehensive set of target tag indexes can therefore be found, which avoids the problem that a target tag index cannot be found from the first index table entry because of an escape error between the first index table entry and the second index table entry.
In some embodiments, determining the target tag index based on the second target index entry comprises: and taking the label index in the second target index table entry as a target label index.
Here, the second target index table entry is the entry in the second index table that matches the target attribute tag, and each second index table entry includes the tag index. Therefore, the sample service management platform can directly take the tag index in the second target index table entry as the target tag index.
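The two-stage lookup of step S13 can be sketched as follows. This is a minimal illustration, assuming that both index tables are in-memory lists and that the escape table is a dictionary from semantic attribute tags to data attribute tags. The table contents, tag codes such as `obj_cls_03`, and the function name `determine_target_tag_indexes` are hypothetical.

```python
# Escape table (assumed contents): semantic attribute tag -> data attribute tag.
ESCAPE_TABLE = {"vehicle": "obj_cls_03", "pedestrian": "obj_cls_01"}

# First index table: entries carry semantic attribute tags.
FIRST_INDEX_TABLE = [
    {"semantic_tags": {"vehicle"}},
    {"semantic_tags": {"pedestrian"}},
]

# Second index table: one entry per sample, with its tag index and data attribute tags.
SECOND_INDEX_TABLE = [
    {"tag_index": 101, "data_tags": {"obj_cls_03"}},
    {"tag_index": 102, "data_tags": {"obj_cls_03"}},
    {"tag_index": 201, "data_tags": {"obj_cls_01"}},
]


def determine_target_tag_indexes(target_attribute_tag):
    """Return the tag indexes of all samples matching the target attribute tag."""
    # Look up the first target index table entries by semantic attribute tag.
    first_entries = [
        e for e in FIRST_INDEX_TABLE if target_attribute_tag in e["semantic_tags"]
    ]
    if not first_entries:
        return []
    # Parse: escape the semantic attribute tag into its data attribute tag.
    data_tag = ESCAPE_TABLE[target_attribute_tag]
    # Search the second index table and take the tag index of each matching entry.
    return [e["tag_index"] for e in SECOND_INDEX_TABLE if data_tag in e["data_tags"]]
```

For example, `determine_target_tag_indexes("vehicle")` walks from the first index table through the escape table to the two second index table entries tagged `obj_cls_03`.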
Since the sample service management platform stores the escape table in advance, in order to improve the efficiency of obtaining the second target index table entry, in some embodiments, the first target index table entry is analyzed to obtain the second target index table entry, which may be specifically implemented as:
and analyzing the first target index table item based on a preset escape table to obtain a second target index table item.
Here, parsing the first target index table entry according to the escape table may mean that the sample service management platform uses the escape table to escape the semantic attribute tag in the first target index table entry into a data attribute tag, and then finds the corresponding second target index table entry according to that data attribute tag.
In the above embodiment, because the escape table includes the mapping relationship between the data attribute tags and the semantic attribute tags, the sample service management platform parses the first target index table entry according to the preset escape table, so that the escaped data attribute tag of the first target index table entry can be obtained quickly and the second target index table entry can be found through that data attribute tag. Therefore, the efficiency and the accuracy of obtaining the second target index table entry can be improved.
In step S14, the target sample may be the sample that corresponds to the target attribute tag and to the target tag index.
Specifically, in step S14 the sample service management platform may find the storage location of the target sample according to the target tag index and read the target sample from that storage location, thereby obtaining the target sample corresponding to the target tag index.
Optionally, the sample service management platform may read the target sample through a distributed storage service, such as a MinIO service.
In step S15, after acquiring the target sample, the sample service management platform may send the target sample to the external device that sent the sample call request, so that the external device acquires the target sample to perform the model training task.
Optionally, the sample service management platform may send the target sample to the external device through an external interface.
Fig. 2 is a schematic diagram of another data processing method provided in an embodiment of the present application. As shown in fig. 2, the following describes an implementation process of the present embodiment with reference to an application example.
Samples are stored using the MinIO service mounted on different hard disks of different devices, and the sample service management platform is accessed through RPC remote calls. The sample service management platform is deployed in containers (Docker) and managed through Kubernetes.
Before the external equipment accesses the sample service management platform, the sample service management platform performs attribute extraction on the sample from multiple dimensions and labels attribute information to obtain a multi-dimensional attribute label of the sample. Therefore, the multidimensional attribute of the sample is mined, and the multiplexing rate of the sample is improved.
The sample service management platform establishes an original index table item (namely a second index table item) through the sample tagging service, and the original index table item conforms to the internal unified specification of the platform. The original index entry includes a label index and a multidimensional attribute label for the sample.
The sample service management platform converts the original index table item into an external index table item (namely a first index table item) according to a preset escape table, and the external index table item conforms to the platform external unified specification, so that the classification management of the samples is completed, the samples can be retrieved in a subsequent grading manner, and the purpose of obtaining interested samples according to requirements is achieved.
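The index construction described above can be sketched as below: the original (second) index table is built from each sample's tag index and data attribute tags, and the external (first) index table is derived from it through the escape table. The function names, the dictionary layout, and the choice to carry the tag index into the external entries are assumptions for illustration; the application states only that the first index table entry may include the tag index.

```python
# Escape table direction used when building the external table (assumed contents):
# data attribute tag -> semantic attribute tag.
DATA_TO_SEMANTIC = {"obj_cls_03": "vehicle", "t_night": "night"}


def build_second_index_table(samples):
    """Build the original index table from (tag_index, data_attribute_tags) pairs."""
    return [{"tag_index": idx, "data_tags": set(tags)} for idx, tags in samples]


def escape_to_first_index_table(second_table, escape_table):
    """Derive the external index table by escaping every data attribute tag."""
    first_table = []
    for entry in second_table:
        semantic_tags = {escape_table[tag] for tag in entry["data_tags"]}
        first_table.append(
            {"tag_index": entry["tag_index"], "semantic_tags": semantic_tags}
        )
    return first_table


second_table = build_second_index_table(
    [(101, ["obj_cls_03", "t_night"]), (102, ["obj_cls_03"])]
)
first_table = escape_to_first_index_table(second_table, DATA_TO_SEMANTIC)
```

A data attribute tag missing from the escape table raises `KeyError` in this sketch, which surfaces escape-table gaps at build time rather than at query time.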
As shown in fig. 2, the external device accesses the sample service management platform through RPC remote calls routed via message middleware. Specifically, the external device determines the target attribute tag of the required samples according to the training task and sends an RPC remote invocation request (i.e., a sample invocation request) containing the target attribute tag to the message middleware. The result of the request is also returned through the message middleware. RPC remote calling thus allows the external device to read remote samples as if it were reading local files.
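The request and reply path through the message middleware can be imitated in a single process, as in the sketch below. Here a standard-library `queue.Queue` stands in for the message middleware and a worker thread stands in for the platform; a real deployment would use an actual message broker and RPC framework, which the application does not name. `call_samples`, `platform_worker`, and `fake_handler` are hypothetical names.

```python
import queue
import threading

request_queue = queue.Queue()  # stands in for the message middleware


def platform_worker(handle):
    """Platform side: consume requests and put each result on its reply queue."""
    while True:
        message = request_queue.get()
        if message is None:  # shutdown signal for this sketch
            break
        reply_queue, target_attribute_tag = message
        reply_queue.put(handle(target_attribute_tag))


def call_samples(target_attribute_tag, timeout=5.0):
    """Client-side stub: reads like a local call, but goes through the queue."""
    reply_queue = queue.Queue(maxsize=1)
    request_queue.put((reply_queue, target_attribute_tag))
    return reply_queue.get(timeout=timeout)


def fake_handler(tag):
    """Placeholder for the platform's real sample lookup."""
    return [f"{tag}-sample-1", f"{tag}-sample-2"]


worker = threading.Thread(target=platform_worker, args=(fake_handler,), daemon=True)
worker.start()
result = call_samples("vehicle")
request_queue.put(None)
worker.join()
```

The caller only sees `call_samples("vehicle")`, which is the sense in which a remote sample read resembles reading a local file.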
As shown in fig. 2, in the container of the sample service management platform, processes such as sample reading, tag indexing, escape parsing, and interface service are executed.
The interface service receives messages from the message middleware, looks up the external index table entry corresponding to the target attribute tag, and sends the target sample corresponding to the target attribute tag. Escape parsing converts the external index table entry corresponding to the target attribute tag into the original index table entry corresponding to the target attribute tag. Tag indexing finds the storage location of the target sample according to the tag index in the original index table entry. Sample reading reads the sample from the storage location of the target sample through the MinIO service. Together, sample reading, tag indexing, escape parsing, and the interface service provide the sample indexing service.
In one example, the external device first needs to perform a training task for vehicle identification. It determines that the target attribute tag is "vehicle" and sends a sample invocation request containing that tag to the sample service management platform. The sample service management platform receives and parses the sample invocation request and determines that the external device needs samples whose attribute tag is "vehicle".
Then, the sample service management platform looks up the external index table entries whose attribute tag is "vehicle", escapes them to obtain the corresponding original (internal) index table entries, and finds and reads the target samples according to the tag indexes in those entries.
Finally, the sample service management platform sends the target sample to the external device.
In this way, by accessing the sample service management platform, the external device can read a sample at a remote end, and can acquire samples required by different training tasks. In addition, the target sample can be quickly found through the label index, and the efficiency of obtaining the sample is improved.
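The vehicle example can be traced end to end with the following sketch, which wires together the four processes named for the container: interface service, escape parsing, tag indexing, and sample reading. All table contents, object keys, and function names are hypothetical, and a plain dictionary stands in for the MinIO object store.

```python
ESCAPE_TABLE = {"vehicle": "obj_cls_03"}        # semantic tag -> data tag (assumed)
EXTERNAL_INDEX = [{"semantic_tag": "vehicle"}]  # first (external) index table
ORIGINAL_INDEX = [                              # second (original) index table
    {"tag_index": 101, "data_tag": "obj_cls_03"},
    {"tag_index": 102, "data_tag": "obj_cls_03"},
    {"tag_index": 201, "data_tag": "obj_cls_01"},
]
STORAGE_LOCATIONS = {101: "samples/a.jpg", 102: "samples/b.jpg", 201: "samples/c.jpg"}
OBJECT_STORE = {  # stand-in for the distributed object storage
    "samples/a.jpg": b"A",
    "samples/b.jpg": b"B",
    "samples/c.jpg": b"C",
}


def escape_parsing(external_entry):
    """Escape an external entry's semantic tag into the data attribute tag."""
    return ESCAPE_TABLE[external_entry["semantic_tag"]]


def tag_indexing(data_tag):
    """Find storage locations via the tag indexes of matching original entries."""
    return [
        STORAGE_LOCATIONS[e["tag_index"]]
        for e in ORIGINAL_INDEX
        if e["data_tag"] == data_tag
    ]


def sample_reading(locations):
    """Read each sample from its storage location."""
    return [OBJECT_STORE[location] for location in locations]


def interface_service(target_attribute_tag):
    """Handle one sample invocation request and return the target samples."""
    samples = []
    for entry in EXTERNAL_INDEX:
        if entry["semantic_tag"] == target_attribute_tag:
            samples.extend(sample_reading(tag_indexing(escape_parsing(entry))))
    return samples
```

A request for "vehicle" returns the two samples indexed 101 and 102, while a tag with no external index table entry returns an empty list.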
Fig. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, and as shown in fig. 3, the data processing apparatus 20 applied to a sample service management platform may include:
the interface module 21 is configured to receive a sample call request sent by an external device for a training task, where the sample call request includes a target attribute tag; sending a target sample to the external device;
a sample service module 22, configured to look up a first target index entry from a first index table based on the target attribute tag; the first target index table entry comprises a semantic attribute label corresponding to the target attribute label; determining a target label index based on the first target index table entry; and acquiring a target sample corresponding to the target label index.
In some embodiments, the sample service module 22 is specifically configured to parse the first target index table entry to obtain a second target index table entry, where the second target index table entry comprises a data attribute tag corresponding to the semantic attribute tag and belongs to the second index table; and to determine a target label index based on the second target index table entry.
In some embodiments, the sample service module 22 is specifically configured to parse the first target index table entry based on a preset escape table to obtain a second target index table entry, where the escape table comprises a mapping relation between the data attribute labels and the semantic attribute labels; and to take the label index in the second target index table entry as the target label index.
In some embodiments, the apparatus 20 further comprises:
the sample management module is used for establishing a second index table corresponding to the sample according to the label index and the multi-dimensional attribute label of the sample; and performing escape analysis on the second index table based on the escape table to obtain a first index table.
In some embodiments, the sample management module is further configured to extract multi-dimensional attribute information of a plurality of samples; the multi-dimensional attribute information comprises target domain information, time domain information, space domain information and subject domain information; generating a multi-dimensional attribute label corresponding to each sample based on the multi-dimensional attribute information of each sample; and associating and storing the label index of each sample with the multidimensional attribute label.
In some embodiments, the sample management module is further configured to distributively store the plurality of samples before the extracting the multi-dimensional attribute information of the plurality of samples.
Fig. 4 is a schematic structural diagram of a sample service management platform according to an embodiment of the present disclosure, and as shown in fig. 4, the sample service management platform 30 includes any one of the data processing apparatuses 20 according to the embodiment of the present disclosure.
Fig. 5 is a schematic hardware structure diagram of a computer device according to an embodiment of the present application. As shown in fig. 5, a computer device 40 may include a processor 41 and a memory 42 storing computer program instructions.
Specifically, the processor 41 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
Memory 42 may include mass storage for data or instructions. By way of example, and not limitation, memory 42 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 42 may include removable or non-removable (or fixed) media, where appropriate. Memory 42 may be internal or external to the computer device, where appropriate. In a particular embodiment, memory 42 is a non-volatile solid-state memory.
Memory 42 may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to the methods according to an aspect of the application.
The processor 41 implements any of the data processing methods in the above embodiments by reading and executing computer program instructions stored in the memory 42.
In one example, the computer device may also include a communication interface 43 and a bus 44. As shown in fig. 5, the processor 41, the memory 42, and the communication interface 43 are connected via a bus 44 to complete mutual communication.
The communication interface 43 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application.
The bus 44 comprises hardware, software, or both that couple the components of the computer device to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus, or a combination of two or more of these. Bus 44 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The computer device may implement the data processing method, the apparatus and the sample service management platform described in conjunction with fig. 1 to 4 based on executing the method in the embodiment of the present application.
In addition, in combination with the data processing method in the foregoing embodiments, the embodiments of the present application may provide a computer storage medium for implementing the method. The computer storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement any of the data processing methods in the above embodiments.
In addition, in combination with the data processing method in the foregoing embodiments, the present application provides a computer program product, which includes a computer program or instructions, and when the computer program or instructions are executed by a processor, the computer program or instructions implement any one of the data processing methods in the foregoing embodiments.
It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of data processing methods, apparatus, sample service management platforms, computer devices and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (10)

1. A method of data processing, the method comprising:
receiving a sample calling request sent by external equipment aiming at a training task, wherein the sample calling request comprises a target attribute label;
searching a first target index table item from a first index table based on the target attribute tag; the first target index table entry comprises a semantic attribute label corresponding to the target attribute label;
determining a target label index based on the first target index table entry;
acquiring a target sample corresponding to the target label index;
sending the target sample to the external device.
2. The method of claim 1, wherein determining a target tag index based on the first target index entry comprises:
analyzing the first target index table item to obtain a second target index table item; the second target index table entry comprises a data attribute tag corresponding to the semantic attribute tag, and the second target index table entry belongs to a second index table;
and determining a target label index based on the second target index table entry.
3. The method of claim 2, wherein the parsing the first target index table entry to obtain a second target index table entry comprises:
analyzing the first target index table item based on a preset escape table to obtain a second target index table item; the escape table comprises a mapping relation between the data attribute labels and the semantic attribute labels;
determining a target tag index based on the second target index table entry, including:
and taking the label index in the second target index table entry as a target label index.
4. The method of claim 1, wherein prior to looking up a first target index entry from a first index table based on the target attribute tag, the method further comprises:
establishing a second index table corresponding to the sample according to the label index of the sample and the multidimensional attribute label;
and performing escape analysis on the second index table based on the escape table to obtain a first index table.
5. The method of claim 4, wherein before the creating the second index table corresponding to the sample according to the label index of the sample and the multidimensional attribute label, the method further comprises:
extracting multi-dimensional attribute information of a plurality of samples; the multi-dimensional attribute information comprises target domain information, time domain information, space domain information and subject domain information;
generating a multi-dimensional attribute label corresponding to each sample based on the multi-dimensional attribute information of each sample;
and associating and storing the label index of each sample with the multidimensional attribute label.
6. The method of claim 5, wherein prior to said extracting multi-dimensional attribute information for a plurality of samples, the method further comprises: distributively storing the plurality of samples.
7. A data processing apparatus, characterized in that the apparatus comprises:
the interface module is used for receiving a sample calling request sent by external equipment aiming at a training task, wherein the sample calling request comprises a target attribute label; sending a target sample to the external device;
the sample service module is used for searching a first target index table item from a first index table based on the target attribute tag; the first target index table entry comprises a semantic attribute label corresponding to the target attribute label; determining a target label index based on the first target index table entry; and acquiring a target sample corresponding to the target label index.
8. A sample service management platform, characterized in that it comprises a data processing apparatus according to claim 7.
9. A computer device, the device comprising: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the data processing method of any of claims 1-6.
10. A computer storage medium having computer program instructions stored thereon which, when executed by a processor, implement a data processing method as claimed in any one of claims 1 to 6.
CN202110878516.6A 2021-08-02 2021-08-02 Data processing method and related equipment Pending CN113342831A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110878516.6A CN113342831A (en) 2021-08-02 2021-08-02 Data processing method and related equipment

Publications (1)

Publication Number Publication Date
CN113342831A true CN113342831A (en) 2021-09-03

Family

ID=77480524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110878516.6A Pending CN113342831A (en) 2021-08-02 2021-08-02 Data processing method and related equipment

Country Status (1)

Country Link
CN (1) CN113342831A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012063959A (en) * 2010-09-15 2012-03-29 Ricoh Co Ltd Indexing method, retrieval method, and storage medium thereof
CN103530282A (en) * 2013-10-23 2014-01-22 北京紫冬锐意语音科技有限公司 Corpus tagging method and equipment
CN103927387A (en) * 2014-04-30 2014-07-16 成都理想境界科技有限公司 Image retrieval system, method and device
CN103942282A (en) * 2014-04-02 2014-07-23 新浪网技术(中国)有限公司 Sample data obtaining method, device and system
CN109189959A (en) * 2018-09-06 2019-01-11 腾讯科技(深圳)有限公司 A kind of method and device constructing image data base

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210903
