WO2023024474A1 - Data set determination method and apparatus, and computer device and storage medium - Google Patents

Data set determination method and apparatus, and computer device and storage medium

Publication number
WO2023024474A1
Authority
WO
WIPO (PCT)
Prior art keywords
semantic
data
database
semantic information
information
Prior art date
Application number
PCT/CN2022/079074
Other languages
French (fr)
Chinese (zh)
Inventor
张元瀚
黄耿石
刘冬阳
滕家宁
王坤
尹榛菲
邵婧
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023024474A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to a method, device, computer equipment, and storage medium for determining a data set.
  • The existing test set is usually a preset data set, for example, the ImageNet data set. Because an existing test set mixes test data containing multiple types of objects in various scenarios, testing a model on it cannot reflect how the model performs on the test data of each individual type of object. As a result, when an existing test set is used to test the performance of the model, the robustness of the model is affected, which in turn affects the processing accuracy of the model.
  • Embodiments of the present disclosure at least provide a method, device, computer equipment, and storage medium for determining a data set.
  • An embodiment of the present disclosure provides a method for determining a data set, including: acquiring a semantic database containing multiple pieces of semantic information; creating multiple label data based on the semantic database, where one label data corresponds to one semantic category and includes object labels belonging to that semantic category, and the semantic categories corresponding to the multiple label data are categories that allow an all-round representation test of the model to be tested; and, based on a preset data set, determining matching data for the object labels of at least part of the label data, and determining, based on the matching data, the test data sets corresponding to at least part of the label data, to obtain multiple test data sets.
  • The embodiments of the present disclosure process the semantic database to obtain label data corresponding to multiple semantic categories, and create test data sets corresponding to those semantic categories based on the determined label data. With test data sets for multiple semantic categories, the performance test of the model to be tested can cover the model in an all-round way, so that the all-round performance of the model is obtained. Through this testing method, the robustness of the model to be tested can be improved, thereby improving its processing accuracy.
  • Creating multiple label data based on the semantic databases includes: fusing the semantic information in multiple semantic databases to obtain a fused semantic database, where the fused semantic database includes multiple pieces of fused semantic information and the hierarchical information between them; and determining multiple semantic categories to be divided, and dividing the fused semantic database into the multiple label data according to those semantic categories.
  • By semantically fusing multiple semantic databases, a more comprehensive semantic database, namely the fused semantic database, can be obtained. Label data with richer semantic categories can then be obtained from it. When the model to be tested is tested on the test data sets corresponding to the multiple label data, an all-round test of the model can be carried out, so that its all-round representation performance is obtained.
  • Fusing the semantic information in multiple semantic databases to obtain the fused semantic database includes: determining the semantic information to be fused in a first semantic database of the multiple semantic databases, where the semantic information to be fused has no next-level semantic information in the first semantic database; determining, based on the hierarchical information between the semantic information in the first semantic database, the semantic path on which the semantic information to be fused is located, where the semantic path includes at least one piece of semantic information; and fusing, based on the high-level semantic information that precedes the semantic information to be fused in the semantic path, the semantic information to be fused with the semantic information in a second semantic database to obtain the fused semantic database, where the second semantic database is a database other than the first semantic database among the multiple semantic databases.
  • Fusing the semantic information to be fused with the semantic information in the second semantic database, based on the high-level semantic information that precedes it in the semantic path, to obtain the fused semantic database includes: determining target semantic information among the high-level semantic information in order of level from high to low, where the target semantic information has corresponding semantic information in the second semantic database; and fusing the semantic information to be fused into the next level of the semantic information that corresponds to the target semantic information in the second semantic database, to obtain the fused semantic database.
  • The fused semantic database is a tree-structured database, and dividing the fused semantic database into the multiple label data according to the multiple semantic categories includes: determining the nodes corresponding to at least part (for example, each) of the semantic categories in the tree-structured database to obtain multiple target nodes; dividing the tree-structured database with each target node as a root node to obtain multiple sub-tree-structured databases, where one sub-tree-structured database corresponds to one target node; and determining the multiple label data based on the sub-tree-structured databases, where the object labels in a label data are the semantic information in the corresponding sub-tree-structured database.
  • Dividing the fused semantic database into label data corresponding to multiple semantic categories, and then determining multiple test data sets from the label data, yields test data sets that can perform an all-round representation test on the model to be tested. When the model is tested on these test data sets, its performance on each semantic category can be determined.
  • The preset data set includes multiple data and the data labels of the multiple data; determining, based on the preset data set, matching data for the object labels of at least part (for example, each) of the label data includes: determining the object labels included in the label data; matching the data labels in the preset data set against the object labels to determine at least one set of matching labels; and determining, in the preset data set, at least one piece of data corresponding to the data label in at least one set (for example, each set) of matching labels, and determining that data as the data matching the object label in the set of matching labels.
  • The preset data set may be chosen as the following two data sets: ImageNet and Places. Because ImageNet and Places contain a large number of natural pictures, determining the multiple test data sets from them yields more comprehensive data sets, and when the model to be tested is tested on these test data sets, its performance on at least part (for example, each) of the semantic categories can be determined.
  • The method further includes: testing the model to be tested on at least one test data set to obtain multiple test results; and calculating the average of the multiple test results and taking the average as the test result of the all-round representation test of the model to be tested.
  • In this way, the all-round representation performance of the model to be tested can be determined quantitatively, and with it the robustness of the model. The results can also guide the relevant technical personnel in training the model in a targeted manner, so that the model achieves better processing results on the test data of at least part (for example, each) of the semantic categories.
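  • The averaging step can be illustrated with a minimal sketch. It assumes each test data set yields one scalar result (for example, an accuracy); the function name `all_round_score` and the numbers are hypothetical and are not taken from the disclosure.

```python
from statistics import fmean


def all_round_score(results):
    """Average the per-test-set results into one all-round score.

    `results` maps a semantic category to the scalar test result obtained
    on the test data set of that category (an assumption of this sketch).
    """
    if not results:
        raise ValueError("at least one test result is required")
    return fmean(results.values())


# Illustrative numbers only, not measured results.
results = {"flower": 0.90, "food": 0.70, "location": 0.80}
score = all_round_score(results)
```

  • Keeping the per-category results next to the average makes it possible to see which semantic categories pull the score down, which is the information targeted training would need.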
  • The method further includes: when no data matching a target object label in target label data is found in the preset data set, determining the target semantic category corresponding to the target label data; and searching the candidate databases for a matching database that matches the target semantic category, and searching the matching database for data matching the target object label.
  • The method further includes: when a target data label is found in the preset data set, determining the upper-level label of the target data label based on the hierarchical information between the data labels in the preset data set, where the target data label is a data label that has no corresponding object label among the object labels of the multiple label data; determining the semantic information corresponding to the upper-level label, and determining, in the multiple label data, the semantic information that matches it; and adding the semantic information corresponding to the target data label, as new semantic information, to the next level of the matched semantic information, and determining matching data for the new semantic information based on the preset data set.
  • Supplementing the semantic information corresponding to the object labels in the multiple label data with the data labels of the preset data set enriches the semantic information in the label data and yields a more comprehensive fused semantic database. On this basis, the test accuracy of the model to be tested can be obtained.
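  • The supplementing step described above can be sketched as follows. This is only an illustration under stated assumptions: the label data is held as a parent-to-children dictionary, the hierarchy of the preset data set is held as a child-to-parent dictionary, and matching is done by exact string equality. The name `add_missing_label` and the sample labels are hypothetical.

```python
def add_missing_label(target, preset_parent, label_children):
    """Attach a data label that has no object label under its matched parent.

    preset_parent:  child -> parent labels in the preset data set (assumed form).
    label_children: parent -> list of child object labels in the label data.
    Returns True when the target was added as new semantic information.
    """
    known = set(label_children) | {
        child for kids in label_children.values() for child in kids
    }
    if target in known:
        return False  # already has a corresponding object label
    parent = preset_parent.get(target)
    if parent is None or parent not in known:
        return False  # no upper-level label that matches the label data
    label_children.setdefault(parent, []).append(target)
    return True


label_children = {"flower": ["rose", "jasmine"]}
preset_parent = {"tulip": "flower", "rose": "flower"}
added = add_missing_label("tulip", preset_parent, label_children)
```

  • After the label is added, matching data for it would be looked up in the preset data set in the same way as for any other object label.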
  • An embodiment of the present disclosure further provides a data set determination device, including: an acquisition unit configured to acquire a semantic database containing multiple pieces of semantic information; a creation unit configured to create multiple label data based on the semantic database, where one label data corresponds to one semantic category and includes object labels belonging to that semantic category, and the semantic categories corresponding to the multiple label data are categories that allow an all-round representation test of the model to be tested; and a determination unit configured to determine, based on a preset data set, matching data for the object labels of at least part of the label data, and to determine, based on the matching data, the test data sets respectively corresponding to at least part of the label data, to obtain multiple test data sets.
  • An embodiment of the present disclosure further provides a computer device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the computer device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the above first aspect, or of any possible implementation of the first aspect, are performed.
  • Embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above first aspect, or of any possible implementation of the first aspect, are performed.
  • An optional implementation of the present disclosure further provides a computer program product, including computer-readable code, or a computer-readable storage medium carrying computer-readable code; when the computer-readable code runs on a processor of an electronic device, the processor in the electronic device performs the steps of the above first aspect, or of any possible implementation of the first aspect.
  • In the embodiments of the present disclosure, a semantic database containing multiple pieces of semantic information is first acquired; multiple label data are then created based on the semantic database; and, based on a preset data set, matching data is determined for the object labels of at least part (for example, each) of the label data, to obtain multiple test data sets.
  • The embodiments of the present disclosure process the semantic database to obtain label data corresponding to multiple semantic categories, and create test data sets corresponding to those semantic categories based on the determined label data. With test data sets for multiple semantic categories, the performance test of the model to be tested can cover the model in an all-round way, so that the all-round performance of the model is obtained. Through this testing method, the robustness of the model to be tested can be improved, thereby improving its processing accuracy.
  • FIG. 1 shows a flowchart of a method for determining a data set provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic structural diagram of a tree-structured first semantic database provided by an embodiment of the present disclosure
  • FIG. 3 shows a flow chart of specific steps for determining matching data for object tags of each tag data based on a preset data set in the method for determining a data set provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of an apparatus for determining a data set provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
  • The existing test set is usually a preset data set, for example, the ImageNet data set. Because an existing test set mixes test data containing multiple types of objects in various scenarios, testing a model on it cannot reflect how the model performs on the test data of each individual type of object. As a result, when an existing test set is used to test the performance of the model, the robustness of the model is affected, which in turn affects the processing accuracy of the model.
  • Based on this, the present disclosure provides a method, device, computer equipment, and storage medium for determining a data set. The embodiments of the present disclosure process the semantic database to obtain label data corresponding to multiple semantic categories, and create test data sets corresponding to those semantic categories based on the determined label data. With test data sets for multiple semantic categories, the performance test of the model to be tested can cover the model in an all-round way, so that the all-round performance of the model is obtained. Through this testing method, the robustness of the model to be tested can be improved, thereby improving its processing accuracy.
  • The method for determining a data set provided by an embodiment of the present disclosure is generally executed by a computer device with certain computing power. In some possible implementations, the method may be carried out by a processor running computer-executable code.
  • FIG. 1 is a flow chart of a method for determining a data set provided by an embodiment of the present disclosure, the method includes steps S101 to S105, wherein:
  • S101 Acquire a semantic database including multiple semantic information.
  • semantic information contained in the semantic database can be used to represent the information of various entities, and here, the information of entities can also be called the conceptual information of objects.
  • The semantic information may be in Chinese or in a foreign language such as English, which is not specifically limited in the present disclosure. For example, the semantic information can be Chinese terms meaning cat, dog, pedestrian, car, and so on, or English terms such as cat, domestic cat, and person.
  • The semantic database may also contain hierarchical information between the multiple pieces of semantic information, where the hierarchical information represents the affiliation relationship (or superior-subordinate relationship) between them.
  • a plurality of semantic information includes information on mammals, reptiles, tigers, dogs, snakes, lizards, and the like.
  • information such as mammals and reptiles can be used as a level of semantic information.
  • semantic information such as tigers and dogs belong to the next level of semantic information corresponding to the category of mammals.
  • semantic information such as snakes and lizards belong to the next level of semantic information corresponding to the category of reptiles.
  • the relationship between mammals and tigers, dogs; reptiles and snakes, lizards and other information constitutes the hierarchical information in the semantic database (that is, the affiliation relationship or the superior-subordinate relationship).
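  • The hierarchical information in this example can be written down as a small data structure. The sketch below assumes a child-to-parent dictionary as the in-memory form; the disclosure does not prescribe a storage format, and the names `PARENT` and `next_level` are hypothetical.

```python
# Hypothetical in-memory form of the hierarchical information:
# each entry maps a piece of semantic information to its upper level.
PARENT = {
    "tiger": "mammal",
    "dog": "mammal",
    "snake": "reptile",
    "lizard": "reptile",
}


def next_level(parent_map, concept):
    """Return the semantic information directly below `concept`."""
    return sorted(child for child, parent in parent_map.items() if parent == concept)


mammals = next_level(PARENT, "mammal")
reptiles = next_level(PARENT, "reptile")
```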
  • the number of acquired semantic databases may be multiple, and the present disclosure does not specifically limit the number of acquired semantic databases.
  • the number of acquired semantic databases may be 2, 3, 4, etc., which is not specifically limited in the present disclosure.
  • the number of acquired multiple semantic databases may be two, and the semantic information in the two semantic databases can be used to represent objects in the natural environment.
  • the two semantic databases can be Wordnet semantic database and Wikidata semantic database.
  • the multiple semantic databases can also be selected as other types of databases, which will not be listed in this disclosure.
  • S103 Create multiple label data based on the semantic database. One label data corresponds to one semantic category; for example, each label data can correspond to one semantic category and contain object labels belonging to that semantic category. The semantic categories corresponding to the multiple label data are categories that allow an all-round representation test of the model to be tested.
  • the semantic database contains multiple semantic information, wherein the multiple semantic information belongs to multiple semantic categories, for example, the multiple semantic categories can be person, food, location, bird, reptile, mammal, insect, fish, clothing, device, structure, vehicle, flower, herb, tree, fruit.
  • The all-round representation test refers to testing the model to be tested on test data (for example, natural pictures) under as many semantic categories as possible, so as to obtain the performance test result of the model on the test data of each semantic category.
  • each tag data corresponds to one of the above multiple semantic categories.
  • the plurality of label data includes: label data 1, label data 2 and label data 3, wherein the label data 1 corresponds to the semantic category flower; the label data 2 corresponds to the semantic category food; the label data 3 corresponds to the semantic category location, etc.
  • Each label data includes the object labels of its corresponding semantic category. For example, label data 1 includes object labels belonging to the semantic category "flower", such as "rose" and "jasmine".
  • the object label in each label data can be understood as the semantic information under the corresponding semantic category in the semantic database.
  • S105 Based on the preset data set, determine matching data for the object labels of at least part (for example, each) of the label data, and determine, based on the matching data, the test data set corresponding to at least part (for example, each) of the label data, to obtain multiple test data sets.
  • In this way, test data sets corresponding to multiple semantic categories are obtained. When the performance of the model to be tested is tested on these test data sets, an all-round test of the model can be carried out, so that its all-round performance is obtained. Through this testing method, the robustness of the model to be tested can be improved, thereby improving its processing accuracy.
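  • Step S105 can be sketched as a label lookup. The sketch assumes that the preset data set is a list of (item, data label) pairs and that a data label matches an object label when the two strings are equal ignoring case; the disclosure does not fix the matching rule, and `build_test_sets` and the file names are hypothetical.

```python
from collections import defaultdict


def build_test_sets(label_data, preset_data):
    """Group preset data into one test data set per semantic category.

    label_data:  semantic category -> list of object labels.
    preset_data: iterable of (item, data_label) pairs.
    """
    # Index the preset data by its (case-folded) data label.
    by_label = defaultdict(list)
    for item, data_label in preset_data:
        by_label[data_label.lower()].append(item)

    test_sets = {}
    for category, object_labels in label_data.items():
        matched = [
            item
            for label in object_labels
            for item in by_label.get(label.lower(), [])
        ]
        if matched:  # categories without matching data get no test set here
            test_sets[category] = matched
    return test_sets


LABEL_DATA = {"flower": ["rose", "jasmine"], "food": ["pizza"]}
PRESET = [
    ("img_001.jpg", "Rose"),
    ("img_002.jpg", "pizza"),
    ("img_003.jpg", "rose"),
    ("img_004.jpg", "truck"),
]
test_sets = build_test_sets(LABEL_DATA, PRESET)
```

  • Data whose label matches no object label (here, "truck") is simply left out of every test data set in this sketch.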
  • Step S1031 Fusing semantic information in multiple semantic databases to obtain a fused semantic database; wherein, the fused semantic database includes multiple fused semantic information and hierarchical information between multiple fused semantic information;
  • Step S1032 Determine a plurality of semantic categories to be divided, and divide the fusion semantic database into the plurality of label data according to the plurality of semantic categories.
  • The semantic information in multiple semantic databases can be fused to obtain the fused semantic database; after that, the fused semantic database can be divided according to the multiple semantic categories to be divided, to obtain the multiple label data.
  • One semantic database may be selected from the multiple semantic databases as the benchmark semantic database. Then, the semantic mapping relationship between the semantic information in the benchmark semantic database and the semantic information in the remaining semantic databases is established, and the semantic information in the multiple semantic databases is fused according to the semantic mapping relationship to obtain the fused semantic database.
  • the two semantic databases can be Wordnet semantic database and Wikidata semantic database.
  • Wikidata can be selected as the benchmark semantic database, and Wordnet is then the remaining semantic database among the multiple semantic databases.
  • The above semantic mapping relationship may be established based on the semantic paths of the semantic information in the benchmark semantic database that has no next-level semantic information in the benchmark semantic database.
  • a semantic database corresponding to a larger amount of conceptual information may be determined as a benchmark semantic database from among multiple semantic databases.
  • By semantically fusing multiple semantic databases, a more comprehensive semantic database, namely the fused semantic database, can be obtained. Label data with richer semantic categories can then be obtained from it. When the model to be tested is tested on the test data sets corresponding to the multiple label data, an all-round test of the model can be carried out, so that its all-round representation performance is obtained.
  • the semantic information in multiple semantic databases is fused to obtain the fused semantic database, which specifically includes the following steps:
  • Step S11 determining the semantic information to be fused in the first semantic database of the plurality of semantic databases; the semantic information to be fused does not include semantic information of the next level in the first semantic database;
  • Step S12 Based on the hierarchical information among the semantic information in the first semantic database, determine the semantic path where the semantic information to be fused is located, and the semantic path contains at least one semantic information;
  • Step S13 Based on the high-level semantic information in the semantic path before the semantic information to be fused, the semantic information to be fused and the semantic information in the second semantic database are fused to obtain the fused semantic database.
  • the second semantic database is a database other than the first semantic database among the plurality of semantic databases.
  • one or more semantic databases are selected from multiple semantic databases as the first semantic database.
  • The first semantic database here is the benchmark semantic database described above.
  • the semantic database corresponding to a larger amount of concept information (semantic information) among the plurality of semantic databases may be determined as the first semantic database.
  • the semantic information to be fused can be determined in the first semantic database according to the hierarchical information among the semantic information contained in the first semantic database.
  • The semantic information that has no next-level semantic information in the first semantic database may be determined as the semantic information to be fused.
  • For example, the first semantic database includes node 1 and node 2, where node 1 includes node 11 to node 14, node 2 includes node 21 to node 23, and node 11 includes node 111 and node 112. In this case, the semantic information corresponding to node 12 to node 14, node 21 to node 23, node 111, and node 112 has no next-level semantic information.
  • the semantic information corresponding to the above nodes can be determined as the semantic information to be fused.
  • the semantic path of each semantic information to be fused in the first semantic database can be determined.
  • The semantic path of the semantic information to be fused corresponding to node 111 may be: node 1-node 11-node 111.
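  • Steps S11 and S12 can be sketched on the node layout described above. The sketch assumes the first semantic database is held as a parent-to-children dictionary; the names `CHILDREN`, `leaf_nodes`, and `semantic_path` are hypothetical.

```python
# Node layout from the example above, as parent -> children.
CHILDREN = {
    "node 1": ["node 11", "node 12", "node 13", "node 14"],
    "node 2": ["node 21", "node 22", "node 23"],
    "node 11": ["node 111", "node 112"],
}


def leaf_nodes(children):
    """Step S11: nodes with no next level are the candidates to be fused."""
    every = set(children) | {c for kids in children.values() for c in kids}
    return sorted(node for node in every if not children.get(node))


def semantic_path(children, node):
    """Step S12: path from the top level down to `node`."""
    parent = {c: p for p, kids in children.items() for c in kids}
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


to_fuse = leaf_nodes(CHILDREN)
path_111 = semantic_path(CHILDREN, "node 111")
```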
  • The semantic information to be fused and the semantic information in the second semantic database can be fused according to the high-level semantic information located before the semantic information to be fused in the semantic path.
  • the semantic information to be fused corresponding to "node 111" may be fused with the semantic information in the second semantic database.
  • A first semantic database can be determined from the multiple semantic databases in the manner described above, and the semantic information to be fused in the first semantic database is then fused with the semantic information in each remaining semantic database (that is, the second semantic database).
  • the specific fusion process is the process described in the above steps S11 to S13, which will not be repeated here.
  • Fusing the semantic information to be fused with the semantic information in the second semantic database to obtain the fused semantic database includes the following steps:
  • The high-level semantic information that precedes the semantic information to be fused in the first semantic database can be obtained, for example, the semantic information corresponding to "node 1" and the semantic information corresponding to "node 11" shown in FIG. 2.
  • The target semantic information can then be determined among the obtained high-level semantic information, in order of level from high to low.
  • Along the semantic path, determine the upper-level semantic information of the semantic information to be fused, and judge whether the second semantic database contains semantic information corresponding to it. If so, the upper-level semantic information is determined as the target semantic information. If not, determine the semantic information one level above the upper-level semantic information and judge whether the second semantic database contains semantic information corresponding to it. If so, that semantic information is determined as the target semantic information; otherwise, continue searching for higher-level semantic information along the semantic path.
  • multiple semantic databases include Wikidata database and Wordnet database.
  • the first semantic database may be selected as the Wikidata database
  • the second semantic database may be selected as the Wordnet database.
  • the semantic information to be fused does not contain the semantic information of the next level.
  • The semantic information to be fused can be the Toyger information, and the semantic path of the Toyger information in the Wikidata semantic database can then be determined, for example, Toyger-Domestic Cat-Cat.
  • the high-level semantic information of Toyger information can be determined, for example, Domestic Cat information and Cat information respectively.
  • the target semantic information can be determined according to the hierarchical order from high to low (or understood as the hierarchical order from bottom to top).
  • the target semantic information is Domestic Cat information.
  • the semantic information corresponding to the target semantic information in the Wordnet semantic database is also Domestic Cat information.
  • the Toyger information (semantic information to be fused) in the Wikidata semantic database can be fused with the next-level semantic information of the Domestic Cat information in the Wordnet semantic database.
  • the manner described above can be used to fuse the semantic information to be fused with the semantic information in the Wordnet semantic database. After fusing each semantic information to be fused, a corresponding fused semantic database can be obtained.
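The Toyger example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries `wikidata_parent` and `wordnet_children`, the helper names, the `Siamese` entry, and the rule that "corresponding" semantic information means an identical name are all hypothetical and are not taken from the disclosure.

```python
# Hypothetical toy hierarchies (not real Wikidata/Wordnet content).
# First database: node -> parent (None marks the root).
wikidata_parent = {"Cat": None, "Domestic Cat": "Cat", "Toyger": "Domestic Cat"}
# Second database: node -> list of child nodes.
wordnet_children = {"Cat": ["Domestic Cat"], "Domestic Cat": ["Siamese"], "Siamese": []}


def semantic_path(leaf, parent):
    """Return the path from the leaf up to the root, e.g. Toyger-Domestic Cat-Cat."""
    path = [leaf]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def fuse_leaf(leaf, parent, target_children):
    """Attach `leaf` under the nearest ancestor that also exists in the second database.

    Assumption: two pieces of semantic information correspond when their names are equal.
    """
    for ancestor in semantic_path(leaf, parent)[1:]:  # nearest ancestor first
        if ancestor in target_children:
            if leaf not in target_children[ancestor]:
                target_children[ancestor].append(leaf)
            target_children.setdefault(leaf, [])
            return ancestor  # this ancestor is the target semantic information
    return None  # no ancestor has a counterpart, so the leaf stays unfused


target = fuse_leaf("Toyger", wikidata_parent, wordnet_children)
print(target)                            # Domestic Cat
print(wordnet_children["Domestic Cat"])  # ['Siamese', 'Toyger']
```

Repeating `fuse_leaf` for every leaf of the first database would yield the fused database in this toy setting; a real implementation would need a more careful notion of corresponding semantic information than name equality.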
  • the Nth semantic database can be selected as the first semantic database, and one of the remaining N-1 semantic databases is then arbitrarily selected as the second semantic database.
  • the semantic information to be fused can be selected from the first semantic database and fused with the semantic information in the second semantic database, thereby completing the fusion of the two semantic databases and obtaining the fusion semantic database M.
  • the fusion of semantic information results in the final fusion semantic database.
  • the fused semantic database is a tree-structured database
  • Step S21: Determine nodes corresponding to at least some (for example, each) semantic categories in the tree-structured database to obtain a plurality of target nodes;
  • Step S22: Using each target node as a root node, divide the tree-structured database to obtain a plurality of sub-tree-structured databases, wherein one sub-tree-structured database corresponds to one target node;
  • Step S23: Determine the plurality of tag data based on the plurality of sub-tree-structured databases, wherein the object tags in the tag data are semantic information in the corresponding sub-tree-structured databases.
  • the multiple semantic databases may be tree-structured databases, where each node in the tree-structured database may represent a piece of semantic information, and each semantic information may represent corresponding object information.
  • each node in the tree-structured database can contain corresponding child nodes.
  • the hierarchical relationship between a node and its child nodes constitutes the hierarchical information between the semantic information corresponding to the node and the semantic information corresponding to its child nodes.
  • the fusion semantic database of the tree structure may also contain a plurality of nodes, and each node may contain a corresponding child node, and each node is used to represent semantic information in the fusion semantic database.
  • the node corresponding to each semantic category can be determined in the fusion semantic database in tree structure.
  • multiple semantic categories can be person, food, location, bird, reptile, mammal, insect, fish, clothing, device, structure, vehicle, flower, herb, tree, fruit.
  • the node corresponding to each semantic category in the fusion semantic database of the tree structure can be determined.
  • the multiple semantic categories are person, food, and location.
  • it can be determined that the nodes corresponding to the semantic categories are node A, node B, and node C, respectively; node A, node B, and node C are the above-mentioned multiple target nodes.
  • each target node may be used as a root node to divide the tree-structured database, so as to obtain multiple sub-tree-structured databases.
  • the semantic information contained in each sub-tree-structured database can be determined as the object labels in the corresponding label data, and the hierarchical information between the semantic information contained in the sub-tree-structured database is determined as the hierarchical information between the object labels contained in the corresponding label data.
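Steps S21 to S23 can be illustrated with a short Python sketch. The tree contents, the `fused_tree` representation (node mapped to its child list), and the function names are assumptions made for illustration only.

```python
# Hypothetical fused tree: node -> list of child nodes.
fused_tree = {
    "entity": ["person", "food"],
    "person": ["athlete"],
    "athlete": [],
    "food": ["bread", "soup"],
    "bread": [],
    "soup": [],
}


def subtree(root, tree):
    """Collect the sub-tree rooted at `root` as a node -> children mapping."""
    result = {}
    stack = [root]
    while stack:
        node = stack.pop()
        result[node] = list(tree.get(node, []))
        stack.extend(result[node])
    return result


def build_label_data(tree, categories):
    """One label data per semantic category: its keys are the object labels,
    and the node -> children mapping keeps the hierarchical information."""
    return {category: subtree(category, tree) for category in categories if category in tree}


label_data = build_label_data(fused_tree, ["person", "food"])
print(sorted(label_data["food"]))  # ['bread', 'food', 'soup']
```

Keeping each sub-tree as a mapping rather than a flat list preserves the hierarchy between object labels, which the later label-supplementing steps rely on.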
  • the number and names of the semantic categories to be divided can be determined according to the actual needs of the test model, and are not specifically limited here.
  • by dividing the fusion semantic database into label data corresponding to multiple semantic categories and then determining multiple test data sets according to the multiple label data, test data sets with which an omnidirectional representation test can be performed on the model to be tested can be obtained; when the model is tested according to the multiple test data sets, the performance of the model to be tested on each semantic category can be determined.
  • when the preset data set contains a plurality of data and data tags of the plurality of data, determining, in the above step S105, matching data for the object tags of at least part (for example, each) of the tag data based on the preset data set specifically includes the following steps:
  • Step S1051: Determine the object tags included in the tag data;
  • Step S1052: Match the data tags in the preset data set with the object tags to determine at least one group of matching tags;
  • Step S1053: Determine, in the preset data set, at least one piece of data corresponding to the data tag in at least one group (for example, each group) of matching tags, and determine the corresponding at least one piece of data as the matching data of the object tag in that group of matching tags.
  • the preset data set may be a collection of natural pictures.
  • the preset data set may also be a set containing other types of data, which will not be described in detail in this disclosure.
  • the object tags included in each tag data are first determined, and then the data tags included in the preset data set are matched with the object tags to obtain at least one set of matching tags.
  • the process of matching the object labels in the label data with the data labels in the preset data set can be understood as comparing the semantic information corresponding to the object label with the semantic information corresponding to the data label.
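  • if the semantic information of the object label and that of the data label are the same or similar, the matching is successful.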
  • the successfully matched object tags and data tags can form a set of matching tags.
  • the same semantic information can be understood as the object label is bike, and the data label is bike; the semantic information is similar can be understood as the object label is bike, and the data label is bicycle.
  • although the object label bike and the data label bicycle are different, the objects represented by bike and bicycle are the same. Therefore, in the embodiments of the present disclosure, similar semantic information may be interpreted as the object tag and the data tag corresponding to the same object.
  • the data corresponding to the data tag in each group of matching tags can be determined in the preset data set, and that data can then be used as the matching data of the object tag in the group of matching tags.
  • the matching data of the object tags in each tag data can be determined through the above processing. After the matching data of the object tags in each tag data is obtained, the set of matching data of all object tags in a tag data can be used as the test data set corresponding to that tag data, so that multiple test data sets are obtained.
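The matching in steps S1051 to S1053 could look like the following Python sketch. The synonym table `CANONICAL`, the file names, and the representation of the preset data set as `(data item, data label)` pairs are hypothetical; the disclosure does not specify how "similar" semantic information is detected.

```python
# Hypothetical synonym table mapping a label to a canonical form, so that
# "bike" and "bicycle" are treated as referring to the same object.
CANONICAL = {"bicycle": "bike"}


def canonical(label):
    label = label.strip().lower()
    return CANONICAL.get(label, label)


def build_test_set(object_labels, preset_dataset):
    """Keep every (data item, data label) pair whose data label matches an object label."""
    wanted = {canonical(label) for label in object_labels}
    return [(item, label) for item, label in preset_dataset if canonical(label) in wanted]


# Hypothetical preset data set and label data (category -> object labels).
preset = [("img_001.jpg", "bike"), ("img_002.jpg", "bicycle"), ("img_003.jpg", "soup")]
label_data = {"vehicle": ["bike", "car"], "food": ["soup"]}

# One test data set per label data, i.e. per semantic category.
test_sets = {category: build_test_set(labels, preset) for category, labels in label_data.items()}
print(test_sets["vehicle"])  # [('img_001.jpg', 'bike'), ('img_002.jpg', 'bicycle')]
```

An object label with no hit in the preset data set (here `car`) simply contributes no data; the disclosure handles that case separately through an alternative database.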
  • the above preset data set may be selected as the following two data sets: ImageNet and Places. Since the data sets ImageNet and Places contain a large number of natural pictures, determining the multiple test data sets based on ImageNet and Places yields more comprehensive data sets, and when the model to be tested is tested according to the multiple test data sets, its performance on each semantic category can be determined.
  • the embodiment of the present disclosure also includes the following steps:
  • Step S11: Perform test processing on the model to be tested through at least one (for example, each) test data set to obtain multiple test results;
  • Step S12: Calculate the average value of the multiple test results, and determine the average value as the test result of the omnidirectional representation test on the model to be tested.
  • the obtained multiple test data sets may be respectively input into the model to be tested for test processing.
  • the model to be tested can obtain a test result on each test data set.
  • the average value of the obtained multiple test results can be calculated to obtain the test result of the all-round representation test of the model to be tested.
  • each test result can be used to reflect the performance of the model to be tested under the corresponding semantic category; for example, when a test result is greater than a certain threshold, it can be determined that the model to be tested obtains better processing results on data under that semantic category.
  • the omnidirectional representation of the model to be tested can be determined quantitatively, thereby determining the robustness of the model to be tested.
  • relevant technical personnel can also be instructed to carry out targeted training on the model to be tested, so that the model to be tested can obtain better processing results in the test data under each semantic category.
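Steps S11 and S12 amount to an unweighted mean over the per-category results. The sketch below assumes each test result is a single accuracy-like number; the category names, the values, and the 0.8 threshold are made up for illustration.

```python
from statistics import mean


def omni_score(per_category_results):
    """Average the per-category test results into one overall result."""
    return mean(per_category_results.values())


# Hypothetical test results, one per test data set / semantic category.
results = {"person": 0.82, "food": 0.74, "vehicle": 0.90}
score = omni_score(results)

# Categories below a hypothetical threshold are candidates for targeted training.
THRESHOLD = 0.8
weak_categories = [category for category, value in results.items() if value < THRESHOLD]
print(round(score, 2), weak_categories)
```

Because the mean is unweighted, a category with few test samples counts as much as a large one; whether that is desirable depends on the testing goal, and the disclosure does not discuss weighting.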
  • the disclosed method also includes the following steps:
  • Step S21: In the case where no data matching the target object label in the target label data is determined in the preset data set, determine the target semantic category corresponding to the target label data;
  • Step S22: Search the alternative database for a matching database that matches the target semantic category, and search the matching database for data that matches the target object label.
  • if data matching the target object label in the target label data cannot be determined in the preset data set, a matching database that matches the semantic category corresponding to the target label data can be searched for in the alternative database, and data matching the target object label is then looked up in the matching database.
  • the alternative database refers to a database other than the above-mentioned preset data set.
  • the alternative database can be the matching data obtained by searching the network according to the semantic information corresponding to the semantic category or the target object label.
  • the alternative database can also provide users with matching data based on semantic categories and semantic information.
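The fallback lookup can be sketched as follows. The function name, the data layout (the alternative database as a mapping from semantic category to a list of `(data item, label)` pairs), and the sample entries are assumptions for illustration; exact label equality stands in for whatever matching rule is actually used.

```python
def find_matching_data(target_label, category, preset_dataset, alternative_databases):
    """Look in the preset data set first; fall back to the alternative database
    that matches the semantic category when nothing is found."""
    hits = [item for item, label in preset_dataset if label == target_label]
    if hits:
        return hits, "preset"
    matching_db = alternative_databases.get(category, [])
    fallback = [item for item, label in matching_db if label == target_label]
    return fallback, "alternative"


# Hypothetical data: the preset data set has no picture labelled "koi".
preset = [("img_001.jpg", "bike")]
alternatives = {"fish": [("web_101.jpg", "koi"), ("web_102.jpg", "trout")]}

found, source = find_matching_data("koi", "fish", preset, alternatives)
print(found, source)  # ['web_101.jpg'] alternative
```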
  • the disclosed method also includes the following steps:
  • Step S31: When a target data label is determined in the preset data set, determine the upper-level label of the target data label based on the hierarchical information between the data labels in the preset data set;
  • the target data label is a data label for which no corresponding object label exists among the object labels of the multiple label data;
  • Step S32: Determine the semantic information corresponding to the upper-level label, and determine, in the multiple label data, the semantic information that matches the semantic information corresponding to the upper-level label;
  • Step S33: Add the semantic information corresponding to the target data label, as new semantic information, to the next-level semantic information of the matched semantic information, and determine matching data for the new semantic information based on the preset data set.
  • the upper-level label of the target data label can be determined according to the hierarchical information between the data labels in the preset data set, and the semantic information corresponding to the upper-level label is then determined; for example, this semantic information is denoted as M.
  • the semantic information matching the semantic information M can be determined in the multiple label data and is denoted as semantic information N.
  • the semantic information corresponding to the target data label in the preset data set is added, as new semantic information, to the next-level semantic information of semantic information N, and the data corresponding to the target data label in the preset data set is used as the matching data of the new semantic information.
  • the semantic information corresponding to the object tags in multiple tag data is supplemented by the data tags in the preset data set, which can enrich the semantic information in the tag data and obtain more and more comprehensive fusion semantic databases.
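Steps S31 to S33 can be sketched as below. The label hierarchy `dataset_parent`, the sample labels, and the function name are hypothetical, and matching between the upper-level label and the label data is reduced to name equality for the sake of the example.

```python
def add_unmatched_label(target_label, dataset_parent, label_data, preset_dataset):
    """Insert a data label that has no object label under its upper-level label.

    label_data maps a semantic category to a node -> children mapping.
    Returns (category, matching data), or None when no upper-level match exists.
    """
    upper = dataset_parent.get(target_label)  # step S31: upper-level label
    if upper is None:
        return None
    for category, tree in label_data.items():  # step S32: find the matching node
        if upper in tree:
            if target_label not in tree[upper]:  # step S33: add as next-level info
                tree[upper].append(target_label)
            tree.setdefault(target_label, [])
            matches = [item for item, label in preset_dataset if label == target_label]
            return category, matches
    return None


# Hypothetical data: "husky" appears in the preset data set but not in the label data.
dataset_parent = {"husky": "dog", "dog": "mammal"}
label_data = {"mammal": {"mammal": ["dog"], "dog": []}}
preset = [("img_010.jpg", "husky"), ("img_011.jpg", "dog")]

result = add_unmatched_label("husky", dataset_parent, label_data, preset)
print(result)  # ('mammal', ['img_010.jpg'])
```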
  • the test accuracy of the model to be tested can be obtained.
  • the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • the embodiment of the present disclosure also provides a data set determination device corresponding to the data set determination method. Because the problem-solving principle of the device in the embodiment of the present disclosure is similar to that of the above data set determination method, the implementation of the device can refer to the implementation of the method, and repeated description is omitted.
  • referring to FIG. 4, a schematic diagram of a device for determining a data set provided by an embodiment of the present disclosure is shown.
  • the device includes: an acquisition unit module 41, a creation unit module 42, and a determination unit module 43; wherein,
  • the acquisition unit module 41 is configured to acquire a semantic database comprising a plurality of pieces of semantic information;
  • the creating unit module 42 is configured to create a plurality of label data based on the semantic database; one label data corresponds to one semantic category, and the label data includes object labels belonging to the corresponding semantic category; the semantic categories corresponding to the multiple label data are categories with which an omnidirectional representation test can be performed on the model to be tested;
  • the determining unit module 43 is configured to determine matching data for the object tags of at least part (for example, each) of the tag data based on a preset data set, and to determine, based on the matching data, the test data sets respectively corresponding to at least part (for example, each) of the tag data, so as to obtain a plurality of test data sets.
  • test data sets corresponding to multiple semantic categories can be obtained; when the performance of the model to be tested is tested with the determined multiple test data sets, the model can be tested comprehensively, so as to obtain its omnidirectional representation performance. Through this testing method, the robustness of the model to be tested can be improved, and the model processing accuracy of the model to be tested can be improved in turn.
  • the creating unit module is further configured to: fuse the semantic information in multiple semantic databases to obtain a fusion semantic database, wherein the fusion semantic database includes multiple pieces of fused semantic information and hierarchical information between the multiple pieces of fused semantic information; and determine multiple semantic categories to be divided, and divide the fusion semantic database into the multiple label data according to the multiple semantic categories.
  • the creating unit module is further configured to: determine the semantic information to be fused in the first semantic database of the plurality of semantic databases, where the semantic information to be fused has no next-level semantic information in the first semantic database; determine, based on the hierarchical information among the semantic information in the first semantic database, the semantic path where the semantic information to be fused is located, the semantic path containing at least one piece of semantic information; and, based on the high-level semantic information located before the semantic information to be fused in the semantic path, fuse the semantic information to be fused with the semantic information in the second semantic database to obtain the fused semantic database, the second semantic database being a database other than the first semantic database among the plurality of semantic databases.
  • the creating unit module is further configured to: determine the target semantic information in the high-level semantic information according to the hierarchical order from high to low, where the second semantic database contains semantic information corresponding to the target semantic information; and fuse the semantic information to be fused with the next-level semantic information of the semantic information corresponding to the target semantic information in the second semantic database to obtain the fused semantic database.
  • the creating unit module is further configured to: determine nodes corresponding to at least part (for example, each) of the semantic categories in the tree-structured database to obtain a plurality of target nodes; use each target node as a root node and divide the tree-structured database to obtain a plurality of sub-tree-structured databases, wherein one sub-tree-structured database corresponds to one target node; and determine the plurality of label data based on the plurality of sub-tree-structured databases, wherein the object labels in the label data are semantic information in the corresponding sub-tree-structured database.
  • the determining unit module is further configured to: determine the object tags included in the tag data; match the data tags in the preset data set with the object tags to determine at least one group of matching tags; and determine, in the preset data set, at least one piece of data corresponding to the data tag in at least one group (for example, each group) of matching tags, and determine the corresponding at least one piece of data as the matching data of the object tag in that group of matching tags.
  • the determining unit module is further configured to: perform test processing on the model to be tested through at least one (for example, each) test data set to obtain multiple test results; calculate the multiple test results The average value of the results, and the average value is determined as the test result of the full-scale representation test on the model to be tested.
  • the determining unit module is further configured to: if no data matching the target object tag in the target tag data is determined in the preset data set, determine the target semantic category corresponding to the target tag data; and search the alternative database for a matching database that matches the target semantic category, and search the matching database for data that matches the target object label.
  • the determining unit module is further configured to: when a target data tag is determined in the preset data set, determine the upper-level label of the target data tag based on the hierarchical information between the data tags in the preset data set, the target data tag being a data tag for which no corresponding object label exists among the object labels of the multiple label data; determine the semantic information corresponding to the upper-level label, and determine, in the plurality of label data, the semantic information that matches the semantic information corresponding to the upper-level label; and add the semantic information corresponding to the target data tag, as new semantic information, to the next-level semantic information of the matched semantic information, and determine matching data for the new semantic information based on the preset data set.
  • the embodiment of the present disclosure also provides a computer device 500, as shown in FIG. 5, which is a schematic structural diagram of the computer device 500 provided by the embodiment of the present disclosure, including:
  • a processor 51, a memory 52, and a bus 53. The memory 52 is used for storing execution instructions and includes a memory 521 and an external memory 522; the memory 521 here is also called internal memory and is used for temporarily storing computing data in the processor 51 and data exchanged with the external memory 522, such as a hard disk. The processor 51 exchanges data with the external memory 522 through the memory 521, and when the computer device 500 is running, the processor 51 communicates with the memory 52 through the bus 53, so that the processor 51 executes the following instructions:
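  • acquiring a semantic database containing a plurality of pieces of semantic information;
  • creating a plurality of tag data based on the semantic database, where one tag data corresponds to one semantic category, the tag data includes object tags belonging to the corresponding semantic category, and the semantic categories corresponding to the plurality of tag data are categories with which an omnidirectional representation test can be performed on the model to be tested;
  • based on a preset data set, determining matching data for the object tags of at least part (for example, each) of the tag data, and determining, based on the matching data, the test data sets respectively corresponding to at least part (for example, each) of the tag data, to obtain multiple test data sets.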
  • An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the method for determining a data set described in the above method embodiments are performed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • An embodiment of the present disclosure also provides a computer program product that carries program code; the instructions included in the program code can be used to execute the steps of the method for determining a data set described in the above method embodiments. For details, refer to the foregoing method embodiments, which are not repeated here.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • in one optional embodiment, the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include: a U disk, a mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data set determination method and apparatus, and a computer device and a storage medium. The method comprises: acquiring a semantic database, which includes a plurality of pieces of semantic information (S101); creating a plurality of pieces of tag data on the basis of the semantic database, wherein one piece of tag data corresponds to one semantic category, the tag data includes an object tag belonging to a corresponding semantic category, and semantic categories corresponding to the plurality of pieces of tag data are categories by which an omni-vision representation test can be performed on a model to be tested (S103); and on the basis of a preset data set, determining matching data for object tags of at least part of the tag data, and on the basis of the matching data, determining test data sets respectively corresponding to the at least part of the tag data, so as to obtain a plurality of test data sets (S105).

Description

Method, apparatus, computer device, and storage medium for determining a data set
This application claims priority to Chinese patent application No. 202110986886.1, filed with the China Patent Office on August 26, 2021 and entitled "Method, apparatus, computer device, and storage medium for determining a data set", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a method, apparatus, computer device, and storage medium for determining a data set.
Background
In the field of computer vision, a designed model needs to be performance-tested, and this can be done with a corresponding test set. However, existing test sets are usually data sets that have been fixed in advance, for example, the ImageNet data set. Because an existing test set mixes test data covering many types of objects in various scenes, testing a model with it does not reveal how the model performs on the test data of each individual type of object. Performance testing with an existing test set therefore affects the robustness of the model and, in turn, its processing accuracy.
Summary
Embodiments of the present disclosure provide at least a method, apparatus, computer device, and storage medium for determining a data set.
In a first aspect, an embodiment of the present disclosure provides a method for determining a data set, including: acquiring a semantic database containing a plurality of pieces of semantic information; creating a plurality of tag data based on the semantic database, where one tag data corresponds to one semantic category, the tag data contains object tags belonging to the corresponding semantic category, and the semantic categories corresponding to the plurality of tag data are categories with which an omnidirectional representation test can be performed on a model to be tested; and, based on a preset data set, determining matching data for the object tags of at least part of the tag data, and determining, based on the matching data, the test data sets respectively corresponding to the at least part of the tag data, to obtain a plurality of test data sets.
As can be seen from the above description, by processing the semantic database to obtain tag data corresponding to multiple semantic categories and creating test data sets based on the determined tag data, the embodiments of the present disclosure obtain test data sets corresponding to multiple semantic categories. When the performance of the model to be tested is tested with these test data sets, the model can be tested in all respects, so as to obtain its omnidirectional representation performance. This testing approach can improve the robustness of the model to be tested and, in turn, its processing accuracy.
In an optional implementation, there are multiple semantic databases, and creating a plurality of tag data based on the semantic database includes: fusing the semantic information in the multiple semantic databases to obtain a fused semantic database, where the fused semantic database contains a plurality of pieces of fused semantic information and hierarchical information between them; and determining multiple semantic categories to be divided, and dividing the fused semantic database into the plurality of tag data according to the multiple semantic categories.
As can be seen from the above description, semantically fusing multiple semantic databases yields a more comprehensive semantic database, namely the fused semantic database. When the plurality of tag data is determined from the fused semantic database, tag data with richer semantic categories is obtained; testing the model to be tested with the test data sets corresponding to this tag data allows the model to be tested in all respects, so as to obtain its omnidirectional representation performance.
一种可选的实施方式中,所述将多个语义数据库中的语义信息进行融合,得到融合语义数据库,包括:在所述多个语义数据库的第一语义数据库中确定待融合语义信息;所述待融合语义信息在所述第一语义数据库中不包含下一层级的语义信息;基于所述第一语义数据库中语义信息间的层次信息,确定所述待融合语义信息所在的语义路径,所述语义路径包含至少一个语义信息;基于所述语义路径中位于所述待融合语义信息之前的高层次语义信息,将所述待融合语义信息和第二语义数据库中的语义信息进行融合,得到所述融合语义数据库,所述第二语义数据库为所述多个语义数据库中除所述第一语义数据库之外的数据库。In an optional implementation manner, the merging the semantic information in multiple semantic databases to obtain the fused semantic database includes: determining the semantic information to be fused in the first semantic database of the multiple semantic databases; The semantic information to be fused does not contain the semantic information of the next level in the first semantic database; based on the hierarchical information between the semantic information in the first semantic database, determine the semantic path where the semantic information to be fused is located, so The semantic path includes at least one semantic information; based on the high-level semantic information in the semantic path before the semantic information to be fused, the semantic information to be fused and the semantic information in the second semantic database are fused to obtain the The fusion semantic database, the second semantic database is a database other than the first semantic database among the plurality of semantic databases.
As the above description shows, determining the semantic path on which the semantic information to be fused lies from the hierarchical information among the semantic information, and then fusing the semantic information to be fused with the semantic information in the second semantic database according to that path, allows the mapping between the semantic information to be fused and the semantic information in the second semantic database to be determined more quickly and accurately. Each piece of semantic information to be fused can thus be fused with the semantic information in the second semantic database to the greatest extent possible, yielding a fused semantic database that contains more comprehensive semantic information.
In an optional implementation, fusing, based on the higher-level semantic information that precedes the semantic information to be fused on the semantic path, the semantic information to be fused with the semantic information in the second semantic database to obtain the fused semantic database includes: determining target semantic information among the higher-level semantic information in order of level from high to low, where the target semantic information has corresponding semantic information in the second semantic database; and fusing the semantic information to be fused with the next-level semantic information of the semantic information in the second semantic database that corresponds to the target semantic information, to obtain the fused semantic database.
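The fusion step above can be sketched as follows. This is only an illustrative sketch under stated assumptions: each database is represented as a hypothetical dictionary from a concept to its list of children, correspondence between databases is taken to be an exact name match, and when several higher-level concepts have counterparts the sketch keeps the most specific one. The disclosure does not prescribe any of these choices, and the name `fuse_leaf` is invented for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical representation: concept -> list of next-level concepts.
Tree = Dict[str, List[str]]


def fuse_leaf(path: List[str], second: Tree) -> Optional[str]:
    """Fuse the last concept on `path` into the `second` database.

    `path` runs from the highest-level concept down to the concept to be
    fused. Returns the concept in `second` under which the new concept was
    placed, or None if no higher-level concept has a counterpart.
    """
    *ancestors, leaf = path
    target = None
    # Scan the higher-level concepts from high to low. Assumption: the most
    # specific ancestor that also exists in `second` is used as the target.
    for concept in ancestors:
        if concept in second:
            target = concept
    if target is None:
        return None
    children = second.setdefault(target, [])
    if leaf not in children:
        children.append(leaf)  # leaf joins the next level below the target
    second.setdefault(leaf, [])
    return target


second_db: Tree = {"animal": ["mammal"], "mammal": ["dog"], "dog": []}
placed_under = fuse_leaf(["animal", "mammal", "cat", "domestic cat"], second_db)
print(placed_under, second_db["mammal"])
```

In this toy example "cat" has no counterpart in the second database, so "domestic cat" is attached under "mammal", the lowest ancestor that does.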
In the embodiments of the present disclosure, fusing the semantic information in multiple semantic databases into a fused semantic database yields richer and more comprehensive semantic information. When multiple pieces of label data are determined based on the fused semantic database, label data corresponding to many semantic types can be obtained, so that an omni-vision representation test can be performed on the model to be tested. This in turn improves the robustness of the model to be tested, broadens its scope of application, and improves its processing accuracy.
In an optional implementation, the fused semantic database is a tree-structured database, and dividing the fused semantic database into the multiple pieces of label data according to the multiple semantic categories includes: determining, in the tree-structured database, the nodes corresponding to at least some (for example, each) of the semantic categories to obtain multiple target nodes; dividing the tree-structured database with each target node as a root node to obtain multiple subtree-structured databases, where each subtree-structured database corresponds to one target node; and determining the multiple pieces of label data based on the multiple subtree-structured databases, where the object labels in a piece of label data are the semantic information in the corresponding subtree-structured database.
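The subtree division described above can be illustrated with a short sketch. It assumes, purely for illustration, that the fused semantic database is held as a dictionary from each concept to its children, and that the object labels of a category are all concepts below its target node (the root itself is excluded here, which is a choice of this sketch rather than something the disclosure states). The function names are hypothetical.

```python
from typing import Dict, List


def subtree_labels(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Collect every concept below `root` in depth-first order."""
    labels: List[str] = []
    stack = list(reversed(tree.get(root, [])))
    while stack:
        node = stack.pop()
        labels.append(node)
        stack.extend(reversed(tree.get(node, [])))
    return labels


def split_by_categories(tree: Dict[str, List[str]],
                        categories: List[str]) -> Dict[str, List[str]]:
    """Build one piece of label data per category that exists in the tree."""
    return {c: subtree_labels(tree, c) for c in categories if c in tree}


# Toy fused database (hypothetical content).
fused = {
    "entity": ["flower", "food"],
    "flower": ["rose", "jasmine"],
    "food": ["bread"],
    "rose": [], "jasmine": [], "bread": [],
}
label_data = split_by_categories(fused, ["flower", "food", "vehicle"])
print(label_data)
```

Categories with no node in the tree (here "vehicle") simply produce no label data in this sketch.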
In the embodiments of the present disclosure, the fused semantic database is divided, according to the semantic categories required, into label data corresponding to multiple semantic categories, and multiple test data sets are then determined from the multiple pieces of label data. This yields data sets with which an omni-vision representation test can be performed on the model to be tested; when the model is tested with these test data sets, its performance on each semantic category can be determined.
In an optional implementation, the preset data set contains multiple pieces of data and their data labels, and determining, based on the preset data set, matching data for the object labels of at least some (for example, each) of the label data includes: determining the object labels contained in the label data; matching the data labels in the preset data set against the object labels to determine at least one group of matching labels; and determining, in the preset data set, at least one piece of data corresponding to the data label in at least one group (for example, each group) of matching labels, and determining that data as the data matching the object label in that group of matching labels.
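A minimal sketch of this label matching is given below. It assumes the preset data set is available as (data identifier, data label) pairs and that two labels match when they are equal ignoring case; the disclosure does not specify the matching rule, and the file names and the function `match_data` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def match_data(object_labels: Iterable[str],
               dataset: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each matched object label to the data items carrying that label."""
    wanted = {label.lower() for label in object_labels}
    matched: Dict[str, List[str]] = defaultdict(list)
    for data_id, data_label in dataset:
        key = data_label.lower()  # assumption: case-insensitive exact match
        if key in wanted:
            matched[key].append(data_id)
    return dict(matched)


preset = [("img_001.jpg", "Rose"), ("img_002.jpg", "tiger"), ("img_003.jpg", "rose")]
flower_test_set = match_data(["rose", "jasmine"], preset)
print(flower_test_set)
```

Object labels with no match (here "jasmine") are absent from the result, which is the situation the fallback lookup in a later implementation addresses.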
In the embodiments of the present disclosure, the preset data set may be selected from the following two data sets: ImageNet and Places. Because ImageNet and Places contain a large number of natural images, determining the multiple test data sets from them yields more comprehensive data sets; when the model to be tested is tested with these test data sets, its performance on at least some (for example, each) of the semantic categories can be determined.
In an optional implementation, the method further includes: testing the model to be tested with each of the at least one test data set to obtain multiple test results; and computing the average of the multiple test results and determining the average as the result of the omni-vision representation test on the model to be tested.
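The averaging can be illustrated as follows. The sketch assumes, for illustration only, that each test result is a classification accuracy and that the model is exposed as a `predict` callable; the disclosure does not fix the metric, and the toy model and samples are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]  # (input, expected label)


def accuracy(predict: Callable[[str], str], samples: List[Sample]) -> float:
    """Fraction of samples for which the prediction equals the expected label."""
    correct = sum(1 for x, y in samples if predict(x) == y)
    return correct / len(samples)


def omni_score(predict: Callable[[str], str],
               test_sets: Dict[str, List[Sample]]) -> float:
    """Average the per-category results into one overall test result."""
    per_category = {name: accuracy(predict, s) for name, s in test_sets.items()}
    return mean(per_category.values())


def toy_predict(x: str) -> str:
    # Hypothetical stand-in for the model to be tested.
    return "rose" if x in {"a", "b"} else "bread"


test_sets = {
    "flower": [("a", "rose"), ("b", "rose")],
    "food": [("c", "bread"), ("d", "rice")],
}
score = omni_score(toy_predict, test_sets)
print(score)
```

Averaging per-category results, rather than pooling all samples, gives each semantic category equal weight regardless of how many samples it contains.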
In the embodiments of the present disclosure, the model to be tested is tested on multiple test data sets to obtain multiple test results, and the average of those results is taken as the result of the omni-vision representation test. In this way the omni-vision representation of the model to be tested can be determined quantitatively, and its robustness can be determined accordingly. The test result can also guide technicians in training the model in a targeted manner, so that the model produces good results on the test data under at least some (for example, each) of the semantic categories.
In an optional implementation, the method further includes: when no data matching a target object label in target label data is found in the preset data set, determining the target semantic category to which the target label data corresponds; and searching candidate databases for a matching database that matches the target semantic category, and searching the matching database for data that matches the target object label.
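The fallback lookup can be sketched like this. The sketch assumes the preset data set has already been indexed by label and that the candidate databases are indexed first by semantic category and then by label; both structures, all names, and the sample file names are hypothetical.

```python
from typing import Dict, List


def find_with_fallback(label: str,
                       category: str,
                       preset: Dict[str, List[str]],
                       candidates: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Look up data for `label`, falling back to a category-specific database."""
    if preset.get(label):
        return preset[label]
    # No match in the preset data set: pick the candidate database that
    # matches the label's semantic category and search it instead.
    matching_db = candidates.get(category, {})
    return matching_db.get(label, [])


preset_index = {"rose": ["img_001.jpg"]}
candidate_dbs = {"flower": {"jasmine": ["flw_17.jpg", "flw_42.jpg"]}}
print(find_with_fallback("jasmine", "flower", preset_index, candidate_dbs))
```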
This processing yields a more comprehensive test data set, and more accurate test results can be obtained when the model to be tested is tested in all respects with that test data set.
In an optional implementation, the method further includes: when a target data label is determined in the preset data set, determining the upper-level label of the target data label based on hierarchical information among the data labels in the preset data set, where the target data label is a data label that has no corresponding object label among the object labels of the multiple pieces of label data; determining the semantic information corresponding to the upper-level label, and determining, in the multiple pieces of label data, the semantic information that matches the semantic information corresponding to the upper-level label; and adding the semantic information corresponding to the target data label, as new semantic information, to the next level below the matched semantic information, and determining matching data for the new semantic information based on the preset data set.
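One possible reading of this supplementation step is sketched below. It assumes the data set's label hierarchy is available as a child-to-parent map and that the label data is kept as a concept-to-children dictionary, with matching done by exact name; these representations and the function name are assumptions made for illustration.

```python
from typing import Dict, List


def add_missing_label(target: str,
                      dataset_parent: Dict[str, str],
                      label_tree: Dict[str, List[str]]) -> bool:
    """Insert `target` under the concept matching its upper-level data label.

    Returns True if the label was placed, False if its upper-level label has
    no matching concept in the label data.
    """
    parent = dataset_parent.get(target)
    if parent is None or parent not in label_tree:
        return False
    if target not in label_tree[parent]:
        label_tree[parent].append(target)  # new next-level semantic information
    label_tree.setdefault(target, [])
    return True


dataset_hierarchy = {"tulip": "flower"}  # child -> upper-level label
labels = {"flower": ["rose"], "rose": []}
added = add_missing_label("tulip", dataset_hierarchy, labels)
print(added, labels["flower"])
```

After the label is added, matching data for it would be collected from the preset data set in the same way as for any other object label.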
In the embodiments of the present disclosure, supplementing the semantic information corresponding to the object labels in the multiple pieces of label data with the data labels in the preset data set enriches the semantic information in the label data and yields a more comprehensive fused semantic database, from which the test accuracy of the model to be tested can be obtained.
In a second aspect, the embodiments of the present disclosure further provide an apparatus for determining a data set, including: an acquisition unit configured to acquire a semantic database containing multiple pieces of semantic information; a creation unit configured to create multiple pieces of label data based on the semantic database, where each piece of label data corresponds to one semantic category and contains the object labels belonging to that semantic category, and the semantic categories corresponding to the multiple pieces of label data are categories with which an omni-vision representation test can be performed on a model to be tested; and a determination unit configured to determine, based on a preset data set, matching data for the object labels of at least some of the label data, and to determine, based on the matching data, the test data set corresponding to each of the at least some of the label data, to obtain multiple test data sets.
In a third aspect, the embodiments of the present disclosure further provide a computer device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the memory over the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the first aspect or of any possible implementation of the first aspect.
In a fourth aspect, the embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the first aspect or of any possible implementation of the first aspect.
In a fifth aspect, an optional implementation of the present disclosure further provides a computer program product, including computer-readable code or a computer-readable storage medium carrying computer-readable code. When the computer-readable code runs in a processor of an electronic device, the processor performs the steps of the first aspect or of any possible implementation of the first aspect.
In the embodiments of the present disclosure, a semantic database containing multiple pieces of semantic information is first acquired. Multiple pieces of label data can then be created based on the semantic database, and matching data is determined, based on a preset data set, for the object labels of at least some (for example, each) of the label data, yielding multiple test data sets. In other words, the semantic database is processed to obtain label data corresponding to multiple semantic categories, and test data sets corresponding to those semantic categories are created from the label data. When the performance of the model to be tested is tested with these test data sets, the model can be tested in all respects, yielding its omni-vision representation performance. This way of testing can improve the robustness of the model to be tested and, in turn, its processing accuracy.
To make the above objects, features, and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. The drawings are incorporated into and form part of this specification; they show embodiments consistent with the present disclosure and, together with the specification, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.
FIG. 1 shows a flowchart of a method for determining a data set provided by an embodiment of the present disclosure;
FIG. 2 shows a schematic structural diagram of a tree-structured first semantic database provided by an embodiment of the present disclosure;
FIG. 3 shows a flowchart of the specific steps of determining, based on a preset data set, matching data for the object labels of each piece of label data in the method for determining a data set provided by an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of an apparatus for determining a data set provided by an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments generally described and illustrated in the drawings may be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art from the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The term "and/or" herein merely describes an association and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of them. For example, "including at least one of A, B, and C" can mean including any one or more elements selected from the set consisting of A, B, and C.
Research shows that existing test sets are usually predefined data sets, for example the ImageNet data set. Because an existing test set mixes test data containing many types of objects across various scenes, testing a model with it cannot reveal how the model performs on the test data for each individual type of object. Performance testing with an existing test set therefore affects the robustness of the model and, in turn, its processing accuracy.
Based on the above research, the present disclosure provides a method and apparatus for determining a data set, a computer device, and a storage medium. In the embodiments of the present disclosure, a semantic database is processed to obtain label data corresponding to multiple semantic categories, and test data sets corresponding to those semantic categories are created from the label data. When the performance of the model to be tested is tested with these test data sets, the model can be tested in all respects, yielding its omni-vision representation performance. This way of testing can improve the robustness of the model to be tested and, in turn, its processing accuracy.
To facilitate understanding of this embodiment, a method for determining a data set disclosed in the embodiments of the present disclosure is first described in detail. The method is generally executed by a computer device with a certain computing capability; in some possible implementations, it may be carried out by a processor running computer-executable code.
FIG. 1 is a flowchart of a method for determining a data set provided by an embodiment of the present disclosure. The method includes steps S101 to S105, where:
S101: Acquire a semantic database containing multiple pieces of semantic information.
Here, the multiple pieces of semantic information contained in the semantic database can represent information about various entities; entity information may also be called the conceptual information of objects.
The semantic information may be in Chinese or in another language, for example English; the present disclosure does not specifically limit this. For example, the semantic information may be the Chinese words for cat, dog, pedestrian, or car, or English terms such as cat, domestic cat, or person.
In the embodiments of the present disclosure, in addition to the multiple pieces of semantic information, the semantic database may also contain hierarchical information among them. The hierarchical information represents the membership relationships (that is, the superior-subordinate relationships) among the pieces of semantic information.
For example, suppose the semantic information includes mammal, reptile, tiger, dog, snake, and lizard. Mammal and reptile can then serve as semantic information at one level. Tiger and dog are next-level semantic information under the category mammal, and snake and lizard are next-level semantic information under the category reptile. The relationships between mammal and tiger and dog, and between reptile and snake and lizard, constitute the hierarchical information in the semantic database (that is, the membership or superior-subordinate relationships).
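The hierarchy in this example can be written down as a small data structure. The child-to-parent map below is one hypothetical encoding chosen for illustration; the disclosure does not prescribe how the hierarchical information is stored.

```python
from typing import Dict, List

# Hierarchical information from the example: child concept -> upper-level concept.
parent_of: Dict[str, str] = {
    "tiger": "mammal",
    "dog": "mammal",
    "snake": "reptile",
    "lizard": "reptile",
}


def children_of(concept: str) -> List[str]:
    """Return the next-level concepts under `concept`, sorted by name."""
    return sorted(child for child, parent in parent_of.items() if parent == concept)


print(children_of("mammal"))
print(children_of("reptile"))
```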
In the embodiments of the present disclosure, more than one semantic database may be acquired, and the number is not specifically limited; it may be, for example, two, three, or four.
For example, two semantic databases may be acquired, and the semantic information in both can be used to represent objects in the natural environment. The two databases may be the Wordnet semantic database and the Wikidata semantic database. Other types of databases may also be chosen, which are not enumerated here.
S103: Create multiple pieces of label data based on the semantic database. Each piece of label data corresponds to one semantic category and contains the object labels belonging to that semantic category; the semantic categories corresponding to the multiple pieces of label data are categories with which an omni-vision representation test can be performed on the model to be tested.
As described above, the semantic database contains multiple pieces of semantic information belonging to multiple semantic categories. For example, the semantic categories may be person, food, location, bird, reptile, mammal, insect, fish, clothing, device, structure, vehicle, flower, herb, tree, and fruit.
Setting these semantic categories makes it possible to perform an omni-vision representation test on the model to be tested. An omni-vision representation test means testing the performance of the model with test data (for example, natural images) under as many semantic categories as possible, so as to obtain the model's performance test result on the test data under each semantic category.
Multiple pieces of label data can then be created based on the semantic database, each corresponding to one of the above semantic categories. For example, the label data may include label data 1, label data 2, and label data 3, where label data 1 corresponds to the semantic category flower, label data 2 to the semantic category food, and label data 3 to the semantic category location.
Each piece of label data contains the object labels of its semantic category. For example, label data 1 contains object labels belonging to the semantic category "flower", such as "rose" and "jasmine".
In the embodiments of the present disclosure, the object labels in each piece of label data can be understood as the semantic information under the corresponding semantic category in the semantic database.
S105: Based on a preset data set, determine matching data for the object labels of at least some (for example, each) of the label data, and determine, based on the matching data, the test data set corresponding to each of the at least some (for example, each) of the label data, to obtain multiple test data sets.
In the embodiments of the present disclosure, the semantic database is processed to obtain label data corresponding to multiple semantic categories, and test data sets corresponding to those semantic categories are created from the label data. When the performance of the model to be tested is tested with these test data sets, the model can be tested in all respects, yielding its omni-vision representation performance. This way of testing can improve the robustness of the model to be tested and, in turn, its processing accuracy.
In an optional implementation, when there are multiple semantic databases, creating multiple pieces of label data based on the semantic databases in S103 specifically includes the following process:
Step S1031: Fuse the semantic information in the multiple semantic databases to obtain a fused semantic database, where the fused semantic database contains multiple pieces of fused semantic information and the hierarchical information among them;
Step S1032: Determine the multiple semantic categories to be used for division, and divide the fused semantic database into the multiple pieces of label data according to those semantic categories.
When there are multiple semantic databases, the semantic information in them can be fused to obtain a fused semantic database; the fused semantic database can then be divided according to the multiple semantic categories to obtain the multiple pieces of label data.
In the embodiments of the present disclosure, one of the multiple semantic databases may be selected as the benchmark semantic database. A semantic mapping is then established between the semantic information in the benchmark semantic database and the semantic information in the remaining semantic databases, and the semantic information in the multiple semantic databases is fused according to this mapping to obtain the fused semantic database.
For example, when two semantic databases are acquired, they may be the Wordnet semantic database and the Wikidata semantic database. In this case Wikidata may be selected as the benchmark semantic database, and Wordnet is then the remaining semantic database.
Here, the semantic mapping may be established based on the semantic paths, within the benchmark semantic database, of those pieces of semantic information that have no next-level semantic information.
在选择基准语义数据库时,可以从多个语义数据库中将对应较多数量的概念信息(语义信息)的语义数据库确定为基准语义数据库。When selecting a benchmark semantic database, a semantic database corresponding to a larger amount of conceptual information (semantic information) may be determined as a benchmark semantic database from among multiple semantic databases.
通过上述描述可知,通过将多个语义数据库进行语义融合,可以得到更加全面的语义数据库,即融合语义数据库。在根据该融合语义数据库确定多个标签数据时,就可以得到语义类别更加丰富的标签数据,通过该多个标签数据所对应测试数据集合对待测试模型进行测试时,可以实现待测试模型的全方位测试,从而得到待测试模型的全方位表示性能。From the above description, it can be known that a more comprehensive semantic database, that is, a fusion semantic database, can be obtained by performing semantic fusion of multiple semantic databases. When multiple label data are determined according to the fusion semantic database, label data with more abundant semantic categories can be obtained. When the test model is tested through the test data set corresponding to the multiple label data, the full range of the test model can be realized. Test, so as to obtain the full range of representation performance of the model to be tested.
In an optional implementation, for S1031, fusing the semantic information in the multiple semantic databases to obtain the fused semantic database specifically includes the following steps:
Step S11: Determine semantic information to be fused in a first semantic database of the plurality of semantic databases, where the semantic information to be fused has no next-level semantic information in the first semantic database;
Step S12: Based on the hierarchical information among the semantic information in the first semantic database, determine the semantic path on which the semantic information to be fused is located, the semantic path containing at least one piece of semantic information;
Step S13: Based on the high-level semantic information that precedes the semantic information to be fused in the semantic path, fuse the semantic information to be fused with the semantic information in a second semantic database to obtain the fused semantic database, the second semantic database being a database among the plurality of semantic databases other than the first semantic database.
In the embodiments of the present disclosure, one or more semantic databases are selected from the multiple semantic databases as the first semantic database. The first semantic database here is the reference semantic database described above; the semantic database that contains the larger number of concepts (semantic information) among the multiple semantic databases may be determined as the first semantic database.
After the first semantic database is determined, the semantic information to be fused can be determined in it according to the hierarchical information among the semantic information it contains. Here, the semantic information in the first semantic database that has no next-level semantic information may be determined as the semantic information to be fused.
Take Figure 2 as an example. Figure 2 shows a tree-structured first semantic database that contains node 1 and node 2, where node 1 contains nodes 11 to 14, node 2 contains nodes 21 to 23, and node 11 contains nodes 111 and 112. The semantic information corresponding to nodes 12 to 14, nodes 21 to 23, and nodes 111 and 112 has no next-level semantic information, so the semantic information corresponding to these nodes can be determined as the semantic information to be fused.
The semantic path of each piece of semantic information to be fused in the first semantic database can then be determined. For example, for "node 111" in Figure 2, the semantic path of the corresponding semantic information to be fused may be: node 1 - node 11 - node 111.
The semantic information to be fused can then be fused with the semantic information in the second semantic database according to the high-level semantic information that precedes it in the semantic path. For example, based on the semantic information corresponding to "node 1" and the semantic information corresponding to "node 11", the semantic information to be fused corresponding to "node 111" may be fused with the semantic information in the second semantic database.
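The Figure 2 example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree is encoded as a hypothetical parent-to-children dictionary named `CHILDREN`, and the helper names `leaves` and `semantic_path` are not taken from the disclosure.

```python
# Hypothetical encoding of the tree in Figure 2: parent -> list of children.
CHILDREN = {
    "node1": ["node11", "node12", "node13", "node14"],
    "node2": ["node21", "node22", "node23"],
    "node11": ["node111", "node112"],
}


def leaves(children):
    """Return nodes with no next-level node (candidates for fusion, step S11)."""
    all_nodes = set(children)
    for kids in children.values():
        all_nodes.update(kids)
    return sorted(n for n in all_nodes if not children.get(n))


def semantic_path(children, node):
    """Return the path from the top-level ancestor down to `node` (step S12)."""
    parent = {kid: p for p, kids in children.items() for kid in kids}
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))


print(leaves(CHILDREN))
print(semantic_path(CHILDREN, "node111"))
```

For "node111" the path lists node1, node11 and node111 in that order, so every entry before the last one is high-level semantic information in the sense of step S13.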
In a possible implementation, when there are more than two semantic databases, one first semantic database can be determined from them in the manner described above, and the semantic information to be fused in the first semantic database is then fused with the semantic information in each of the remaining semantic databases (that is, the second semantic databases). The fusion process is the one described in steps S11 to S13 above and is not repeated here.
As described above, the semantic path of the semantic information to be fused is determined from the hierarchical information among the semantic information, and the semantic information to be fused is then fused with the semantic information in the second semantic database according to that path. In this way, the mapping relationship between the semantic information to be fused and the semantic information in the second semantic database can be determined more quickly and accurately, so that as much of the semantic information to be fused as possible is fused with the semantic information in the second semantic database, yielding a fused semantic database that contains more comprehensive semantic information.
In an optional implementation, for S13, fusing the semantic information to be fused with the semantic information in the second semantic database, based on the high-level semantic information that precedes the semantic information to be fused in the semantic path, to obtain the fused semantic database includes the following steps:
(1) Determine target semantic information among the high-level semantic information in hierarchical order from high to low, where the target semantic information has corresponding semantic information in the second semantic database;
(2) Fuse the semantic information to be fused with the next-level semantic information of the semantic information in the second semantic database that corresponds to the target semantic information, to obtain the fused semantic database.
In the embodiments of the present disclosure, once the semantic path of the semantic information to be fused is obtained, the high-level semantic information that precedes it in the first semantic database is known, for example the semantic information corresponding to "node 1" and the semantic information corresponding to "node 11" in Figure 2. The target semantic information can then be determined among the high-level semantic information in hierarchical order from high to low. The process is as follows:
First, according to the semantic path, determine the semantic information one level above the semantic information to be fused, and check whether the second semantic database contains semantic information corresponding to it. If it does, that upper-level semantic information is determined as the target semantic information. If it does not, move one level further up and perform the same check on that semantic information: if the second semantic database contains corresponding semantic information, it is determined as the target semantic information; otherwise, the search for high-level semantic information continues upward along the semantic path.
Suppose the multiple semantic databases include the Wikidata database and the Wordnet database. Here, the Wikidata database may be selected as the first semantic database and the Wordnet database as the second semantic database.
First, semantic information to be fused, which has no next-level semantic information, is selected from the Wikidata semantic database. For example, the semantic information to be fused may be Toyger. The semantic path of Toyger in the Wikidata semantic database can then be determined, for example Toyger-Domestic Cat-Cat.
After the semantic path is obtained, the high-level semantic information of Toyger can be determined, for example Domestic Cat and Cat. From this high-level semantic information, the target semantic information can be determined in hierarchical order from high to low (which may also be understood as a bottom-up hierarchical order), for example Domestic Cat. The semantic information corresponding to this target semantic information in the Wordnet semantic database is also Domestic Cat. The Toyger information in the Wikidata semantic database (the semantic information to be fused) can then be fused with the next-level semantic information of Domestic Cat in the Wordnet semantic database.
Each piece of semantic information to be fused in the Wikidata semantic database can be fused with the semantic information in the Wordnet semantic database in the manner described above. After every piece of semantic information to be fused has been processed, the corresponding fused semantic database is obtained.
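The upward search and the Toyger example can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the second database is a hypothetical parent-to-children dictionary, "corresponding" semantic information is assumed to mean an exact name match, and leaving the database unchanged when no ancestor matches is also an assumption, since the text does not cover that case.

```python
def find_target(path, second_db):
    """Walk upward from the parent of the last path entry and return the
    first ancestor that also exists in the second database, else None.
    `path` is ordered from the top-level ancestor down to the leaf."""
    for ancestor in reversed(path[:-1]):
        if ancestor in second_db:  # assumption: match by exact name
            return ancestor
    return None


def fuse_leaf(path, second_db):
    """Attach the leaf under the matched target node in the second database."""
    target = find_target(path, second_db)
    if target is None:
        return False  # assumption: nothing is fused when no ancestor matches
    leaf = path[-1]
    if leaf not in second_db[target]:
        second_db[target].append(leaf)
    second_db.setdefault(leaf, [])
    return True


# Hypothetical fragment standing in for the second (Wordnet-like) database.
wordnet_like = {
    "cat": ["domestic cat"],
    "domestic cat": ["siamese cat"],
    "siamese cat": [],
}
fused = fuse_leaf(["cat", "domestic cat", "toyger"], wordnet_like)
print(fused, wordnet_like["domestic cat"])
```

Here "toyger" ends up next to the existing children of "domestic cat", which mirrors fusing the semantic information to be fused with the next-level semantic information of the target.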
In the embodiments of the present disclosure, when more than two semantic databases are acquired, suppose the N-th semantic database is selected as the first semantic database, and any one of the remaining N-1 semantic databases is selected as the second semantic database. Semantic information to be fused is selected from the first semantic database and fused with the semantic information in the second semantic database, which completes the fusion of the two semantic databases and yields a fused semantic database M. Next, one of the remaining N-2 semantic databases is selected as the first semantic database and the semantic database M serves as the second semantic database for a further round of fusion, and so on, until the semantic information in all acquired semantic databases has been fused and the final fused semantic database is obtained.
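The round-by-round procedure for more than two databases amounts to folding the databases into a running result. The sketch below assumes each database is a parent-to-children dictionary and uses a deliberately simple `fuse_pair` (a union of child lists by exact name) as a stand-in for steps S11 to S13; both function names are hypothetical.

```python
def fuse_pair(first, second):
    """Stand-in for steps S11 to S13: merge `first` into a copy of `second`
    by taking the union of child lists, keyed by exact node name."""
    merged = {node: list(kids) for node, kids in second.items()}
    for node, kids in first.items():
        bucket = merged.setdefault(node, [])
        for kid in kids:
            if kid not in bucket:
                bucket.append(kid)
    return merged


def fuse_all(databases):
    """Fuse two databases into M, then fuse each remaining database into M.
    Requires at least two databases."""
    remaining = list(databases)
    first = remaining.pop()   # the "N-th" database acts as the first database
    second = remaining.pop()  # any remaining database acts as the second one
    merged = fuse_pair(first, second)
    while remaining:
        # A remaining database becomes the first database, M the second.
        merged = fuse_pair(remaining.pop(), merged)
    return merged


result = fuse_all([
    {"animal": ["cat"]},
    {"animal": ["dog"]},
    {"cat": ["toyger"]},
])
print(result)
```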
In the embodiments of the present disclosure, fusing the semantic information in multiple semantic databases into a fused semantic database yields richer and more comprehensive semantic information. Label data determined from the fused semantic database then cover multiple semantic types, which enables an all-round representation test of the model to be tested, and in turn helps improve the robustness of the model, broaden its scope of application, and improve its processing accuracy.
In an optional implementation, when the fused semantic database is a tree-structured database, for S1032, dividing the fused semantic database into the plurality of label data according to the plurality of semantic categories specifically includes the following steps:
Step S21: Determine, in the tree-structured database, the nodes corresponding to at least some (for example, each) of the semantic categories, to obtain a plurality of target nodes;
Step S22: Taking the target nodes as root nodes, divide the tree-structured database into a plurality of sub-tree-structured databases, where one sub-tree-structured database corresponds to one target node;
Step S23: Determine the plurality of label data based on the plurality of sub-tree-structured databases, where the object labels in a label data are the semantic information in the corresponding sub-tree-structured database.
In the embodiments of the present disclosure, the multiple semantic databases may be tree-structured databases, in which each node represents one piece of semantic information and each piece of semantic information represents corresponding object information. Each node in a tree-structured database may contain child nodes, and the hierarchical relationship between a node and its child nodes constitutes the hierarchical information between the semantic information of the node and the semantic information of its child nodes.
After the multiple semantic databases are fused in the manner described above, the resulting fused semantic database is likewise tree-structured. It therefore also contains multiple nodes, each of which may contain child nodes, and each node represents a piece of semantic information in the fused semantic database.
Here, after the semantic categories to be divided are determined, the node corresponding to each semantic category can be determined in the tree-structured fused semantic database. For example, the semantic categories may be person, food, location, bird, reptile, mammal, insect, fish, clothing, device, structure, vehicle, flower, herb, tree, and fruit, and the node corresponding to each of them in the tree-structured fused semantic database can be determined. As a smaller example, if the semantic categories are person, food, and location, the corresponding nodes may be node A, node B, and node C, which are the plurality of target nodes mentioned above.
After the target nodes are determined, each target node can be taken as a root node to divide the tree-structured database into a plurality of sub-tree-structured databases.
For each sub-tree-structured database obtained, the semantic information it contains can be determined as the object labels in the corresponding label data, and the hierarchical information among that semantic information can be determined as the hierarchical information among the object labels in the corresponding label data.
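Steps S21 to S23 can be illustrated with a short sketch. It assumes the fused database is a parent-to-children dictionary and that a semantic category corresponds to the node with the same name; the tree contents and the function names are hypothetical.

```python
def subtree(children, root):
    """Return the sub-tree rooted at `root` as a parent -> children mapping."""
    sub = {}
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        sub[node] = list(kids)
        stack.extend(kids)
    return sub


def split_by_categories(children, categories):
    """Map each category found in the tree to its sub-tree (steps S21, S22).
    The keys of each sub-tree are the object labels of that label data and
    the mapping itself keeps their hierarchy (step S23)."""
    known = set(children) | {k for kids in children.values() for k in kids}
    return {cat: subtree(children, cat) for cat in categories if cat in known}


# Hypothetical fused semantic database.
fused_db = {
    "entity": ["person", "food"],
    "person": ["chef"],
    "food": ["fruit"],
    "fruit": ["apple"],
}
label_data = split_by_categories(fused_db, ["person", "food", "location"])
print(sorted(label_data["food"]))
```

In this sketch a category with no matching node, such as "location" above, is simply skipped; how such a category should be handled in practice is left open here.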
Here, the number and names of the semantic categories to be divided can be determined according to the actual needs of model testing and are not specifically limited.
In the embodiments of the present disclosure, the fused semantic database is divided, according to the required semantic categories, into label data corresponding to multiple semantic categories, and multiple test data sets are then determined from those label data. This yields data sets suitable for an all-round representation test of the model to be tested; when the model is tested with these test data sets, its performance on each semantic category can be determined.
In an optional implementation, as shown in Figure 3, when the preset data set contains a plurality of data and the data labels of the plurality of data, for step S105 above, determining matching data for the object labels of at least some (for example, each) of the label data based on the preset data set specifically includes the following steps:
Step S1051: Determine the object labels contained in the label data;
Step S1052: Match the data labels in the preset data set against the object labels to determine at least one group of matching labels;
Step S1053: Determine, in the preset data set, at least one piece of data corresponding to the data label in at least one group (for example, each group) of matching labels, and determine that data as the data matching the object label in that group of matching labels.
Here, the preset data set may be a collection of natural images. It may also be a collection containing other types of data, which the present disclosure does not describe in detail.
In the embodiments of the present disclosure, the object labels contained in each label data are determined first, and the data labels contained in the preset data set are then matched against the object labels to obtain at least one group of matching labels.
Matching the object labels in the label data against the data labels in the preset data set can be understood as comparing the semantic information of an object label with the semantic information of a data label. The match succeeds when the semantic information is the same or similar, and the successfully matched object label and data label form a group of matching labels.
The semantic information is the same when, for example, the object label is bike and the data label is bike. It is similar when, for example, the object label is bike and the data label is bicycle: although the two labels differ, bike and bicycle denote the same object. In the embodiments of the present disclosure, similar semantic information can therefore be understood as an object label and a data label that correspond to the same object.
After at least one group of matching labels is obtained in the manner described above, the data in the preset data set that corresponds to the data label in each group of matching labels can be determined and used as the data matching the object label in that group.
In this way, the matching data of the object labels in each label data can be determined. The set of matching data of all object labels in a label data can then be used as the test data set corresponding to that label data, which yields the multiple test data sets.
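The label matching in steps S1051 to S1053 can be sketched as below. How "similar" semantic information is detected is not specified in the text, so this sketch assumes a small hand-written synonym table (`SYNONYMS`); the file names and function names are likewise hypothetical.

```python
# Assumption: a hand-written synonym table stands in for semantic similarity.
SYNONYMS = {"bicycle": "bike"}


def canonical(label):
    """Normalise a label so that labels for the same object compare equal."""
    label = label.strip().lower()
    return SYNONYMS.get(label, label)


def build_test_set(object_labels, dataset):
    """Collect the data whose data label matches any object label.
    `dataset` is an iterable of (item, data_label) pairs."""
    wanted = {canonical(label) for label in object_labels}
    return [item for item, data_label in dataset if canonical(data_label) in wanted]


# Hypothetical preset data set.
preset = [
    ("img1.jpg", "bike"),
    ("img2.jpg", "bicycle"),
    ("img3.jpg", "apple"),
]
vehicle_test_set = build_test_set(["bike", "car"], preset)
print(vehicle_test_set)
```

Both the identical label (bike) and the similar one (bicycle) contribute data to the test set, while the unmatched object label (car) contributes nothing.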
In the embodiments of the present disclosure, the preset data set may be chosen as the following two data sets: ImageNet and Places. Because ImageNet and Places contain a large number of natural images, determining the multiple test data sets from them yields more comprehensive data sets; when the model to be tested is tested with these test data sets, its performance on each semantic category can be determined.
In an optional implementation, the embodiments of the present disclosure further include the following steps:
Step S11: Test the model to be tested with at least one (for example, each) of the test data sets separately to obtain a plurality of test results;
Step S12: Calculate the average of the plurality of test results and determine the average as the test result of the all-round representation test of the model to be tested.
In the embodiments of the present disclosure, the multiple test data sets obtained can each be input into the model to be tested for testing. The model to be tested yields one test result on each test data set. The average of these test results can then be calculated to obtain the test result of the all-round representation test of the model.
In the embodiments of the present disclosure, each test result reflects how the model to be tested performs in the corresponding semantic category. For example, when a test result is greater than a certain threshold, it can be determined that the model produces good processing results on data in that semantic category.
In the embodiments of the present disclosure, the model to be tested is tested on multiple test data sets to obtain multiple test results, and these results are averaged to obtain the test result of the all-round representation test. In this way, the all-round representation of the model can be determined quantitatively, and thus its robustness. The test results can also guide the relevant technical personnel in training the model in a targeted manner, so that the model produces good processing results on the test data of every semantic category.
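The averaging step is simple enough to show directly. The sketch below assumes each test result is a single number per semantic category (for example an accuracy); the scores and the threshold value are made up for illustration.

```python
from statistics import mean


def overall_score(per_category_results):
    """Average the per-category test results into one overall result."""
    return mean(per_category_results.values())


def weak_categories(per_category_results, threshold):
    """List categories whose result does not exceed the threshold, which may
    indicate where targeted training is worthwhile."""
    return sorted(c for c, score in per_category_results.items() if score <= threshold)


# Hypothetical per-category results.
results = {"person": 0.5, "food": 0.75, "location": 1.0}
print(overall_score(results))
print(weak_categories(results, threshold=0.6))
```

A plain unweighted mean treats every semantic category equally regardless of how many samples its test data set contains, which matches the averaging described in step S12.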
In an optional implementation, the method of the present disclosure further includes the following steps:
Step S21: When no data matching a target object label in a target label data is determined in the preset data set, determine the target semantic category corresponding to the target label data;
Step S22: Search the candidate databases for a matching database that matches the target semantic category, and search the matching database for data that matches the target object label.
In the embodiments of the present disclosure, when no data matching the target object label in the target label data can be determined in the preset data set, a matching database that matches the semantic category of the target label data can be looked up among the candidate databases, and data matching the target object label can be searched for in that matching database.
Here, a candidate database is a database other than the preset data set. For example, a candidate database may consist of matching data obtained by searching the network according to the semantic category or the semantic information of the target object label, or of matching data provided by a user according to the semantic category and the semantic information. The candidate databases are not specifically limited here, as long as they meet actual needs.
In this way, a more comprehensive test data set can be obtained, and more accurate test results can be obtained when the model to be tested is comprehensively tested with that test data set.
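The fallback lookup can be sketched as follows. The data layout is an assumption made for illustration: the preset data set maps a label to a list of data, and the candidate databases are keyed by semantic category; the function name `find_data` and the sample file names are hypothetical.

```python
def find_data(label, category, preset, candidates):
    """Look up data for `label` in the preset data set first; if nothing is
    found, fall back to the candidate database matching `category`.
    preset: label -> list of data.
    candidates: category -> (label -> list of data)."""
    hits = preset.get(label, [])
    if hits:
        return hits
    matching_db = candidates.get(category, {})
    return matching_db.get(label, [])


# Hypothetical data.
preset = {"bike": ["img1.jpg"]}
candidates = {"vehicle": {"unicycle": ["web_001.jpg"]}}

print(find_data("bike", "vehicle", preset, candidates))
print(find_data("unicycle", "vehicle", preset, candidates))
```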
In an optional implementation, the method of the present disclosure further includes the following steps:
Step S31: When a target data label is determined in the preset data set, determine the upper-level label of the target data label based on the hierarchical information among the data labels in the preset data set, where the target data label is a data label that has no corresponding object label among the object labels of the plurality of label data;
Step S32: Determine the semantic information corresponding to the upper-level label, and determine, in the plurality of label data, the semantic information that matches the semantic information corresponding to the upper-level label;
Step S33: Add the semantic information corresponding to the target data label, as new semantic information, to the next-level semantic information of the matched semantic information, and determine matching data for the new semantic information based on the preset data set.
In the embodiments of the present disclosure, if no object label matching the target data label is found in the plurality of label data, the upper-level label of the target data label can be determined according to the hierarchical information among the data labels in the preset data set, and the semantic information corresponding to that upper-level label is determined, denoted here as semantic information M. Semantic information matching M, denoted as semantic information N, can then be determined in the plurality of label data. The semantic information corresponding to the target data label in the preset data set is then added, as new semantic information, to the next-level semantic information of N, and the data in the preset data set that corresponds to the target data label is used as the matching data of the new semantic information.
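Steps S31 to S33 can be sketched as follows, under the assumptions that the object labels form a parent-to-children dictionary, that the hierarchy of the preset data set is available as a child-to-parent dictionary, and that semantic information matches when the names are equal. All names and sample labels are hypothetical.

```python
def add_missing_label(target_label, dataset_parent, label_tree):
    """Attach a data label that has no object label under the object label
    matching its upper-level label. Returns True if the label was added.
    dataset_parent: data label -> upper-level label in the preset data set.
    label_tree: object label -> list of next-level object labels."""
    known = set(label_tree) | {k for kids in label_tree.values() for k in kids}
    if target_label in known:
        return False  # an object label already exists for this data label
    parent = dataset_parent.get(target_label)  # upper-level label (step S31)
    if parent is None or parent not in known:  # no match for M (step S32)
        return False
    label_tree.setdefault(parent, []).append(target_label)  # step S33
    label_tree.setdefault(target_label, [])
    return True


# Hypothetical object labels and data set hierarchy.
label_tree = {"dog": ["poodle"], "poodle": []}
dataset_parent = {"corgi": "dog", "poodle": "dog"}

added = add_missing_label("corgi", dataset_parent, label_tree)
print(added, label_tree["dog"])
```

The data carrying the label "corgi" in the preset data set would then serve as the matching data for the newly added semantic information.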
在本公开实施中,通过预设数据集中的数据标签对多个标签数据中的物体标签所对应的语义信息进行补充,可以丰富标签数据中的语义信息,得到更多更全面的融合语义数据库,从而可以得到待测试模型的测试准确度。In the implementation of the present disclosure, the semantic information corresponding to the object tags in multiple tag data is supplemented by the data tags in the preset data set, which can enrich the semantic information in the tag data and obtain more and more comprehensive fusion semantic databases. Thus, the test accuracy of the model to be tested can be obtained.
Those skilled in the art will understand that, in the above methods of the specific embodiments, the order in which the steps are written does not imply a strict execution order and does not limit the implementation in any way; the actual execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure also provide an apparatus for determining a data set that corresponds to the method for determining a data set. Because the apparatus solves the problem on a principle similar to that of the above method, the implementation of the apparatus may refer to the implementation of the method, and repeated details are omitted.
FIG. 4 is a schematic diagram of an apparatus for determining a data set provided by an embodiment of the present disclosure. The apparatus includes an acquisition unit module 41, a creation unit module 42, and a determination unit module 43, wherein:
the acquisition unit module 41 is configured to acquire a semantic database containing a plurality of pieces of semantic information;
the creation unit module 42 is configured to create a plurality of label data based on the semantic database, wherein each label data corresponds to one semantic category and contains object labels belonging to the corresponding semantic category, and the semantic categories corresponding to the plurality of label data are categories that enable a comprehensive representation test of the model under test; and
the determination unit module 43 is configured to determine, based on a preset data set, matching data for the object labels of at least some (for example, each) of the label data, and to determine, based on the matching data, the test data set corresponding to each of the at least some (for example, each) label data, to obtain a plurality of test data sets.
In the embodiments of the present disclosure, the semantic database is processed to obtain label data corresponding to a plurality of semantic categories, and test data sets corresponding to the plurality of semantic categories are created from the determined label data. When the performance of the model under test is tested with the resulting test data sets, the model can be tested comprehensively, which yields its comprehensive representation performance. This way of testing can improve the robustness of the model under test and, in turn, its processing accuracy.
In a possible implementation, the creation unit module is further configured to: fuse the semantic information in a plurality of semantic databases to obtain a fused semantic database, wherein the fused semantic database contains a plurality of pieces of fused semantic information and the hierarchical information between them; and determine a plurality of semantic categories to be divided, and divide the fused semantic database into the plurality of label data according to the plurality of semantic categories.
In a possible implementation, the creation unit module is further configured to: determine semantic information to be fused in a first semantic database of the plurality of semantic databases, wherein the semantic information to be fused has no next-level semantic information in the first semantic database; determine, based on the hierarchical information between the semantic information in the first semantic database, the semantic path on which the semantic information to be fused lies, the semantic path containing at least one piece of semantic information; and fuse, based on the high-level semantic information that precedes the semantic information to be fused on the semantic path, the semantic information to be fused with the semantic information in a second semantic database to obtain the fused semantic database, the second semantic database being a database other than the first semantic database among the plurality of semantic databases.
In a possible implementation, the creation unit module is further configured to: determine target semantic information among the high-level semantic information in order of level from high to low, wherein the target semantic information has corresponding semantic information in the second semantic database; and fuse the semantic information to be fused with the next-level semantic information of the semantic information in the second semantic database that corresponds to the target semantic information, to obtain the fused semantic database.
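The fusion of a leaf from the first semantic database into the second one can be sketched as follows. Several points are assumptions made for illustration only: each database is represented as a child-to-parent dictionary, "corresponding" semantic information is taken to mean an identical name, and, because the text does not say which ancestor wins when several have a counterpart, the sketch scans the ancestors from high to low and keeps the lowest (most specific) match. The function names `semantic_path` and `fuse_leaf` are hypothetical.

```python
def semantic_path(parent_of, leaf):
    """Return the path from the root down to `leaf` using a child -> parent mapping."""
    path = [leaf]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path[::-1]


def fuse_leaf(first_parent_of, second_parent_of, leaf):
    """Return a copy of the second database with `leaf` attached under a matched ancestor."""
    second_nodes = set(second_parent_of) | set(second_parent_of.values())
    fused = dict(second_parent_of)
    if leaf in second_nodes:
        return fused  # already present, nothing to fuse

    # High-level semantic information that precedes the leaf on its semantic path.
    ancestors = semantic_path(first_parent_of, leaf)[:-1]

    target = None
    for name in ancestors:  # scanned from high (root) to low
        if name in second_nodes:
            target = name  # keep the lowest match seen so far (assumed tie-break)

    if target is not None:
        fused[leaf] = target  # leaf becomes next-level information of the match
    return fused


first = {"mammal": "animal", "dog": "mammal", "corgi": "dog"}
second = {"dog": "animal", "cat": "animal"}
fused = fuse_leaf(first, second, "corgi")
```

Here `corgi` is a leaf of the first database; its ancestors `animal` and `dog` both exist in the second database, and the sketch attaches it below `dog`.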
In a possible implementation, the creation unit module is further configured to: determine, in the tree-structured database, the nodes corresponding to at least some (for example, each) of the semantic categories to obtain a plurality of target nodes; divide the tree-structured database, taking the target nodes as root nodes, into a plurality of sub-tree-structured databases, wherein each sub-tree-structured database corresponds to one target node; and determine the plurality of label data based on the plurality of sub-tree-structured databases, wherein the object labels in a label data are the semantic information in the corresponding sub-tree-structured database.
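Dividing a tree-structured database into one subtree per semantic category can be sketched as below. The parent-to-children dictionary and the function names `collect_subtree` and `split_into_label_data` are assumptions chosen for the example, not structures named in the disclosure.

```python
def collect_subtree(children_of, root):
    """Collect every node in the subtree rooted at `root` (iterative traversal)."""
    labels, stack = [], [root]
    while stack:
        node = stack.pop()
        labels.append(node)
        stack.extend(children_of.get(node, []))
    return labels


def split_into_label_data(children_of, category_roots):
    """Build one label data (a sorted list of object labels) per target node."""
    return {
        root: sorted(collect_subtree(children_of, root))
        for root in category_roots
    }


tree = {
    "entity": ["animal", "vehicle"],
    "animal": ["dog", "cat"],
    "vehicle": ["car"],
}
label_data = split_into_label_data(tree, ["animal", "vehicle"])
```

Each target node acts as the root of its own subtree, so every semantic category ends up with the object labels found at or below that node.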
In a possible implementation, the determination unit module is further configured to: determine the object labels contained in the label data; match the data labels in the preset data set against the object labels to determine at least one group of matching labels; and determine, in the preset data set, at least one piece of data corresponding to the data label in at least one group (for example, each group) of matching labels, and determine the corresponding at least one piece of data as the data matching the object label in that group of matching labels.
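The label matching step can be sketched as follows. The text does not define the matching rule, so this example assumes case-insensitive string equality; the `(sample, data_label)` pair representation of the preset data set and the name `match_data` are likewise hypothetical.

```python
from collections import defaultdict


def match_data(object_labels, preset_samples):
    """Group preset samples by the object label their data label matches.

    preset_samples: iterable of (sample, data_label) pairs.
    Matching rule (assumed): case-insensitive equality of label strings.
    """
    wanted = {label.lower() for label in object_labels}
    matched = defaultdict(list)
    for sample, data_label in preset_samples:
        key = data_label.lower()
        if key in wanted:
            matched[key].append(sample)
    return dict(matched)


preset = [("a.jpg", "Dog"), ("b.jpg", "cat"), ("c.jpg", "dog"), ("d.jpg", "car")]
test_set = match_data(["dog", "cat"], preset)
```

The samples collected for the object labels of one label data form the test data set of the corresponding semantic category.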
In a possible implementation, the determination unit module is further configured to: test the model under test with at least one (for example, each) test data set respectively to obtain a plurality of test results; and calculate the average of the plurality of test results and determine the average as the test result of the comprehensive representation test of the model under test.
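Averaging the per-category results can be sketched as below. The `evaluate` callback, the toy accuracy metric, and the toy model are assumptions for illustration; the disclosure does not fix a particular metric.

```python
from statistics import mean


def accuracy(model, samples):
    """Fraction of (input, expected) pairs the model predicts correctly."""
    return sum(model(x) == y for x, y in samples) / len(samples)


def overall_score(model, test_sets, evaluate):
    """Evaluate the model on every test data set and average the results."""
    results = {name: evaluate(model, data) for name, data in test_sets.items()}
    return results, mean(results.values())


def toy_model(x):
    """Stand-in for the model under test: predicts the parity of x."""
    return x % 2


test_sets = {
    "category_a": [(1, 1), (2, 0)],
    "category_b": [(3, 0), (4, 0)],
}
results, overall = overall_score(toy_model, test_sets, accuracy)
```

Because each semantic category contributes one result, the unweighted mean gives every category the same influence regardless of how many samples its test data set contains.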
In a possible implementation, the determination unit module is further configured to: when no data matching a target object label in target label data is determined in the preset data set, determine the target semantic category corresponding to the target label data; and search the candidate databases for a matching database that matches the target semantic category, and search the matching database for data matching the target object label.
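The fallback lookup can be sketched as follows, assuming the candidate databases are keyed by semantic category and each maps labels to samples. Both the representation and the name `find_fallback_data` are hypothetical.

```python
def find_fallback_data(target_label, target_category, candidate_dbs):
    """Look up data for a label that the preset data set could not cover.

    candidate_dbs: dict mapping a semantic category to a dict of
                   label -> list of samples (assumed layout).
    """
    matching_db = candidate_dbs.get(target_category)  # database matching the category
    if matching_db is None:
        return []
    return list(matching_db.get(target_label, []))


candidate_dbs = {
    "animal": {"okapi": ["okapi_01.jpg", "okapi_02.jpg"]},
    "vehicle": {"tram": ["tram_01.jpg"]},
}
extra = find_fallback_data("okapi", "animal", candidate_dbs)
```

Selecting the database by semantic category first narrows the search to a source that is likely to contain the missing object label.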
In a possible implementation, the determination unit module is further configured to: when a target data label is determined in the preset data set, determine the upper-level label of the target data label based on the hierarchical information between the data labels in the preset data set, wherein the target data label is a data label that has no corresponding object label among the object labels of the plurality of label data; determine the semantic information corresponding to the upper-level label, and determine, in the plurality of label data, the semantic information that matches the semantic information corresponding to the upper-level label; and add the semantic information corresponding to the target data label, as new semantic information, to the next-level semantic information of the matched semantic information, and determine matching data for the new semantic information based on the preset data set.
For the processing flow of each module in the apparatus and the interaction flow between the modules, refer to the relevant description in the above method embodiments; details are not repeated here.
Corresponding to the method for determining a data set in FIG. 1, the embodiments of the present disclosure also provide a computer device 500. FIG. 5 is a schematic structural diagram of the computer device 500 provided by an embodiment of the present disclosure, which includes:
a processor 51, a memory 52, and a bus 53. The memory 52 stores execution instructions and includes an internal memory 521 and an external memory 522. The internal memory 521 temporarily holds the operation data of the processor 51 and the data exchanged with the external memory 522, such as a hard disk; the processor 51 exchanges data with the external memory 522 through the internal memory 521. When the computer device 500 runs, the processor 51 and the memory 52 communicate through the bus 53, so that the processor 51 executes the following instructions:
acquiring a semantic database containing a plurality of pieces of semantic information;
creating a plurality of label data based on the semantic database, wherein each label data corresponds to one semantic category and contains object labels belonging to the corresponding semantic category, and the semantic categories corresponding to the plurality of label data are categories that enable a comprehensive representation test of the model under test; and
determining, based on a preset data set, matching data for the object labels of at least some (for example, each) of the label data, and determining, based on the matching data, the test data set corresponding to each of the at least some (for example, each) label data, to obtain a plurality of test data sets.
The embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when run by a processor, the computer program executes the steps of the method for determining a data set described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product carrying program code; the instructions included in the program code can be used to execute the steps of the method for determining a data set described in the above method embodiments. For details, refer to the above method embodiments, which are not repeated here.
The above computer program product may be implemented in hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. It should be understood that the systems, apparatuses, and methods disclosed in the embodiments provided by the present disclosure may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection of apparatuses or units through some communication interfaces, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable, non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only specific implementations of the present disclosure, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited to them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent replacements of some of their technical features. Such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (13)

  1. A method for determining a data set, characterized by comprising:
    acquiring a semantic database containing a plurality of pieces of semantic information;
    creating a plurality of label data based on the semantic database, wherein each label data corresponds to one semantic category and contains object labels belonging to the corresponding semantic category, and the semantic categories corresponding to the plurality of label data are categories that enable a comprehensive representation test of a model under test; and
    determining, based on a preset data set, matching data for the object labels of at least some of the label data, and determining, based on the matching data, the test data set corresponding to each of the at least some label data, to obtain a plurality of test data sets.
  2. The method according to claim 1, characterized in that there are a plurality of semantic databases, and creating the plurality of label data based on the semantic database comprises:
    fusing the semantic information in the plurality of semantic databases to obtain a fused semantic database, wherein the fused semantic database contains a plurality of pieces of fused semantic information and the hierarchical information between them; and
    determining a plurality of semantic categories to be divided, and dividing the fused semantic database into the plurality of label data according to the plurality of semantic categories.
  3. The method according to claim 2, characterized in that fusing the semantic information in the plurality of semantic databases to obtain the fused semantic database comprises:
    determining semantic information to be fused in a first semantic database of the plurality of semantic databases, wherein the semantic information to be fused has no next-level semantic information in the first semantic database;
    determining, based on the hierarchical information between the semantic information in the first semantic database, the semantic path on which the semantic information to be fused lies, the semantic path containing at least one piece of semantic information; and
    fusing, based on the high-level semantic information that precedes the semantic information to be fused on the semantic path, the semantic information to be fused with the semantic information in a second semantic database to obtain the fused semantic database, the second semantic database being a database other than the first semantic database among the plurality of semantic databases.
  4. The method according to claim 3, characterized in that fusing, based on the high-level semantic information that precedes the semantic information to be fused on the semantic path, the semantic information to be fused with the semantic information in the second semantic database to obtain the fused semantic database comprises:
    determining target semantic information among the high-level semantic information in order of level from high to low, wherein the target semantic information has corresponding semantic information in the second semantic database; and
    fusing the semantic information to be fused with the next-level semantic information of the semantic information in the second semantic database that corresponds to the target semantic information, to obtain the fused semantic database.
  5. The method according to claim 2, characterized in that the fused semantic database is a tree-structured database, and dividing the fused semantic database into the plurality of label data according to the plurality of semantic categories comprises:
    determining, in the tree-structured database, the nodes corresponding to at least some of the semantic categories to obtain a plurality of target nodes;
    dividing the tree-structured database, taking the target nodes as root nodes, into a plurality of sub-tree-structured databases, wherein each sub-tree-structured database corresponds to one target node; and
    determining the plurality of label data based on the plurality of sub-tree-structured databases, wherein the object labels in a label data are the semantic information in the corresponding sub-tree-structured database.
  6. The method according to any one of claims 1 to 5, characterized in that the preset data set contains a plurality of pieces of data and the data labels of the plurality of pieces of data;
    determining, based on the preset data set, matching data for the object labels of at least some of the label data comprises:
    determining the object labels contained in the label data;
    matching the data labels in the preset data set against the object labels to determine at least one group of matching labels; and
    determining, in the preset data set, at least one piece of data corresponding to the data label in at least one group of matching labels, and determining the corresponding at least one piece of data as the data matching the object label in that group of matching labels.
  7. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
    testing the model under test with at least one test data set respectively to obtain a plurality of test results; and
    calculating the average of the plurality of test results, and determining the average as the test result of the comprehensive representation test of the model under test.
  8. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    when no data matching a target object label in target label data is determined in the preset data set, determining the target semantic category corresponding to the target label data; and
    searching the candidate databases for a matching database that matches the target semantic category, and searching the matching database for data matching the target object label.
  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    when a target data label is determined in the preset data set, determining the upper-level label of the target data label based on the hierarchical information between the data labels in the preset data set, wherein the target data label is a data label that has no corresponding object label among the object labels of the plurality of label data;
    determining the semantic information corresponding to the upper-level label, and determining, in the plurality of label data, the semantic information that matches the semantic information corresponding to the upper-level label; and
    adding the semantic information corresponding to the target data label, as new semantic information, to the next-level semantic information of the matched semantic information, and determining matching data for the new semantic information based on the preset data set.
  10. An apparatus for determining a data set, characterized by comprising:
    an acquisition unit, configured to acquire a semantic database containing a plurality of pieces of semantic information;
    a creation unit, configured to create a plurality of label data based on the semantic database, wherein each label data corresponds to one semantic category and contains object labels belonging to the corresponding semantic category, and the semantic categories corresponding to the plurality of label data are categories that enable a comprehensive representation test of a model under test; and
    a determination unit, configured to determine, based on a preset data set, matching data for the object labels of at least some of the label data, and to determine, based on the matching data, the test data set corresponding to each of the at least some label data, to obtain a plurality of test data sets.
  11. A computer device, characterized by comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the method for determining a data set according to any one of claims 1 to 9.
  12. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when run by a processor, performs the steps of the method for determining a data set according to any one of claims 1 to 9.
  13. A computer program product, comprising computer-readable code or a computer-readable storage medium carrying computer-readable code, wherein, when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the steps for implementing the method for determining a data set according to any one of claims 1 to 9.
PCT/CN2022/079074 2021-08-26 2022-03-03 Data set determination method and apparatus, and computer device and storage medium WO2023024474A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110986886.1A CN113704519B (en) 2021-08-26 2021-08-26 Data set determining method and device, computer equipment and storage medium
CN202110986886.1 2021-08-26

Publications (1)

Publication Number Publication Date
WO2023024474A1 true WO2023024474A1 (en) 2023-03-02

Family

ID=78655041

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/079074 WO2023024474A1 (en) 2021-08-26 2022-03-03 Data set determination method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN113704519B (en)
WO (1) WO2023024474A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704519B (en) * 2021-08-26 2024-04-12 北京市商汤科技开发有限公司 Data set determining method and device, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183869A (en) * 2015-09-16 2015-12-23 分众(中国)信息技术有限公司 Building knowledge mapping database and construction method thereof
CN108984618A (en) * 2018-06-13 2018-12-11 深圳市商汤科技有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN110162644A (en) * 2018-10-10 2019-08-23 腾讯科技(深圳)有限公司 A kind of image set method for building up, device and storage medium
US20200327193A1 (en) * 2019-04-10 2020-10-15 International Business Machines Corporation Displaying text classification anomalies predicted by a text classification model
CN112597135A (en) * 2021-01-04 2021-04-02 天冕信息技术(深圳)有限公司 User classification method and device, electronic equipment and readable storage medium
CN113704519A (en) * 2021-08-26 2021-11-26 北京市商汤科技开发有限公司 Data set determination method and device, computer equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8572086B2 (en) * 2009-01-21 2013-10-29 Telefonaktiebolaget Lm Ericsson (Publ) Generation of annotation tags based on multimodal metadata and structured semantic descriptors
CN105069483B (en) * 2015-08-21 2019-01-01 中国地质大学(武汉) The method that a kind of pair of categorized data set is tested
CN110147551B (en) * 2019-05-14 2023-07-11 腾讯科技(深圳)有限公司 Multi-category entity recognition model training, entity recognition method, server and terminal
CN111695052A (en) * 2020-06-12 2020-09-22 上海智臻智能网络科技股份有限公司 Label classification method, data processing device and readable storage medium
CN112035614B (en) * 2020-08-31 2023-11-10 康键信息技术(深圳)有限公司 Test set generation method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113704519A (en) 2021-11-26
CN113704519B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
CN110245496B (en) Source code vulnerability detection method and detector and training method and system thereof
CN110400169B (en) Information pushing method, device and equipment
US10347019B2 (en) Intelligent data munging
US9542477B2 (en) Method of automated discovery of topics relatedness
Chen et al. General functional matrix factorization using gradient boosting
US20160042276A1 (en) Method of automated discovery of new topics
CN109948710B (en) Micro-service identification method based on API similarity
US8661004B2 (en) Representing incomplete and uncertain information in graph data
CN111382283B (en) Resource category label labeling method and device, computer equipment and storage medium
CN107844533A (en) Intelligent answering system and analysis method
CN109033277A (en) Class brain system, method, equipment and storage medium based on machine learning
CN108647800A (en) Online social network user missing attribute prediction method based on node insertion
CN110765348B (en) Hot word recommendation method and device, electronic equipment and storage medium
US11836331B2 (en) Mathematical models of graphical user interfaces
KR20190094068A (en) Learning method of classifier for classifying behavior type of gamer in online game and apparatus comprising the classifier
CN110956271B (en) Multi-stage classification method and device for mass data
WO2023024474A1 (en) Data set determination method and apparatus, and computer device and storage medium
CN109635089B (en) Literature work novelty evaluation system and method based on semantic network
KR102098255B1 (en) System and method for consolidating knowledge based on knowledge embedding
CN116974554A (en) Code data processing method, apparatus, computer device and storage medium
CN108681490B (en) Vector processing method, device and equipment for RPC information
CN113591881B (en) Intention recognition method and device based on model fusion, electronic equipment and medium
CN116306923A (en) Evaluation weight calculation method based on knowledge graph
CN115238092A (en) Entity relationship extraction method, device, equipment and storage medium
CN103955526A (en) Data storage method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22859830; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)