CN116578867A - Identification generation method and electronic equipment - Google Patents

Identification generation method and electronic equipment

Info

Publication number
CN116578867A
Authority
CN
China
Prior art keywords
media content
data
digital content
content
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310450602.6A
Other languages
Chinese (zh)
Inventor
胡子元
方成方
胡培钊
周海波
王士林
李基�
陆尤静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202310450602.6A
Publication of CN116578867A
Legal status: Pending (Current)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 - Approximate or statistical queries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2468 - Fuzzy queries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 - Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 - Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/483 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0495 - Quantised networks; Sparse networks; Compressed networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/094 - Adversarial learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Automation & Control Theory (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an identification generation method and an electronic device. The method includes: first, acquiring media content; then, performing feature extraction on the media content to obtain a digital content feature of the media content; then, quantizing the digital content feature to obtain first data; and then determining a digital content identification of the media content according to the first data. Because the digital content identification is generated from the digital content feature obtained by feature extraction, the identification is associated with the media content, and the method is therefore applicable to application scenarios involving fuzzy comparison and similarity queries, such as copyright queries and metadata queries. In addition, the quantization reduces the amount of data; in these scenarios, matching according to the first data, rather than according to the raw digital content features as in the prior art, reduces matching complexity, computation cost, and time cost during the query, and improves query efficiency.

Description

Identification generation method and electronic equipment
Technical Field
The embodiments of the application relate to the field of data processing, and in particular to an identification generation method and an electronic device.
Background
A digital content identification is commonly used to uniquely identify digital content; it is widely used in file systems and databases, and in a variety of scenarios such as resource retrieval and copyright certification.
Currently, a centralized management authority assigns different digital content identifications to different digital content according to a numeric sequence; however, the identifications of similar digital content may be unrelated, making this approach unsuitable for application scenarios involving fuzzy comparison and similarity queries.
Disclosure of Invention
In view of the above, the application provides an identification generation method and an electronic device. The digital content identification generated by the method is applicable to application scenarios involving fuzzy comparison and similarity queries, such as copyright queries and metadata queries.
In a first aspect, an embodiment of the present application provides an identification generation method, where the method includes: first, acquiring media content; then, performing feature extraction on the media content to obtain a digital content feature of the media content; then, quantizing the digital content feature to obtain first data; and then determining a digital content identification of the media content according to the first data.
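As a minimal sketch of these four steps (illustrative only; the helper names extract_features, quantize, and encode_identifier are hypothetical placeholders, not names used in the application):

```python
# Hypothetical sketch of the claimed pipeline. The three helpers stand
# in for the feature extraction model, quantization processing, and
# encoding described in the implementations below.
def generate_identification(media_content, extract_features, quantize, encode_identifier):
    feature = extract_features(media_content)   # digital content feature
    first_data = quantize(feature)              # e.g. a binary string
    return encode_identifier(first_data)        # digital content identification
```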
In this way, the digital content features are obtained through feature extraction, and the digital content identification is generated from those features, so the identification is associated with the media content; the method is therefore suitable for application scenarios involving fuzzy comparison and similarity queries, such as copyright queries and metadata queries.
Second, the quantization reduces the amount of data, and the digital content identification is generated from the quantized digital content features; this reduces the complexity of generating the identification and reduces the data amount of the identification itself, thereby reducing the storage space it occupies.
In application scenarios involving fuzzy comparison and similarity queries, such as copyright queries and metadata queries, matching on the quantized digital content features (that is, the first data), rather than on the raw digital content features as in the prior art, reduces matching complexity, computation cost, and time cost during the query, improving query efficiency.
For example, media content may also be referred to as digital content: content of different content types, such as text, images, and sound, that exists in digital form.
Illustratively, the digital content feature may be a multi-dimensional vector.
Illustratively, the digital content identification may be a digital object identifier (Digital Object Identifier, DOI). A digital object identifier is a string of numbers, letters, or other symbols used to identify a digitized content resource.
Illustratively, in some scenarios, the quantization process may be referred to as a quantization compression process (Quantization and compression).
According to a first aspect, the first data is indicative of an index of one or more first cluster centers, the one or more first cluster centers being one or more of a plurality of cluster centers, the plurality of cluster centers being used for classification of the digital content features.
Illustratively, a plurality of cluster centers are used for classifying the digital content features, the plurality of cluster centers may be obtained through pre-training, and the indexes of the plurality of cluster centers may also be obtained through pre-training. A cluster center may be used to determine a feature type.
For example, the quantization process may include classifying the digital content features by a trained feature classification model to obtain a first cluster center. The plurality of cluster centers are obtained by training a feature classification model.
For example, the index may be used to uniquely identify one cluster center, with different cluster centers having different indices.
For example, the index of a cluster center may be a binary string; that is, the quantization is a binarization quantization. The resulting first data is then a binary string, i.e., first data with a smaller amount of data; this further reduces matching complexity and improves query efficiency in application scenarios involving fuzzy comparison and similarity queries.
It should be understood that the index of the cluster center may also be a decimal string or a hexadecimal string, which the present application is not limited to.
According to the first aspect, or any implementation manner of the first aspect, the one or more first cluster centers are one or more of the plurality of cluster centers whose distance from the digital content feature satisfies a preset distance.
That is, one or more cluster centers whose distance from the digital content feature satisfies the preset distance may be selected from the plurality of cluster centers as the one or more first cluster centers.
For example, the cluster center may be represented by a vector.
Illustratively, the dimension of the cluster center is the same as the dimension of the digital content feature.
By way of example, the distance between the digital content feature and a cluster center may be a Euclidean distance, a Manhattan distance, or the like; the application is not limited in this regard.
It should be understood that the preset distance may be set as desired, and the present application is not limited in this regard.
According to the first aspect, or any implementation manner of the first aspect, the one or more first cluster centers are the one or more of the plurality of cluster centers that are closest to, or farthest from, the digital content feature.
The closer a cluster center is to the digital content feature, the more accurate the classification of the feature; thus, when the closest cluster center is selected as the first cluster center, its index can subsequently represent the digital content feature accurately, which improves query accuracy in application scenarios involving fuzzy comparison and similarity queries.
According to the first aspect, or any implementation manner of the first aspect, quantizing the digital content feature to obtain the first data includes: classifying the digital content feature to determine the one or more first cluster centers; and determining the first data from the index of the one or more first cluster centers.
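A minimal sketch of this implementation, assuming a Euclidean distance and a single nearest cluster center (the 128-dimensional feature, the 256-center codebook, and the 8-bit index width are illustrative assumptions, not values from the application):

```python
import numpy as np

def quantize(feature, cluster_centers, bits=8):
    """Map a digital content feature to the binary-string index of its
    nearest cluster center (Euclidean distance assumed)."""
    distances = np.linalg.norm(cluster_centers - feature, axis=1)
    nearest = int(np.argmin(distances))       # the first cluster center
    return format(nearest, f"0{bits}b")       # index as a binary string

rng = np.random.default_rng(0)
centers = rng.standard_normal((256, 128))     # stand-in for pre-trained centers
feature = rng.standard_normal(128)            # one digital content feature
first_data = quantize(feature, centers)       # e.g. '10100111'
```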
According to the first aspect, or any implementation manner of the first aspect, the plurality of first cluster centers are cluster centers, among the plurality of cluster centers, whose distances from a plurality of second data satisfy a preset distance, where the plurality of second data are obtained by performing feature transformation on the digital content feature.
According to the first aspect, or any implementation manner of the first aspect, quantizing the digital content feature to obtain the first data includes: performing feature transformation on the digital content feature to obtain a plurality of second data; classifying the plurality of second data to determine a plurality of first cluster centers, where the first cluster centers are cluster centers whose distances from the second data satisfy the preset distance; and determining the first data from the indexes of the plurality of first cluster centers.
For example, the feature transformation may include feature division, i.e., dividing the digital content feature into the plurality of second data.
By way of example, the feature transformation may include homogenization and feature segmentation. The digital content feature may first be homogenized to obtain a third intermediate feature; the third intermediate feature is then divided to obtain the plurality of second data. Homogenization means that all elements of the digital content feature are used to determine each element of the third intermediate feature; this spreads out concentrated information so that the energy distribution of the feature becomes uniform, which facilitates the subsequent classification of the digital content feature.
In this case, the dimension of a first cluster center is the same as that of the second data. Because the second data are obtained by feature transformation of the digital content feature, the data amount of each second data is smaller than that of the digital content feature; therefore, compared with searching for the first cluster center directly from the digital content feature, the cluster centers have a lower dimension, classification requires less computation, and classification is more efficient, which improves the efficiency of the quantization and of generating the digital content identification.
In addition, the lower the dimension of a cluster center, the less storage space it occupies, so the space occupied by the cluster centers can be reduced.
Third, the first cluster centers are selected at a finer granularity, so the selected centers are more accurate; the resulting first data is therefore more accurate, and query accuracy can be improved.
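This segmented scheme resembles product quantization; the sketch below reads it that way, which is an assumption, and omits the optional homogenization step. Segment counts and codebook sizes are illustrative:

```python
import numpy as np

def quantize_segmented(feature, sub_codebooks, bits=4):
    """Split the digital content feature into sub-vectors (the 'second
    data') and quantize each against a lower-dimensional codebook,
    concatenating the binary indexes into the first data."""
    segments = np.split(feature, len(sub_codebooks))  # feature division
    parts = []
    for segment, centers in zip(segments, sub_codebooks):
        d = np.linalg.norm(centers - segment, axis=1)
        parts.append(format(int(np.argmin(d)), f"0{bits}b"))
    return "".join(parts)

rng = np.random.default_rng(0)
# 4 segments; each codebook holds 16 centers of dimension 32 (= 128 / 4).
sub_codebooks = [rng.standard_normal((16, 32)) for _ in range(4)]
first_data = quantize_segmented(rng.standard_normal(128), sub_codebooks)
```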
According to the first aspect, or any implementation manner of the first aspect, the digital content feature is multi-dimensional data, and the dimension of the second data is smaller than that of the digital content feature.
According to a first aspect, or any implementation of the first aspect above, the indices of different first cluster centers are used to measure the similarity between different media content.
That is, in a subsequent query, the indexes of the first cluster centers corresponding to two media contents can be matched to compute the similarity between the two media contents.
According to a first aspect, or any implementation manner of the first aspect, the feature extraction is performed on the media content to obtain digital content features of the media content, including: the media content is input to the first feature extraction model, and digital content features of the media content are output.
According to the first aspect, or any implementation manner of the first aspect, the scene type to which the media content belongs corresponds to a first feature extraction model, where the first feature extraction model is one or more of a plurality of feature extraction models, and the plurality of feature extraction models correspond to the plurality of scene types one by one.
Because the first feature extraction model corresponds to the scene type to which the media content belongs, its feature extraction for media content of that scene type is better than that of the feature extraction models corresponding to other scene types; therefore, performing feature extraction with the first feature extraction model improves the accuracy of the digital content features and, in turn, the accuracy of the query.
According to the first aspect, or any implementation manner of the first aspect, the method further includes: the media content is classified, and the scene type of the media content is determined.
For example, the media content may be input to a classifier, which outputs the scene type to which the media content belongs.
According to the first aspect, or any implementation manner of the first aspect, determining the digital content identification of the media content according to the first data includes: encoding the first data to obtain the digital content identification of the media content.
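The application does not specify the encoding; as one hypothetical illustration, the binary-string first data could be packed into a compact alphanumeric identification with a base58-style alphabet (chosen purely for illustration):

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def encode_identification(first_data: str) -> str:
    """Hypothetical encoding of a binary string into an identifier."""
    value = int(first_data, 2)
    digits = []
    while value:
        value, rem = divmod(value, 58)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)) or ALPHABET[0]

print(encode_identification("10100111000101001110"))  # a short alphanumeric identifier
```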
According to a first aspect, or any implementation of the first aspect above, the media content comprises at least one of: text, video, graphics, audio, images, or 3D (three-dimensional) models.
The 3D model may be obtained by three-dimensional modeling, and may be a polygonal representation of a real-world entity or an imaginary object.
It should be understood that the media content may also include other content types of content, as the application is not limited in this regard.
According to a first aspect, or any implementation of the first aspect above, the digital content features comprise digital fingerprints.
For example, features that are output by feature extraction using a neural network model may be referred to as digital fingerprints.
Illustratively, a pHash (perceptual hash) algorithm may also be used to extract features of the media content, thereby obtaining the digital content features of the media content.
According to the first aspect, or any implementation manner of the first aspect, the first feature extraction model is a trained neural network model, and the training set of the first feature extraction model includes: first training data, second training data, and third training data; the scene type of the first training data and the scene type of the third training data are the same as the scene type of the media content; the third training data is different from the first training data; and the second training data is adversarial data of the first training data. In this way, the trained feature extraction model can extract accurate features even from the adversarial data of the media content.
The second training data is, illustratively, adversarial data of the first training data, that is, data generated by applying adversarial processing to the first training data.
For example, when the first training data is text, the adversarial processing may include operations such as synonym substitution; the second training data can be obtained by substituting synonyms for part of the text in the first training data.
For example, when the first training data is an image, the adversarial processing may include operations such as noise addition, rotation, shearing, and compression; the second training data can be obtained by applying such operations to the first training data. For example, the first training data may be input into StirMark Benchmark (an image modification tool), and the modification operations (or adversarial operations) provided by StirMark Benchmark may be applied to the first training data (this may also be called adversarial processing) to obtain the second training data.
For example, when the first training data is video, the adversarial processing may include operations such as frame extraction, flipping, and blurring; the second training data can be obtained by applying such operations to the first training data.
Illustratively, when the first training data is audio, the adversarial processing may include operations such as frequency reduction and audio encoding format conversion; the second training data can be obtained by applying such operations to the first training data.
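For the image case, a rough stand-in for such adversarial processing might look as follows (a sketch with arbitrary parameters; StirMark Benchmark itself is a separate tool, not shown here):

```python
import numpy as np
from PIL import Image, ImageFilter

def make_adversarial(image: Image.Image) -> Image.Image:
    """Apply simple perturbations of the kind named in the text: noise
    addition, rotation, a crop, and mild blurring (parameters arbitrary)."""
    arr = np.asarray(image, dtype=np.float32)
    arr += np.random.normal(0.0, 8.0, arr.shape)            # noise addition
    noisy = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    rotated = noisy.rotate(5)                               # slight rotation
    w, h = rotated.size
    cropped = rotated.crop((w // 20, h // 20, w - w // 20, h - h // 20))
    return cropped.filter(ImageFilter.GaussianBlur(radius=1))
```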
According to a first aspect, or any implementation manner of the first aspect, the data type of the digital content feature is floating point, and the first data is a binary string.
According to a first aspect, or any implementation of the first aspect above, the first data has a data volume smaller than a data volume of the digital content feature.
In a second aspect, an embodiment of the present application provides a query method, including: first, receiving query data, where the query data includes first media content and/or a digital content identification of the first media content, and the metadata of the first media content has been lost; next, determining, from a plurality of second media contents, the second media content matching the first media content according to the first data of the first media content and the first data of the plurality of second media contents, where the first data of the first media content is obtained by quantizing the digital content feature of the first media content or by decoding its digital content identification, and the first data of a second media content is obtained by quantizing the digital content feature of that second media content or by decoding its digital content identification; and thereafter, outputting the metadata of the second media content matching the first media content.
In this way, the metadata of the first media content may be retrieved.
According to the second aspect, determining, from the plurality of second media contents, the second media content matching the first media content according to the first data of the first media content and the first data of the plurality of second media contents includes: determining the Hamming distances between the first data of the first media content and the first data of the plurality of second media contents; and taking the second media content with the smallest Hamming distance as the second media content matching the first media content.
Compared with the prior-art approach of matching by computing Euclidean distances between floating-point numbers, computing Hamming distances between binary strings has low computational complexity and can improve query efficiency. Moreover, because the first data is associated with the media content, the metadata of the first media content can be retrieved accurately, which can improve user experience.
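A minimal sketch of this matching step over binary-string first data (the in-memory list stands in for the platform's database; all names and data are illustrative):

```python
def hamming_distance(a: str, b: str) -> int:
    """Hamming distance between two equal-length binary strings."""
    return sum(x != y for x, y in zip(a, b))

def find_metadata(query_first_data, records):
    """Return the metadata of the stored second media content whose
    first data is closest to the query (smallest Hamming distance)."""
    best = min(records, key=lambda r: hamming_distance(query_first_data, r["first_data"]))
    return best["metadata"]

records = [
    {"first_data": "10100111", "metadata": {"title": "a.jpg", "size": "1024x768"}},
    {"first_data": "01010001", "metadata": {"title": "b.jpg", "size": "640x480"}},
]
print(find_metadata("10100101", records))  # metadata of the closest match
```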
When the query data is the first media content, the first data in the second aspect and any implementation manner of the second aspect may be generated according to the method in the first aspect and any implementation manner of the first aspect.
In a third aspect, an embodiment of the present application provides a query method, where the method includes: first, receiving query data, where the query data includes first media content and/or a digital content identification of the first media content; then, determining the matching degree between a plurality of second media contents and the first media content according to the first data of the first media content and the first data of the plurality of second media contents, where the first data of the first media content is obtained by quantizing the digital content feature of the first media content or by decoding its digital content identification, and the first data of a second media content is obtained by quantizing the digital content feature of that second media content or by decoding its digital content identification; and then, outputting the top N second media contents with the highest matching degree, where N is a positive integer.
In this way, the second media content which is the same as or similar to the first media content can be found, and copyright determination is facilitated.
According to the third aspect, determining the matching degree between the plurality of second media contents and the first media content according to the first data of the first media content and the first data of the plurality of second media contents includes: taking the Hamming distance between the first data of each second media content and the first data of the first media content as the matching degree between that second media content and the first media content.
Compared with the prior-art approach of matching by computing Euclidean distances between floating-point numbers, computing Hamming distances between binary strings has low computational complexity and can improve query efficiency. Moreover, because the first data is associated with the media content, the second media content matching the first media content queried by the user can be found accurately, which can improve user experience.
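Correspondingly, the top-N ranking of the third aspect can be sketched as follows (again over illustrative binary strings; a smaller Hamming distance is treated as a higher matching degree):

```python
def hamming_distance(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def top_n_matches(query_first_data: str, records: list, n: int) -> list:
    """Rank stored second media content by Hamming distance to the
    query's first data and return the n best matches."""
    ranked = sorted(records, key=lambda r: hamming_distance(query_first_data, r["first_data"]))
    return ranked[:n]
```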
When the query data is the first media content, the first data in any implementation manner of the third aspect and the third aspect may be generated according to the method in any implementation manner of the first aspect and the first aspect.
In addition, the application can train the feature extraction models, which may include the following steps: acquiring a plurality of training sets, where the training sets correspond one-to-one to scene types, and one training set includes first training data, second training data, and third training data; the scene type of the first training data is the same as that of the third training data, the third training data is different from the first training data, and the second training data is adversarial data of the first training data; and training a plurality of feature extraction models based on the plurality of training sets, where the plurality of feature extraction models correspond one-to-one to the plurality of scene types.
Specifically, using a first training set, the process of training a first feature extraction model may be as follows: inputting the first training set into the first feature extraction model to obtain first digital content features of the first training data, second digital content features of the second training data, and third digital content features of the third training data in the first training set; and training the first feature extraction model based on a first similarity between the first and second digital content features and a second similarity between the first and third digital content features. In this way, the distance between the digital content features of the first training data and those of the second training data can be reduced, while the distance between the digital content features of the first training data and those of the third training data can be increased. Thus, in use, the first feature extraction model outputs close digital content features for media content and for its adversarial data, so the digital content identification generated in this identification generation manner is applicable to application scenarios involving fuzzy comparison and similarity queries, such as copyright queries and metadata queries.
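The application does not state a concrete loss; one standard way to realize "pull the first and second features together, push the first and third apart" is a triplet-style margin loss, sketched here under that assumption (PyTorch; the margin value is arbitrary):

```python
import torch.nn.functional as F

def training_step(model, first, second, third, margin=0.2):
    """first: batch of first training data (anchors); second: their
    adversarial counterparts (positives); third: different content of
    the same scene type (negatives)."""
    f1, f2, f3 = model(first), model(second), model(third)
    sim_pos = F.cosine_similarity(f1, f2)  # first similarity, to increase
    sim_neg = F.cosine_similarity(f1, f3)  # second similarity, to decrease
    return F.relu(sim_neg - sim_pos + margin).mean()
```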
In addition, the application can perform the quantization with a quantization processing model, which can be trained in advance; this may include the following steps: performing feature extraction on data to obtain S digital content features; inputting the S digital content features into the quantization processing model, which clusters the S digital content features to determine M cluster centers; and training the quantization processing model based on the M cluster centers and the indexes of the M cluster centers.
Specifically, the quantization processing model can be trained by constraining the distance between the index of the i-th cluster center and the index of the j-th cluster center to be proportional to the distance between the i-th cluster center and the j-th cluster center, and by constraining the distance between the p-th cluster center and the digital content features assigned to the p-th cluster center; where i, j, and p are integers between 1 and M. Training the quantization processing model includes training the M cluster centers and the indexes of the M cluster centers. In this way, during identification generation, the index of a first cluster center can represent the digital content feature accurately, thereby improving query accuracy.
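For the clustering step alone, a plain k-means sketch is shown below (the index-distance proportionality constraint described above is omitted; sizes are illustrative and the binary indexes are assigned naively):

```python
import numpy as np
from sklearn.cluster import KMeans

S, dim, M = 10000, 128, 256                     # illustrative sizes
features = np.random.default_rng(0).standard_normal((S, dim))
kmeans = KMeans(n_clusters=M, n_init=10, random_state=0).fit(features)
centers = kmeans.cluster_centers_               # the M cluster centers
indexes = [format(m, "08b") for m in range(M)]  # naive binary indexes
```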
An embodiment of the application further provides an identification generation apparatus, which may include:
the acquisition module is used for acquiring the media content;
the feature extraction module is used for extracting the features of the media content to obtain the digital content features of the media content;
the quantization processing module is used for carrying out quantization processing on the digital content characteristics to obtain first data;
the identification generation module is used for determining the digital content identification of the media content according to the first data.
It should be appreciated that the identification generation apparatus may also perform the steps in the first aspect and any implementation manner of the first aspect described above.
Illustratively, an embodiment of the present application further provides a query apparatus, which may include:
the data receiving module is used for receiving query data, wherein the query data comprises first media content and/or digital content identification of the first media content, and metadata of the first media content is lost;
a matching module for determining second media content matching the first media content from the plurality of second media content according to the first data of the first media content and the first data of the plurality of second media content; the first data of the first media content is obtained by quantizing the digital content feature of the first media content or by decoding the digital content identification of the first media content, and the first data of a second media content is obtained by quantizing the digital content feature of that second media content or by decoding its digital content identification;
And the output module is used for outputting the metadata of the second media content matched with the first media content.
It should be appreciated that the querying device may also perform the steps of the second aspect and any implementation manner of the second aspect.
Illustratively, an embodiment of the present application further provides another query apparatus, which may include:
the data receiving module is used for receiving query data, wherein the query data comprises first media content and/or digital content identification of the first media content, and metadata of the first media content is lost;
the matching module is used for determining the matching degree between the plurality of second media contents and the first media content according to the first data of the first media content and the first data of the plurality of second media contents; the first data of the first media content is obtained by quantizing the digital content feature of the first media content or by decoding the digital content identification of the first media content, and the first data of a second media content is obtained by quantizing the digital content feature of that second media content or by decoding its digital content identification;
and the output module is used for outputting the first N pieces of second media content with the highest matching degree, wherein N is a positive integer.
It should be appreciated that the querying device may also perform the steps in any implementation manner of the third aspect and the third aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor, the memory coupled to the processor; the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the method of generating the identification of the first aspect or any possible implementation of the first aspect.
The fourth aspect and any implementation manner of the fourth aspect correspond to the first aspect and any implementation manner of the first aspect, respectively. For the technical effects of any implementation manner of the fourth aspect, refer to the technical effects of the corresponding implementation manner of the first aspect; details are not repeated here.
In a fifth aspect, embodiments of the present application provide a chip, including one or more interface circuits and one or more processors; the one or more processors receive or send data via the one or more interface circuits; when the one or more processors execute program instructions, the electronic device is caused to perform the identification generation method in the first aspect or any possible implementation of the first aspect.
The fifth aspect and any implementation manner of the fifth aspect correspond to the first aspect and any implementation manner of the first aspect, respectively. For the technical effects of any implementation manner of the fifth aspect, refer to the technical effects of the corresponding implementation manner of the first aspect; details are not repeated here.
In a sixth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program, which when run on a computer or processor causes the computer or processor to perform the method of generating an identification in the first aspect or any possible implementation manner of the first aspect.
The sixth aspect and any implementation manner of the sixth aspect correspond to the first aspect and any implementation manner of the first aspect, respectively. For the technical effects of any implementation manner of the sixth aspect, refer to the technical effects of the corresponding implementation manner of the first aspect; details are not repeated here.
In a seventh aspect, embodiments of the present application provide a computer program product comprising computer instructions which, when executed by a computer or processor, cause the computer or processor to perform the method of generating an identification in the first aspect or any possible implementation of the first aspect.
The seventh aspect and any implementation manner of the seventh aspect correspond to the first aspect and any implementation manner of the first aspect, respectively. For the technical effects of any implementation manner of the seventh aspect, refer to the technical effects of the corresponding implementation manner of the first aspect; details are not repeated here.
Drawings
FIG. 1a is a schematic diagram of an exemplary application scenario;
FIG. 1b is a schematic diagram of an exemplary application scenario;
FIG. 2 is a schematic diagram of an exemplary identification generation process;
FIG. 3a is a schematic diagram of a training process for an exemplary feature extraction model;
FIG. 3b is a model training schematic diagram shown schematically;
FIG. 3c is a model training schematic diagram shown schematically;
FIG. 3d is a schematic diagram of an exemplary training process;
FIG. 4a is a schematic diagram of a training process for an exemplary illustrated quantization process model;
FIG. 4b is a schematic diagram of an exemplary training process;
FIG. 5a is a schematic diagram of an exemplary identification generation process;
FIG. 5b is a schematic diagram of an exemplary quantization process;
FIG. 6a is a schematic diagram of an exemplary identification generation process;
FIG. 6b is a schematic diagram of an exemplary quantization process;
FIG. 7 is a schematic diagram of an exemplary query process;
FIG. 8 is a schematic diagram of an exemplary query process;
FIG. 9 is a schematic diagram of an exemplary identification generation apparatus;
FIG. 10 is a schematic structural diagram of an exemplary apparatus.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone.
The terms "first", "second", and the like in the description and claims of the embodiments of the application are used to distinguish between different objects, not to describe a particular order of objects. For example, a first target object and a second target object are different target objects, not target objects in a particular order.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, unless otherwise indicated, the meaning of "a plurality" means two or more. For example, the plurality of processing units refers to two or more processing units; the plurality of systems means two or more systems.
For ease of understanding, some of the terms mentioned in the embodiments of the present application will be first described.
1. Media content (Media content)
Media content may also be referred to as digital content. Digital content is content of different content types, such as text, images, and sound, that exists in digital form; it may be stored on digital carriers such as optical discs and hard disks and transmitted over networks. Digital content integrates and applies images, text, video, and audio through digital technology, and is the product of combining digital media technology with cultural creativity.
Digital technology (Digital Technology) is the science and technology that accompanies the electronic computer. It refers to using devices to convert various information, including graphics, text, sound, and images, into the binary digits "0" and "1" that an electronic computer can recognize, and then performing operations, processing, storage, transmission, propagation, and restoration on that information. Because the computer encodes, compresses, and decodes information in steps such as computation and storage, it is also called computer digital technology; it is also known as digital control technology.
2. Digital content identification
The digital content identification may be a digital object identifier (Digital Object Identifier, DOI): a string of numbers, letters, or other symbols that identifies a digitized content resource. It can be assigned over a network to any digital entity. A DOI is unique as the identifier of a digitized object; once generated, it does not change, even when attributes such as the copyright owner or storage address of the identified object change. For traditional physical publications, books, journals, tapes, optical discs, and other publications are marked according to international standards such as ISBN (International Standard Book Number), ISSN (International Standard Serial Number), and ISRC (International Standard Recording Code), and the mark is pasted on the physical cover in bar-code form as the unique identification of the publication.
The application of digital content identification is described below.
Fig. 1a is a schematic diagram of an exemplary application scenario. In the embodiment of fig. 1a, an application scenario of a copyright query/copyright decision is shown.
For example, when a user needs to determine whether a copyright conflict exists in a certain media content (such as an image, audio, video, text, graphics, etc.), a media content query platform client (such as an application program, an applet, a web page, etc.) in the terminal device may be opened for query.
Referring to fig. 1a (1), the main interface 101 of the media content query platform client may include one or more controls including, but not limited to: search boxes, search buttons, metadata options, genre options (e.g., audio options, image/video options, graphical options), and the like, as the application is not limited in this regard.
For example, when the media content is text, the user may enter the media content or a digital content identification of the media content in a search box. When the media content is audio, the user can click on an audio option to input the media content; and/or the user may enter a digital content identification of the media content in a search box. When the media content is an image/video, the user can click on the image/video option to input the media content; and/or the user may enter a digital content identification of the media content in a search box. When the digital content is graphic, the user can click on the graphic option to input the media content; and/or the user may enter a digital content identification of the media content in a search box. Wherein, for convenience of description, the media content input by the user is referred to as first media content.
After the user inputs the first media content and/or the digital content identification of the first media content, the user can click the search button; correspondingly, the media content query platform client can respond to the user's operation and send the first media content or the digital content identification of the first media content to the media content query platform server. The media content query platform server may then compare the digital content identification of the first media content queried by the user with the digital content identifications of the media content (hereinafter referred to as second media content) pre-stored in a database, to find, from the database, second media content matching the first media content queried by the user (i.e., second media content identical or similar to the first media content). When the media content query platform server finds second media content matching the first media content queried by the user, it can return the second media content to the media content query platform client. The media content query platform client may display the second media content matching the first media content queried by the user on the search results interface 102, as shown in fig. 1a (2). When the media content query platform server does not find second media content matching the first media content queried by the user, it can return a query-failure result; in this case, the media content query platform client may display a query-failure prompt on the search results interface 102. In this way, the user can determine whether the queried media content has a copyright conflict.
Fig. 1b is a schematic view of an exemplary application scenario. In the embodiment of fig. 1b, an application scenario of a metadata query is shown. Among other things, metadata may refer to data describing data attributes, e.g., metadata of an image may include size, storage location, format, etc. of the image.
For example, after metadata of a certain media content is lost, a user may open a media content query platform client (such as an application program, an applet, a webpage, etc.) in the terminal device for retrieving; the main interface 101 of the media content query platform client may refer to the above description, and will not be described herein.
Illustratively, after the user inputs the first media content and/or the digital content identification of the first media content, the user may click on the metadata option, as shown in fig. 1b (1). Correspondingly, the media content query platform client can respond to the operation behaviors of the user and send the first media content or the digital content identification of the first media content to the media content query platform server. And then, the media content query platform server can compare the digital content identifier of the first media content queried by the user with the digital content identifier of the second media content pre-stored in the database to search the second media content matched with the first media content queried by the user from the database. After the media content query platform server finds the second media content matched with the first media content queried by the user, metadata of the second media content matched with the first media content queried by the user can be returned to the media content query platform client. The media content query platform client may display metadata for the second media content matching the first media content queried by the user in the search results interface 103, as shown in fig. 1b (2). When the media content query platform server side does not find the second media content matched with the first media content queried by the user, a query result of query failure can be returned; at this time, the media content query platform client may display a prompt for a query failure on the search results interface 103. In this way, the user can retrieve the corresponding metadata for the media content for which the metadata was lost.
It should be understood that the media content query platforms of fig. 1a and 1b may also be referred to by other names, and the application is not limited in this regard.
It should be understood that the present application may also be applied to other fuzzy alignments and application scenarios of similar queries, as the present application is not limited in this regard.
In the query processes of fig. 1a and fig. 1b, the digital content identification of the media content is one of the key factors in whether second media content similar or identical to the first media content queried by the user can be found. The application therefore provides an identification generation method that can generate digital content identifications suitable for application scenarios involving fuzzy comparison and similarity queries, such as copyright query/copyright determination and metadata query.
Fig. 2 is a schematic diagram of an exemplary illustrated identification generation process.
In the embodiment of fig. 2, the present application may generate a corresponding digital content identification for media content of at least one type among text, audio, image, video, 3D model, and graphics, as shown in fig. 2 (1); the identification generation process may refer to fig. 2 (2).
The 3D model may be obtained by three-dimensional modeling, and may be a polygonal representation of a real-world entity or an imaginary object.
It should be understood that the media content of the present application may include other content types of content, and the present application is not limited in this regard.
S201, acquiring media content.
Illustratively, obtaining media content for which a digital content identification needs to be generated; wherein the media content may include at least one of: text, audio, image, video or graphics.
S202, extracting the characteristics of the media content to obtain the digital content characteristics of the media content.
In one possible approach, the media content may be input into a feature extraction model, the feature extraction model performs feature extraction on the media content, and digital content features of the media content are output. In this case, the digital content feature may be a digital fingerprint.
By way of example, the feature extraction model may be implemented using a neural network, for example, based on a convolutional neural network (Convolutional Neural Network, CNN). A CNN is a neural network that includes convolutional layers and may further include network layers such as activation layers (e.g., ReLU, PReLU), pooling layers, batch normalization layers (BN layers), and fully connected layers. Typical convolutional neural networks include LeNet, AlexNet, VGGNet, ResNet, and the like. By way of example, a basic CNN may be composed of a backbone network and a head network; a complex CNN may be composed of a backbone network, a neck network, and a head network; the present application does not limit the networks that make up the CNN.
In one possible approach, a pHash (perceptual hash) algorithm may be used to extract features of the media content to obtain the digital content features of the media content.
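A sketch of this alternative using the third-party imagehash library (an assumed implementation choice; the application does not name a specific one):

```python
from PIL import Image
import imagehash

image = Image.open("media_content.jpg")        # hypothetical input file
feature = imagehash.phash(image)               # 64-bit perceptual hash
bits = format(int(str(feature), 16), "064b")   # feature as a binary string
```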
It should be understood that the present application is not limited to the manner in which the feature extraction is performed on the media content, and the present application is described taking as an example the digital content feature of the output media content by inputting the media content into the feature extraction model.
The data type of the digital content feature may be, for example, a floating point type (float) or an integer type (int), which the present application is not limited to.
S203, carrying out quantization processing on the digital content characteristics to obtain first data.
Next, the digital content feature is compressed by quantization to obtain the first data; in this way, the data amount of the first data is smaller than the data amount of the digital content feature.
Illustratively, the quantization may refer to searching, among a plurality of cluster centers (a cluster center may be represented by a vector), for a first cluster center whose distance from the digital content feature satisfies a preset distance, and using the index of that first cluster center to characterize the digital content feature. That is, the first data may be used to indicate the index of the first cluster center. The plurality of cluster centers are used for classifying digital content features; both the cluster centers and their indexes may be obtained through pre-training, and the specific training process is described later.
For example, one or more first cluster centers from among the plurality of cluster centers, which are a preset distance from the digital content feature, may be searched; correspondingly, the first data may be used to indicate an index of one or more first cluster centers.
For example, the index of the cluster center may be a binary string; thus, the resulting first data may be a binary string. It should be understood that the index of the cluster center may be a decimal string or a hexadecimal string, which is not limited in the present application; the present application is described by taking an index as a binary string as an example.
For example, the digital content feature may be input to a quantization processing model, outputting the first data. The quantization processing model may include a plurality of cluster centers, and the quantization processing model searches for a first cluster center from among the plurality of cluster centers, which is a preset distance from the digital content feature, and outputs an index of the first cluster center. The plurality of cluster centers and the index of the plurality of cluster centers may be trained by training a quantization processing model, and a specific training process will be described later.
S204, determining the digital content identification of the media content according to the first data.
Illustratively, the digital content identification of the media content may be obtained by encoding the first data. For example, the digital content is identified as: ciHZ8av44Z5mc, CYUuraNV8atfk, CRUtHn1RdJxnC, CDK1EignVFTdg, and the like.
In this way, the digital content features are generated through feature extraction, and the digital content identifier is generated according to the digital content features, so that the digital content identifier is associated with the media content, making it suitable for application scenarios such as fuzzy comparison for copyright query, metadata query, and similarity query.
Secondly, the quantization process can reduce the data amount, and the digital content identification is generated according to the digital content characteristics after the quantization process; in this way, the complexity of generating the digital content identification can be reduced; the data amount of the digital content identification can be reduced, so that the storage space occupied by the digital content identification is reduced.
First, a training process of the feature extraction model will be described.
Fig. 3a is a schematic diagram of a training process of an exemplary feature extraction model.
S301, acquiring a plurality of training sets; the training sets are in one-to-one correspondence with the scene types, and one training set comprises: the system comprises first training data, second training data and third training data, wherein the scene type of the first training data is the same as the scene type of the third training data, the third training data is different from the first training data, and the second training data is countermeasure data of the first training data.
For example, a training set may be generated based on the public data set. The public data set may include a plurality of pieces of data, and each piece of data is a piece of media content.
In one possible manner, the scene type to which each piece of data in the public data set belongs may be determined according to the tag (such as a type tag) of each piece of data in the public data set.
In one possible manner, a classifier may be used to classify each piece of data in the public data set, and determine a scene type to which each piece of data in the public data set belongs.
For example, the data of different content types, the corresponding scene types may be different. For example, the scene types corresponding to the text may include: literature classes, biology classes, electronics classes, medicine classes, chemistry classes, and the like. For example, the scene types corresponding to the image may include: people, landscapes, things, animals, buildings, etc. For example, the scene types corresponding to the video may include: literature, action, love, etc. For example, the scene types to which the audio corresponds may include: popular music, operas, rock, ballad, etc.
Then, selecting one piece of data from a plurality of pieces of data corresponding to one scene type as first training data; then, the first training data can be subjected to countermeasure processing or countermeasure transformation to obtain second training data; then, selecting data except the first training data from a plurality of pieces of data corresponding to the scene type as third training data; in this way, a triplet of data consisting of the first training data, the second training data and the third training data can be obtained. And so on, a plurality of pieces of first training data can be selected from a plurality of pieces of data corresponding to the scene type, and a plurality of ternary data sets are generated according to the mode; wherein the plurality of triples may form a training set, the training set corresponding to the scene type. Further, multiple training sets may be generated based on data corresponding to multiple scene types; the plurality of training sets corresponds one-to-one to a plurality of scene types.
For example, in some scenarios, the first training data may be referred to as an anchor, the second training data may be referred to as a positive, and the third training data may be referred to as a negative; that is, a triplet may be represented as (anchor, active, negative).
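A minimal sketch of triplet construction for one scene type follows (the data and transform interfaces are hypothetical):

import random

def build_triplets(scene_data, countermeasure_transforms, n_anchors, seed=0):
    rng = random.Random(seed)
    triplets = []
    for anchor in rng.sample(scene_data, n_anchors):       # first training data
        rest = [d for d in scene_data if d is not anchor]
        for transform in countermeasure_transforms:
            positive = transform(anchor)                   # second training data
            negative = rng.choice(rest)                    # third training data
            triplets.append((anchor, positive, negative))
    return triplets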
For example, when the first training data is text, the countermeasure processing may include operations such as synonym conversion; the second training data can be obtained by performing synonym conversion on part of text in the first training data.
For example, when the first training data is an image, the countermeasure processing may include operations of noise addition, rotation, shearing, compression, and the like; the second training data can be obtained by performing operations such as noise adding, rotation, shearing or compression on the first training data. For example, the first training data may be input into Stirmark Benchmark (an image modification tool), and the modification operation (or countermeasure operation) provided by Stirmark Benchmark modifies the first training data (which may also be referred to as countermeasure processing) to obtain the second training data.
For example, when the first training data is video, the countermeasure processing may include operations of frame extraction, flipping, blurring, and the like; the second training data can be obtained by performing operations such as frame extraction, overturning, blurring and the like on the first training data.
Illustratively, when the first training data is audio, the countermeasure processing may include operations of down-converting, converting the audio encoding format, and the like; the second training data can be obtained by performing operations such as frequency reduction and audio coding format conversion on the first training data.
For example, assuming that the public dataset includes 10000 pieces of data corresponding to the scene type a, 1000 pieces of first training data and 1000 pieces of third training data may be selected from the 10000 pieces of data. If the countermeasure processing is performed on one piece of first training data in different modes to obtain 108 pieces of second training data, the countermeasure processing is performed on 1000 pieces of first training data to obtain 108000 pieces of second training data; in this way, 108000 triples may be generated, and the training set for scene type a may include 108000 triples.
S302, training a plurality of feature extraction models based on a plurality of training sets; the plurality of feature extraction models are in one-to-one correspondence with the plurality of scene types.
For example, a neural network model can be selected for a scene type as a feature extraction model corresponding to the scene type; and then training the feature extraction model corresponding to the scene type by adopting the training set corresponding to the scene type.
Fig. 3b is a schematic diagram of model training. Referring to fig. 3b, assume there are R (R is a positive integer) training sets (e.g., training set 1, training set 2, ..., training set R); then R feature extraction models may be trained (e.g., feature extraction model 1, feature extraction model 2, ..., feature extraction model R), where feature extraction model 1 is obtained by training with training set 1, feature extraction model 2 is obtained by training with training set 2, ..., and feature extraction model R is obtained by training with training set R.
For example, a neural network model with better feature extraction performance for data of the scene type can be selected from existing neural network models and used as the feature extraction model corresponding to the scene type. For example, ResNet50 performs feature extraction well for people in images; ResNet50 can then be used as the feature extraction model for the people class. That is, the training set corresponding to a scene type is used to train the feature extraction model corresponding to that scene type.
In the following, a feature extraction model (hereinafter referred to as a first feature extraction model) corresponding to a scene type (hereinafter referred to as a first scene type) to which a first training set belongs is described by taking one training set (hereinafter referred to as a first training set) of a plurality of training sets as an example. Reference may be made to the following S3021 to S3022.
For example, the triples in the first training set may be divided into N1 (N1 is a positive integer) batches (batches); selecting N2 (N2 is a positive integer) batches from the N1 batches as a first verification set, and selecting N3 (N3 is a positive integer) batches from the other (N1-N2) batches as a first test set; at this point, the first training set includes (N1-N2-N3) batches. Then, training the first feature extraction model by adopting batches in a first training set, after training the first feature extraction model by adopting each batch, verifying the first feature extraction model by adopting batches in a first verification set, and selecting an optimal first feature extraction model; finally, testing the optimal first feature extraction model by adopting the batch in the first test set; as shown in fig. 3 c.
For example, the first training set corresponding to scene type A includes 108000 triplets, and the 108 triplets generated by performing countermeasure processing on one piece of first training data may be referred to as a family (i.e., triplets belonging to the same family contain the same first training data); thus, the 108000 triplets of the first training set corresponding to scene type A correspond to 1000 families, i.e., the first training set includes 1000 families. Next, the 1000 families in the first training set may be randomly shuffled in units of families (i.e., the data of one family are not interspersed with those of other families), and every 4 families may be taken as one batch (batch), so the first training set may be divided into 250 batches. Thereafter, 25 batches from the 250 batches may be selected as the first validation set, and 25 batches from the other 225 batches may be selected as the first test set; thus, the first training set comprises 200 batches. It should be understood that the numbers of batches in the first training set, the first validation set, and the first test set may be set as desired; the application is not limited in this regard.
S3021, inputting the first training set into the first feature extraction model to obtain a first digital content feature of first training data in the first training set, a second digital content feature of second training data in the first training set, and a third digital content feature of third training data in the first training set.
S3022, training the first feature extraction model based on a first similarity between the first digital content feature and the second digital content feature, and a second similarity between the first digital content feature and the third digital content feature.
For example, the first feature extraction model may be trained one batch at a time in the first training set. The training process of the first feature extraction model will be described below using one batch in the first training set as an example.
For example, one batch of the first training set may be input to the first feature extraction model, and the first digital content features of the first training data in each triplet of the batch, the second digital content features of the second training data in each triplet of the batch, and the third digital content features of the third training data in each triplet of the batch may be extracted by the first feature extraction model, respectively.
Then, for each triplet, a first similarity between a first digital content feature of the first training data in the triplet and a second digital content feature of the second training data in the triplet may be calculated. And calculating a second similarity between the first digital content characteristic of the first training data in the triplet and the third digital content characteristic of the third training data in the triplet. Wherein the first similarity and the second similarity may be euclidean distance, manhattan distance, etc., which the present application is not limited to.
Then, the first similarities corresponding to all the triplets in the batch can be added to obtain a first similarity sum value, and the second similarities corresponding to all the triplets in the batch can be added to obtain a second similarity sum value. The first feature extraction model is then trained with the goal of minimizing the first similarity sum value and maximizing the second similarity sum value. That is, the final goal is to pull the digital content features of the first training data and the digital content features of the second training data closer together, and to push the digital content features of the first training data and the digital content features of the third training data farther apart.
In one possible implementation, the value of the loss function is calculated using the loss function of formula (1) below; then, based on the value of the loss function, back-propagation is performed using a gradient descent mechanism so that the Loss value tends to 0, thereby achieving the goal of minimizing the first similarity sum value and maximizing the second similarity sum value.
Loss = Max(dis(a, p)-dis(a, n) + margin, 0) (1)
Wherein margin is an interval parameter, which can be set to 1.0 and, in particular, may be set as desired; the application is not limited in this regard. dis(a, p) is the first similarity sum value, and dis(a, n) is the second similarity sum value.
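A sketch of formula (1) in PyTorch, assuming the Euclidean distance for dis(·,·) and margin = 1.0 as in the text:

import torch

def triplet_loss(anchor_f, positive_f, negative_f, margin=1.0):
    d_ap = torch.dist(anchor_f, positive_f, p=2)        # dis(a, p)
    d_an = torch.dist(anchor_f, negative_f, p=2)        # dis(a, n)
    return torch.clamp(d_ap - d_an + margin, min=0.0)   # Loss of formula (1)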
Fig. 3d is a schematic diagram of an exemplary training process, illustrated with one triplet as an example. After the triplet (anchor, positive, negative) is input to the first feature extraction model, the first feature extraction model outputs an anchor feature (i.e., the first digital content feature of the first training data), a positive feature (i.e., the second digital content feature of the second training data), and a negative feature (i.e., the third digital content feature of the third training data). Then, the Euclidean distance dis(a, p) between the anchor feature and the positive feature can be calculated, and the Euclidean distance dis(a, n) between the anchor feature and the negative feature can be calculated. Then, a loss function value may be calculated based on the above formula (1), dis(a, p), and dis(a, n), and back-propagation may be performed according to the loss function value.
Furthermore, a learning rate may be set; for example, the Adam optimizer may be used with a learning rate of 1e-4, which in particular may be set as desired; the application is not limited in this regard. It should be noted that the purpose of setting the learning rate and margin is to make the feature extraction model converge in a shorter time: a proper margin can improve the feature extraction effect of the feature extraction model, and a proper learning rate can accelerate training and improve the training effect.
Subsequently, training the first feature extraction model after the last training by adopting the next batch in the first training set according to the mode; and so on, they are not described in detail herein.
For example, after each batch of training in the first training set is completed, the trained first feature extraction model may be validated. Specifically, the first verification set may be input into the first feature extraction model, and the Loss value is calculated according to the above formula (1), so as to obtain a plurality of Loss values; the plurality of Loss values are in one-to-one correspondence with the plurality of batches in the first verification set. Thereafter, an average of the plurality of Loss values may be calculated to obtain a verification Loss value. In this way, after the first feature extraction model is trained by using a plurality of batches in the first training set, a plurality of verification Loss values can be obtained according to the above manner, and the plurality of verification Loss values are in one-to-one correspondence with the first feature extraction model trained by using the plurality of batches in the first training set. And then, selecting a first feature extraction model corresponding to the minimum verification Loss value from the verification Loss values as a model finally obtained through the whole training.
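A sketch of this batch-wise training with validation-based model selection (the loop structure and the loss_fn interface are assumptions):

import torch

def train_with_validation(model, optimizer, train_batches, val_batches, loss_fn):
    best_loss, best_state = float("inf"), None
    for batch in train_batches:
        optimizer.zero_grad()
        loss_fn(model, batch).backward()
        optimizer.step()
        # verification Loss value: average of the Loss over validation batches
        with torch.no_grad():
            val = sum(loss_fn(model, b).item() for b in val_batches) / len(val_batches)
        if val < best_loss:   # keep the model with the minimum verification Loss
            best_loss = val
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
    model.load_state_dict(best_state)
    return model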
For example, after the first feature extraction model finally obtained through the whole training is determined, it may be tested with the first test set to determine its feature extraction effect; for example, indexes such as accuracy, precision, and recall can be used to measure the feature extraction effect of the finally obtained first feature extraction model. When the feature extraction effect of the first feature extraction model does not meet the requirement, another public data set can be adopted to generate the first training set, or the first training set can be regenerated based on the original public data set; then the first feature extraction model is trained again with the regenerated first training set, and so on, until the first feature extraction model meets the requirement.
Next, a training process of the quantization processing model will be described.
Fig. 4a is a schematic diagram of a training process of an exemplary illustrated quantization process model.
S401, acquiring S digital content features, where a digital content feature is obtained by performing feature extraction on a piece of data.
In a possible manner, the trained feature extraction model may be used to perform feature extraction on S pieces of data (one piece of data may be one media content; for example, the S pieces of data may be S pieces of data in the public data set) to obtain S digital content features, where the S pieces of data are in one-to-one correspondence with the S digital content features. Illustratively, the S pieces of data belong to the same scene type; for example, the scene types to which the S pieces of data belong are all scene type A.
S402, inputting the S digital content features into a quantization processing model, and clustering the S digital content features by the quantization processing model to determine M clustering centers.
For example, the S digital content features may be input to a quantization processing model, which clusters the S digital content features to determine M cluster centers; wherein M is a positive integer.
For example, in the process of training the quantization processing model for the first time, clustering the S digital content features to determine M cluster centers may refer to determining, for one digital content feature, a cluster center to which the digital content feature belongs from M preset cluster centers by the quantization processing model. Specifically, the quantization processing model may calculate distances (e.g., euclidean distance, manhattan distance, etc.) between the digital content features and M cluster centers; next, from the M cluster centers, the cluster center closest to the digital content feature is set as the cluster center to which the digital content feature belongs. In the subsequent process of training the quantization processing model each time, clustering S digital content features to determine M clustering centers can refer to determining the clustering center to which one digital content feature belongs from M clustering centers obtained by the previous training of the quantization processing model; reference may be made specifically to the above description, and no further description is given here.
In one possible approach, the quantization processing model may directly cluster the S digital content features to determine M cluster centers. In this case, one cluster center group may be obtained, which includes M cluster centers having the same dimension as the digital content features. For example, if the dimension of the digital content feature is 128 (i.e., the digital content feature is a 1×128 vector), then the dimension of a cluster center may be 128 (i.e., the cluster center is a 1×128 vector); where M may be equal to 2 to the 128th power.
In a possible manner, the quantization processing model may perform feature transformation on the S digital content features to obtain S sub-feature groups; wherein one sub-feature group includes W sub-features, and W is a positive integer. Then, the kth sub-feature of each of the S sub-feature groups is clustered to determine G cluster centers, where k is an integer between 1 and W. In this case, W cluster center groups may be obtained, each cluster center group including G cluster centers, and M = W×G. The dimension of each cluster center is the same as the dimension of each sub-feature, and G may be equal to a power of 2. In this way, the memory occupied by storing the cluster centers can be reduced, and the quantization processing efficiency can be improved.
That is, feature transformation is performed for each of the S digital content features to obtain a sub-feature group including W sub-features; thus, S sub-feature sets may be obtained. Then, selecting the kth sub-feature from each sub-feature group of the S sub-feature groups to obtain S sub-features; the S sub-features are then clustered to determine G cluster centers (which may form a kth cluster center group).
For example, assuming W = 128, the dimension of a sub-feature may be 8; at this time, the dimension of one cluster center may be 8. G may be set as desired, e.g., may be determined according to the index of the cluster center. For example, if the index of the cluster center is represented by 8-bit binary, G may be 256, and M = 256×128; for another example, if the index of the cluster center is represented by 4-bit binary, G may be 32, and M = 32×128; the present application is not limited in this regard.
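A sketch of this sub-feature clustering, assuming scikit-learn's KMeans and the example sizes above (W = 128 groups, sub-feature dimension 8, G = 256 centers per group):

import numpy as np
from sklearn.cluster import KMeans

S, W, d, G = 1000, 128, 8, 256
sub_feature_groups = np.random.randn(S, W, d)   # stand-in for S sub-feature groups

cluster_center_groups = []
for k in range(W):
    # cluster the kth sub-feature of every sub-feature group
    km = KMeans(n_clusters=G, n_init=1, random_state=0).fit(sub_feature_groups[:, k, :])
    cluster_center_groups.append(km.cluster_centers_)   # kth group: G centers of dim 8
# W cluster center groups, M = W * G cluster centers in total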
By way of example, the feature transformation may be feature division. Specifically, feature division may be performed on each of the S digital content features to obtain the S sub-feature groups.
By way of example, the feature transformation may include the following operations: homogenization treatment and feature classification. Specifically, a homogenization treatment may be performed on the S digital content features to obtain S first intermediate features. In this way, the concentration of the digital content features can be diffused such that the energy distribution of the digital content features is uniform. And then, respectively carrying out feature division on each first intermediate feature in the S first intermediate features to obtain S sub-feature groups.
Illustratively, the homogenization treatment refers to using all elements in the digital content feature to determine each element in the first intermediate feature.
One implementation of the homogenization process may be to multiply the S digital content features by a first matrix, respectively, to obtain S second intermediate features; multiplying the S second intermediate features by a second matrix respectively to obtain S first intermediate features; the first matrix is a random matrix, and the second matrix is a matrix required to be trained for training the quantization processing model.
Wherein, in the process of training the quantization processing model for the first time, the elements in the second matrix are all preset elements. In the subsequent training process of the quantization processing model, the elements in the second matrix are obtained by the last training.
Optionally, the first matrix and the second matrix are orthogonal matrices, that is, the first matrix is a random orthogonal matrix, and the second matrix is an orthogonal matrix required to be trained in training the quantization process model.
Fig. 4b is a schematic diagram of an exemplary training process.
Referring to fig. 4b, exemplarily, assume that the dimension of the digital content feature is 128 and S = 1000. After the 1000 digital content features are input into the quantization processing model, the quantization processing model may multiply each digital content feature by the 1024-dimensional first matrix (where the first matrix is 1024×1024; each 1×128 digital content feature may be multiplied by a 128×1024 sub-matrix of the first matrix) to obtain a second intermediate feature (the dimension of the second intermediate feature is 1024, i.e., the second intermediate feature may be a 1×1024 vector). Next, the second intermediate feature is multiplied by the 1024-dimensional second matrix (where the second matrix is 1024×1024) to obtain a first intermediate feature (the dimension of the first intermediate feature is 1024, i.e., the first intermediate feature may be a 1×1024 vector). Thereafter, each first intermediate feature may be divided into a sub-feature group of 128 8-dimensional sub-features (a sub-feature may be a 1×8 vector), giving 1000 sub-feature groups. Then, for each k, the kth sub-features of the 1000 sub-feature groups are clustered, and the kth cluster center group can be obtained, where the cluster center group may include 256 cluster centers. In this way, 128 cluster center groups (256×128 cluster centers in total) can be obtained.
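A sketch of the homogenization step of fig. 4b, using random orthogonal stand-ins for both matrices (during training only the second matrix is trainable):

import numpy as np

rng = np.random.default_rng(0)
orth = lambda n: np.linalg.qr(rng.standard_normal((n, n)))[0]

M1 = orth(1024)[:128, :]   # 128x1024 sub-matrix of the random first matrix
M2 = orth(1024)            # stand-in for the trained second matrix

feature = rng.standard_normal((1, 128))            # 1x128 digital content feature
second_intermediate = feature @ M1                 # 1x1024
first_intermediate = second_intermediate @ M2      # 1x1024, energy spread evenly
sub_features = first_intermediate.reshape(128, 8)  # 128 sub-features of dim 8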
S403, training the quantization processing model based on the M clustering centers and indexes of the M clustering centers.
For example, after determining M cluster centers, an index for each of the M cluster centers may be determined.
By way of example, the quantization processing model may be trained by constraining the distance between the index of the ith cluster center and the index of the jth cluster center, proportional to the distance between the ith cluster center and the jth cluster center, and constraining the distance between the p-th cluster center and the digital content feature (or sub-feature) belonging to the p-th cluster center; wherein i, j and p are integers between 1 and M. Wherein training the quantization processing model comprises: m cluster centers and indexes of the M cluster centers are trained.
Wherein, the digital content feature (or sub-feature) belonging to the p-th cluster center may refer to the digital content feature (or sub-feature) closest to the p-th cluster center.
Illustratively, in the process of training the quantization processing model for the first time, indexes of M preset cluster centers are adjusted, and M preset cluster centers are adjusted. In the subsequent training process of the quantization processing model, indexes of M clustering centers obtained by the last training are adjusted, and M clustering centers obtained by the last training are adjusted.
For example, the loss L_diff, which constrains the distance between the index of the ith cluster center and the index of the jth cluster center to be proportional to the distance between the ith cluster center and the jth cluster center, can be determined with reference to the following formula (2):

L_diff = Σ_k Σ_(i,j) ( d̄_(i,j,k) − h̄_(i,j) )^2 (2)

wherein d̄_(i,j,k) represents the regularized Euclidean distance between the ith cluster center c_(i,k) and the jth cluster center c_(j,k) in the kth cluster center group; h̄_(i,j) represents the regularized Hamming distance between the index i of the ith cluster center and the index j of the jth cluster center in the kth cluster center group.

That is, the closer two cluster centers are to each other, the closer their indexes are. For example, the index distance between the cluster center with index 1 (binary string "00000001") and the cluster center with index 3 (binary string "00000011") is smaller than the index distance between the cluster center with index 1 and the cluster center with index 254 (binary string "11111110").
Illustratively, the distance L_dist of each cluster center to the digital content features (or sub-features) belonging to it can be determined by the following formula (3):

L_dist = Σ_(r,k) min_p ||f_(r,k) − c_(p,k)|| (3)

wherein c_(p,k) represents the pth cluster center in the kth cluster center group, and f_(r,k) represents the rth sub-feature (or digital content feature) assigned to the kth cluster center group.
That is, the distance of each sub-feature (or digital content feature) from its associated cluster center is constrained to ensure that closely spaced sub-features (or digital content features) are assigned to the same cluster center.
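A sketch of formula (3) in PyTorch (the tensor layout is an assumption: sub-features and cluster centers arranged per cluster center group):

import torch

def l_dist(sub_features, centers):
    # sub_features: (S, W, d); centers: (W, G, d) — W cluster center groups
    diffs = sub_features.unsqueeze(2) - centers.unsqueeze(0)   # (S, W, G, d)
    dists = diffs.norm(dim=-1)                                 # (S, W, G)
    return dists.min(dim=-1).values.sum()   # sum over r,k of the min over p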
Illustratively, the value of the loss function of the quantization processing model may be determined with reference to the following equation (4):
L = λ_dist · L_dist + λ_diff · L_diff (4)

wherein λ_dist and λ_diff are hyperparameters, which may be set as desired; the present application is not limited thereto.
The quantization processing model is then trained by minimizing L, which constrains the distance between the index of the ith cluster center and the index of the jth cluster center to be proportional to the distance between the ith cluster center and the jth cluster center, and constrains the distance between the pth cluster center and the digital content features (or sub-features) belonging to the pth cluster center.
Furthermore, when the feature transformation includes a homogenization treatment and feature division, orthogonality of the second matrix may also be constrained; wherein the orthogonality of the second matrix may be determined by the following equation (5):
L_reg = ||M_2^T · M_2 − E||_F^2 (5)

wherein M_2 is the second matrix, E is the identity matrix, and ||·||_F is the Frobenius norm.
In this case, the value of the loss function of the quantization processing model can be determined with reference to the following formula (6):

L = L_reg + λ_dist · L_dist + λ_diff · L_diff (6)
The quantization processing model is then trained by minimizing L, which constrains the distance between the index of the ith cluster center and the index of the jth cluster center to be proportional to the distance between the ith cluster center and the jth cluster center, constrains the distance between the pth cluster center and the digital content features (or sub-features) belonging to the pth cluster center, and constrains the orthogonality of the second matrix. In this case, training the quantization processing model may further include: training the second matrix.
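A sketch of formulas (5) and (6) in PyTorch (the λ values are placeholders):

import torch

def l_reg(M2):
    # formula (5): squared Frobenius norm of M2^T·M2 − E
    E = torch.eye(M2.shape[1], dtype=M2.dtype)
    return torch.linalg.norm(M2.T @ M2 - E, ord="fro") ** 2

def total_loss(lreg, ldist, ldiff, lam_dist=1.0, lam_diff=1.0):
    return lreg + lam_dist * ldist + lam_diff * ldiff   # formula (6)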
It should be understood that the S digital content features may be used as one batch to train the quantization processing model, or may be divided into a plurality of batches to train the quantization processing model; the present application is not limited in this regard.
Similarly, a verification set may be generated by selecting S1 (S1 is a positive integer) digital content features, and a test set may be generated by selecting S2 (S2 is a positive integer) digital content features. Then, verifying the quantitative processing model by adopting a verification set, and testing the quantitative processing model by adopting a test set; reference is specifically made to the above description, and no further description is given here.
In this way, when the feature transformation is feature division, training the quantization processing model can obtain a cluster center group, and the cluster center group can comprise M cluster centers, wherein the dimensions of the M cluster centers are the same as those of the digital content feature; and training to obtain a second orthogonal matrix. Wherein M may be set as desired, for example, may be determined according to an index of the cluster center. For example, if the index of the cluster center is represented by 8 bits in binary, then M may be 256.
When the feature transformation comprises homogenization treatment and feature division, training the quantization processing model can obtain W cluster center groups, where each cluster center group can comprise G cluster centers, the dimensions of the G cluster centers being the same as those of the sub-features; and the second orthogonal matrix is obtained by training. W and G can be set as needed; for example, G can be determined according to the index of the cluster center. For example, if the index of the cluster center is represented by 8-bit binary, G may be 256, and M = 256×128; for another example, if the index of the cluster center is represented by 4-bit binary, G may be 32, and M = 32×128; the present application is not limited in this regard.
Furthermore, with the trained quantization processing model, the dimension of the index output by performing quantization processing on the digital content features is smaller than the dimension of the digital content features; that is, the dimension of the digital content features can be reduced by the quantization processing.
Compared with the method for directly clustering S digital content features, the method has the advantages that the S digital content features are clustered after feature transformation, so that on one hand, the clustering complexity can be reduced, the clustering efficiency can be improved, and on the other hand, the memory space occupied by a clustering center can be reduced; in addition, the convergence of the quantization processing model is facilitated, and the convergence rate of the quantization processing model is improved.
It should be noted that, in one possible manner, the quantization processing model may include a feature classification model and an index mapping model. In this case, the S digital content features may be input to the feature classification model, and the S digital content features are clustered to determine M cluster centers; and the index of each of the M cluster centers is determined by the index mapping model. Training of the quantization processing model may include training of the feature classification model (the M cluster centers may be trained) and training of the index mapping model (the indexes of the M cluster centers may be trained). The feature classification model and the index mapping model may be an integral model or independent models, which is not limited in the present application.
In one possible approach, the quantization processing model may include a feature transformation model, a feature classification model, and an index mapping model. Under the condition, S digital content features can be input into a feature transformation model, the feature transformation model performs feature transformation on the S digital content features, S sub-feature groups are output to a feature classification model, and the S sub-feature groups are clustered to determine M clustering centers; and determining, from the index mapping model, an index for each of the M cluster centers. Training of the quantization process model may include training of a feature transformation model (which may be trained to obtain a second matrix), training of a feature classification model, and training of an index mapping model. The feature transformation model, the feature classification model and the index mapping model may be an integral model or independent models, which is not limited in the present application.
In one possible manner, the trained feature extraction model and quantization processing model may be deployed in a server. Thus, when a user needs to generate a digital content identifier for a certain media content, the media content can be uploaded to the server through an identifier generation client (such as an APP, an applet, or a web page); the server executes the identifier generation method of the present application to generate the digital content identifier of the media content and returns the digital content identifier to the identifier generation client, which then displays it.
In a possible manner, the trained feature extraction model and the quantization processing model can be deployed in the terminal device, so that when a user needs to generate a digital content identifier for a certain media content, the media content can be input in an identifier generation client (such as an APP, an applet or a web page), and the identifier generation client executes the identifier generation method of the present application to generate and display the digital content identifier of the media content.
It should be understood that one of the feature extraction model and the quantization processing model may be deployed in a server, and the other in a terminal device, with the server and the terminal device cooperatively generating the digital content identification, which the present application is not limited to.
That is, the present application does not limit whether an electronic device that performs the identification generation method related to the present application is a terminal device or a server.
The following describes identifier generation based on the trained feature extraction model and quantization processing model.
Fig. 5a is a schematic diagram of an exemplary illustrated identity generation process. In the training of the quantization processing model used in the embodiment of fig. 5a, the S digital content features are not feature transformed, but clustered directly.
S501, acquiring media content.
S502, classifying the media content and determining the scene type of the media content.
For example, according to the content type of the media content, the media content may be classified according to a classification mode corresponding to the content type of the media content, so as to determine the scene type to which the media content belongs.
In one possible manner, the media content may be classified by using a method of generating a type tag of the data in the public dataset, so as to determine a scene type to which the media content belongs.
In one possible manner, a classifier for classifying the data in the public dataset may be used to classify the media content and determine the scene type to which the media content belongs. The classifiers corresponding to the data of different content types can be different, the classifier corresponding to the content type of the media content can be selected, the media content is classified, and the scene type of the media content is determined.
S503, inputting the media content into the first feature extraction model, and outputting the digital content features of the media content; the first feature extraction model corresponds to a scene type to which the media content belongs.
Illustratively, in the embodiment of FIG. 3a described above, different feature extraction models are trained for different scene types; further, according to the scene type of the media content, a first feature extraction model corresponding to the scene type of the media content can be selected from the trained feature extraction models; the media content is then input to the first feature extraction model, and digital content features of the media content are output.
For example, when the number of content types of the media content is a plurality of, a plurality of first feature extraction models corresponding to a plurality of scene types to which the media content belongs may be selected from the trained plurality of feature extraction models. When the number of the content types of the media content is one, a first feature extraction model corresponding to a scene type to which the media content belongs can be selected from the trained feature extraction models.
Next, the first feature extraction model performs feature extraction on the media content, and the digital content features of the media content are output. The data type of the digital content features may be floating point; for example, the digital content feature is a 128-dimensional floating point vector.
The digital content features may then be input into a trained quantization model, and the quantization model performs quantization to obtain the first data.
Because the first feature extraction model corresponds to the scene type to which the media content belongs, its feature extraction effect on media content of that scene type is better than that of the feature extraction models corresponding to other scene types; therefore, performing feature extraction with the first feature extraction model corresponding to the scene type to which the media content belongs can improve the accuracy of the digital content features. In this way, the accuracy of the query can be improved.
In a possible manner, when, in the embodiment of fig. 4a described above, the quantization processing model is trained by clustering the S digital content features directly without performing feature transformation on them, the quantization processing may be as follows in S504 to S505:
S504, classifying the digital content features to determine one or more first cluster centers.
For example, one or more first cluster centers may be determined by classifying the digital content features with a trained feature classification model. By way of example, one or more first cluster centers corresponding to the digital content features may be selected from M cluster centers included in one cluster center group obtained by training the feature classification model.
For example, one or more first cluster centers may be determined by classifying the digital content features with a trained quantization processing model. By way of example, one or more first cluster centers corresponding to the digital content feature may be selected from M cluster centers included in one cluster center group obtained by training the quantization processing model.
For example, one or more cluster centers from among M cluster centers included in one cluster center group, which are at a preset distance from the digital content feature, may be selected as one or more first cluster centers.
In a possible manner, one or more cluster centers farthest from the digital content feature may be selected from M cluster centers included in one cluster center group as one or more first cluster centers.
In a possible manner, one or more cluster centers closest to the digital content feature may be selected from M cluster centers included in one cluster center group as one or more first cluster centers.
It should be understood that the preset distance may be set as desired, and the present application is not limited in this regard.
It should be appreciated that different cluster centers may characterize different classifications, that classifying a digital content feature may not output a specific classification of the digital content feature, and that only one or more first cluster centers corresponding to the digital content feature are determined, the first cluster centers corresponding to the categories of the digital content feature.
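A sketch of first-cluster-center selection, assuming the "preset distance" is read as nearest-neighbour:

import numpy as np

def first_cluster_centers(feature, centers, top=1):
    # centers: (M, D) array; returns the index(es) of the nearest center(s)
    distances = np.linalg.norm(centers - feature, axis=1)
    return np.argsort(distances)[:top]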
S505, determining first data according to indexes of one or more first clustering centers.
Illustratively, the index of one or more first cluster centers may be determined by a trained index mapping model; wherein, the indexes of one or more first cluster centers can be determined from the indexes of M cluster centers obtained by training an index mapping model.
Illustratively, the index of one or more first cluster centers may be determined by a trained quantization processing model; wherein, the indexes of one or more first cluster centers can be determined from the indexes of M cluster centers obtained by training the quantization processing model.
For example, when the first cluster center is one, the index of the first cluster center may be used as the first data. When the number of the first cluster centers is plural, the indexes of the plural first cluster centers may be processed (e.g., spliced) to obtain the first data.
It should be noted that S505 may be executed by the quantization processing model, or may be executed by another module, which is not limited by the present application.
Fig. 5b is a schematic diagram of an exemplary quantization process.
In fig. 5b (1), the dimension of the digital content feature is 128; when there is one first cluster center and the index of the first cluster center is "0000000011011100…", the first data is "0000000011011100…".
In fig. 5b (2), the dimension of the digital content feature is 128; when there are two first cluster centers, where the index of one first cluster center is "0000000011011100…" and the index of the other first cluster center is "0000000011011111…", the first data obtained from the two indexes is "0000000011011100…0000000011011111…".
It can be seen that the data amount of the first data is much smaller than the data amount of the digital content feature.
S507, the first data is encoded to obtain the digital content identification of the media content.
Illustratively, the first data may be encoded using a Base64, base58, UTF-8, or other encoding algorithm to obtain the digital content identification of the media content.
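A sketch of this encoding step, assuming Base64 over the binary first data (Base58 or another algorithm is equally possible):

import base64

def encode_identifier(first_data):
    # pad the bit string to whole bytes, pack, then Base64-encode
    padded = first_data.ljust(-(-len(first_data) // 8) * 8, "0")
    raw = int(padded, 2).to_bytes(len(padded) // 8, "big")
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")

print(encode_identifier("0000000011011100"))   # 'ANw'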
Compared with the digital content characteristics, the index dimension and the data volume of the first clustering center are smaller, so that the calculation cost and the time effect in the query process can be reduced to a great extent, and the query efficiency is improved.
Fig. 6a is a schematic diagram of an exemplary illustrated identity generation process. The quantization process model used in the embodiment of fig. 6a clusters S digital content features after feature transformation during training.
S601, acquiring media content.
S602, classifying the media content and determining the scene type of the media content.
S603, inputting the media content into the first feature extraction model, and outputting the digital content features of the media content; the first feature extraction model corresponds to a scene type to which the media content belongs.
For example, S601 to S603 may refer to the descriptions of S501 to S503, and are not described herein.
In a possible manner, when, in the embodiment of fig. 4a described above, the quantization processing model is trained by performing feature transformation on the S digital content features before clustering, the quantization processing may be as follows in S604 to S606:
S604, performing feature transformation on the digital content features to obtain a plurality of second data.
Wherein the number of second data may be W.
For example, if in the embodiment of fig. 4a, the feature transformation in the quantization processing model training process is feature division, in the embodiment of fig. 6a, the quantization processing model may perform feature division on the digital content feature, and divide the digital content feature into W pieces of second data.
For example, if in the embodiment of fig. 4a the feature transformation during the training of the quantization processing model includes homogenization treatment and feature division, then in the embodiment of fig. 6a the quantization processing model may perform homogenization treatment on the digital content features to obtain a third intermediate feature. In this way, the concentration of the digital content features can be diffused such that the energy distribution of the digital content features is uniform. Then, feature division is performed on the third intermediate feature to obtain the W pieces of second data. Specifically, the digital content feature may be multiplied by the first matrix to obtain a fourth intermediate feature; the fourth intermediate feature is then multiplied by the second matrix to obtain the third intermediate feature. The first matrix is a random matrix, and the second matrix is a matrix trained in the process of training the quantization processing model.
Optionally, the first matrix and the second matrix are orthogonal matrices, that is, the first matrix is a random orthogonal matrix, and the second matrix is an orthogonal matrix obtained by training in the process of training the quantization processing model.
S605, classifying the plurality of second data to determine a plurality of first cluster centers, where a first cluster center is a cluster center, among the plurality of cluster centers, at a preset distance from the second data.
Illustratively, the plurality of second data may be classified by the trained feature classification model to determine a plurality of first cluster centers. For example, a plurality of first cluster centers corresponding to a plurality of second data may be selected from W cluster center groups (each cluster center group may include G cluster centers) obtained by training a feature classification model.
Illustratively, the plurality of second data may be classified by the trained quantization processing model to determine a plurality of first cluster centers. For example, from the W cluster center groups obtained by training the quantization processing model, a plurality of first cluster centers corresponding to a plurality of second data may be selected.
Specifically, for the kth second data among the W pieces of second data, the distances between the kth second data and the G cluster centers included in the kth cluster center group may be calculated; then, a cluster center at a preset distance from the kth second data may be selected from the G cluster centers included in the kth cluster center group as a first cluster center. Thus, for the W pieces of second data, W first cluster centers can be determined.
It should be understood that, from the G cluster centers included in the kth cluster center group, the cluster center closest to the kth second data may be selected as the first cluster center; alternatively, the cluster center farthest from the kth second data may be selected from the G cluster centers included in the kth cluster center group as the first cluster center.
It should be appreciated that different cluster centers may characterize different classifications; classifying the second data may not output a classification of the second data, but only determine one or more first cluster centers corresponding to the second data, the first cluster centers corresponding to the categories of the second data.
S606, determining first data according to indexes of the plurality of first clustering centers.
For example, the index of the W first cluster centers may be used to determine the first data.
Illustratively, the indexes of the plurality of first cluster centers may be determined by a trained index mapping model; the indexes of the plurality of first cluster centers can be determined from indexes of W×G cluster centers obtained by training an index mapping model.
Illustratively, the index of one or more first cluster centers may be determined by a trained quantization processing model; the indexes of the plurality of first cluster centers can be determined from indexes of w×g cluster centers obtained by training the quantization processing model.
In a possible manner, the indexes of the W first cluster centers may be spliced according to the order of feature division, to obtain the first data.
In a possible manner, the indexes of the W first cluster centers may be spliced according to a preset sequence to obtain the first data.
That is, the present application does not limit the manner in which the first data is determined from the index of the first cluster center.
It should be noted that S606 may be executed by the quantization processing model, or may be executed by another module, which is not limited by the present application.
Fig. 6b is a schematic diagram of an exemplary quantization process.
Referring to fig. 6b, the digital content feature is a 128-dimensional vector, the first matrix is a 1024-dimensional random orthogonal matrix, and the second matrix is a 1024-dimensional orthogonal matrix obtained by training. After the digital content feature is input to the quantization processing model, the quantization processing model may multiply the digital content feature by the 1024-dimensional first matrix to obtain a 1024-dimensional fourth intermediate feature. Next, the fourth intermediate feature is multiplied by the 1024-dimensional second matrix, and a 1024-dimensional third intermediate feature can be obtained. Thereafter, the third intermediate feature may be partitioned into 128 groups of 8-dimensional second data. Then, for the kth group of second data, the cluster center closest to the kth group of second data may be selected as a first cluster center from the 256 cluster centers included in the kth cluster center group; in this way, 128 first cluster centers can be obtained. Then, the indexes of the 128 first cluster centers are spliced in sequence. For example, if the index of the 1st first cluster center is 00000000, the index of the 2nd first cluster center is 00010000, ..., and the index of the 128th first cluster center is 01100011, the first data may be "0000000000010000…01100011".
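A sketch of this splicing, using the example index values above (0 → 00000000, 16 → 00010000, 99 → 01100011):

def splice_indexes(indexes, bits=8):
    # concatenate the binary-string indexes in feature-division order
    return "".join(format(i, "0{}b".format(bits)) for i in indexes)

print(splice_indexes([0, 16, 99]))   # '000000000001000001100011'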
S607, the first data is encoded to obtain the digital content identification of the media content.
Compared with the embodiment of fig. 5a, the dimension of the clustering center of the embodiment of fig. 6a is lower, so that the classification calculation amount is small, and the classification efficiency is high; further, quantization processing efficiency can be improved, and therefore efficiency of generating digital content identifiers is improved.
In addition, the first clustering center is selected with smaller granularity, the selected first clustering center is more accurate, and further the obtained first data is more accurate, so that the accuracy of query can be improved.
Illustratively, to enable verification of the models of the present application for generating digital content identifiers (e.g., the feature extraction model and the quantization processing model), all input data may be recorded while training the models for generating digital content identifiers, such as:
1. Model type. For example, the model type of the feature extraction model, such as ResNet50, VGG16, and the like.
2. The data set, and the method for generating the second training data in the training set. For example, the data set may be a public data set; e.g., a data set of image media may be ImageNet, Cifar100, or the like. The second training data generation method may be, for example, using Stirmark Benchmark to generate the second training data for images.
3. Random parameters. The random parameters include the initial weights of the model, the selection order of the batches, and other parameters produced by random generation. To ensure that all users can conveniently verify the model, a random seed is made known before training the model. A pseudorandom number generator (e.g., Mersenne Twister) is then used to generate a random number sequence from the random seed, and the sequence is used to assign values to the random parameters. Once the model and its parameters are determined, the number and specification of the required random parameters are also determined, so random number sequences generated from the same random seed ensure that every random parameter is the same (a sketch follows this list).
4. Empirical parameters. Some empirical parameters may be required in training to ensure that the model can be effective, converge, etc., such as the learning rate, the interval parameter (margin), and the epoch (iteration period).
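As an illustration of item 3, a sketch of deriving all random parameters from a single published seed (the seed value and parameter shapes are hypothetical):

import random

seed = 42                    # random seed made known before training
rng = random.Random(seed)    # Python's built-in Mersenne Twister
initial_weights = [rng.uniform(-0.1, 0.1) for _ in range(8)]
batch_order = rng.sample(range(200), k=200)   # selection order of 200 batches
# re-running with the same seed reproduces identical random parameters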
Based on this input data, a verifying user can retrain the publicly released feature extraction model and quantization processing model, thereby making the models verifiable.
The application shown in fig. 1a will be specifically described based on the identification generation method in the above embodiment.
Fig. 7 is a schematic diagram of an exemplary query process.
S701, query data is received, the query data comprising a first media content and/or a first digital content identification of the first media content.
For example, referring to the description of FIG. 1a, the query data entered by the user may be the first media content and/or the first digital content identification of the first media content.
S702, first data of a first media content and first data of a plurality of second media content are determined.
For example, when the query data input by the user is the first digital content identifier of the first media content, the first digital content identifier may be decoded to obtain the first data of the first media content. In addition, the pre-stored second digital content identifiers of the second media contents may be obtained and decoded to obtain the first data of the second media contents.
For example, when the query data input by the user is the first media content, part of the steps of the identifier generation method in the above embodiments may be performed to generate the first data of the first media content and the first data of the plurality of second media contents.
S703, determining the matching degree between the plurality of second media contents and the first media content according to the first data of the first media content and the first data of the plurality of second media contents.
For example, the Hamming distance between the first data corresponding to the first media content and the first data corresponding to each of the plurality of second media contents may be used as the degree of matching between the first media content and that second media content.
S704, outputting the top N pieces of second media content with the highest matching degree, where N is a positive integer.
Then, the top N pieces of second media content with the highest matching degree can be output; N may be set as needed, which is not limited by the present application.
Note that when S702 to S703 are executed by the terminal device, the output in S704 may refer to displaying. When S702 to S703 are executed by the server, the output in S704 may refer to sending the top N pieces of second media content with the highest matching degree to the terminal device, which then displays them.
Compared with the prior art, which performs matching by calculating the Euclidean distance between floating-point numbers, the present application calculates the Hamming distance between binary strings, which has low computational complexity and can improve query efficiency. Moreover, because the first data is correlated with the media content, the second media content matching the first media content queried by the user can be found accurately, which can improve the user experience.
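A minimal sketch of the S702 to S704 matching logic follows, assuming the first data are equal-length binary strings; the candidate store and all names are hypothetical.

```python
# Rank pre-stored second media contents by Hamming distance to the query's
# first data and return the top N matches (smaller distance = better match).
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length binary strings."""
    return bin(int(a, 2) ^ int(b, 2)).count("1")

def top_n_matches(query_first_data, candidates, n):
    """candidates: mapping from a second-media-content id to its first data."""
    ranked = sorted(candidates, key=lambda cid: hamming(query_first_data, candidates[cid]))
    return ranked[:n]

candidates = {"img_a": "0110", "img_b": "0111", "img_c": "1001"}
print(top_n_matches("0110", candidates, n=2))  # ['img_a', 'img_b']
```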
The application shown in fig. 1b will be specifically described based on the identifier generation method in the above embodiment.
Fig. 8 is a schematic diagram of an exemplary query process.
S801, query data is received, the query data including first media content and/or a first digital content identification of the first media content.
For example, referring to the description of FIG. 1b, the query data entered by the user may be the first media content and/or the first digital content identification of the first media content.
S802, first data of first media content and first data of a plurality of pieces of second media content are determined.
For example, when the query data input by the user is the first digital content identifier of the first media content, the first digital content identifier may be decoded to obtain the first data of the first media content. In addition, the pre-stored second digital content identifiers of the second media contents may be obtained and decoded to obtain the first data of the second media contents.
For example, when the query data input by the user is the first media content, part of the steps of the identifier generation method in the above embodiments may be performed to generate the first data of the first media content and the first data of the plurality of second media contents.
S803, determining a second media content matching the first media content from the plurality of second media contents based on the first data of the first media content and the first data of the plurality of second media contents.
For example, the Hamming distance between the first data corresponding to the first media content and the first data corresponding to each of the plurality of second media contents may be calculated, and the second media content with the smallest Hamming distance may be determined as the second media content matching the first media content.
S804, outputting metadata of the second media content matched with the first media content.
Thereafter, the metadata of the second media content matching the first media content may be output.
Note that when S802 to S803 are executed by the terminal device, the output in S804 may refer to displaying. When S802 to S803 are executed by the server, the output in S804 may refer to sending the metadata of the matching second media content to the terminal device, which then displays it.
Compared with the prior art, which performs matching by calculating the Euclidean distance between floating-point numbers, the present application calculates the Hamming distance between binary strings, which has low computational complexity and can improve query efficiency. Moreover, because the first data is correlated with the media content, the metadata of the first media content can be found accurately, which can improve the user experience.
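Correspondingly, a minimal sketch of the fig. 8 flow: keep only the second media content with the smallest Hamming distance and output its metadata. The metadata records and all names are hypothetical.

```python
# Select the single best-matching second media content and return its metadata.
def hamming(a: str, b: str) -> int:
    return bin(int(a, 2) ^ int(b, 2)).count("1")

def best_match_metadata(query_first_data, candidates, metadata_store):
    best = min(candidates, key=lambda cid: hamming(query_first_data, candidates[cid]))
    return metadata_store[best]

candidates = {"img_a": "0110", "img_b": "1001"}
metadata_store = {"img_a": {"title": "sunset"}, "img_b": {"title": "harbor"}}
print(best_match_metadata("0111", candidates, metadata_store))  # {'title': 'sunset'}
```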
Fig. 9 is a schematic diagram of an exemplary identification generating apparatus. The identification generating apparatus may be used to perform the methods of the foregoing embodiments; for the advantages it achieves, reference may be made to those of the corresponding methods provided above, which are not repeated here.
Referring to fig. 9, the identification generating apparatus may exemplarily include:
an acquisition module 901, configured to acquire media content;
a feature extraction module 902, configured to perform feature extraction on the media content to obtain digital content features of the media content;
a quantization processing module 903, configured to perform quantization processing on the digital content features to obtain first data;
an identifier generating module 904, configured to determine the digital content identification of the media content according to the first data.
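As a rough sketch of how these four modules might be composed into a pipeline (the callables and names are hypothetical stand-ins, e.g. a quantizer like the one sketched earlier, not the patent's implementation):

```python
# Wire the acquisition, feature extraction, quantization, and identifier
# generation modules together; each argument is any callable with the
# corresponding role.
class IdentificationGenerator:
    def __init__(self, acquire, extract_features, quantize, encode):
        self.acquire = acquire                    # acquisition module 901
        self.extract_features = extract_features  # feature extraction module 902
        self.quantize = quantize                  # quantization processing module 903
        self.encode = encode                      # identifier generating module 904

    def generate(self, source):
        media_content = self.acquire(source)
        features = self.extract_features(media_content)  # digital content features
        first_data = self.quantize(features)             # binary string (first data)
        return self.encode(first_data)                   # digital content identification
```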
The first data indicates indexes of one or more first cluster centers, the one or more first cluster centers being one or more of a plurality of cluster centers, and the plurality of cluster centers being used for classification of the digital content features.
Illustratively, the one or more first cluster centers are one or more cluster centers, among the plurality of cluster centers, whose distance from the digital content feature satisfies a preset distance.
Illustratively, the one or more first cluster centers are the one or more cluster centers of the plurality of cluster centers that are closest to or farthest from the digital content feature.
Illustratively, the quantization processing module 903 is specifically configured to classify the digital content features, determine the one or more first cluster centers, and determine the first data according to the indexes of the one or more first cluster centers.
The first cluster centers are cluster centers, among the plurality of cluster centers, whose distance from the second data satisfies a preset distance, and the second data is data obtained by performing feature transformation on the digital content features.
Illustratively, the quantization processing module 903 is specifically configured to perform feature transformation on the digital content features to obtain a plurality of pieces of second data; classify the second data to determine a plurality of first cluster centers, the first cluster centers being cluster centers whose distance from the second data satisfies a preset distance; and determine the first data according to the indexes of the plurality of first cluster centers.
Illustratively, the digital content features are multi-dimensional data, and the second data has a dimension that is less than the dimension of the digital content features.
Illustratively, the indexes of different first cluster centers are used to measure the similarity between different media contents.
The feature extraction module 902 is specifically configured to input the media content into the first feature extraction model and output digital content features of the media content.
Illustratively, the scene type to which the media content belongs corresponds to a first feature extraction model, the first feature extraction model is one or more of a plurality of feature extraction models, and the plurality of feature extraction models correspond to the plurality of scene types one to one.
Illustratively, the identification generating apparatus further includes: and the scene classification module is used for classifying the media content and determining the scene type of the media content.
The identifier generating module 904 is specifically configured to encode the first data to obtain a digital content identifier of the media content.
Illustratively, the media content includes at least one of: text, video, graphics, audio, images, or 3D models.
Illustratively, the digital content features include digital fingerprints.
Illustratively, the first feature extraction model is a trained neural network model, and the training set of the first feature extraction model includes: first training data, second training data, and third training data; the scene type of the first training data and the scene type of the third training data are the same as the scene type of the media content; the third training data is different from the first training data; the second training data is countermeasure data of the first training data.
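For illustration only, a hedged sketch of assembling such a training set, where attack() is a hypothetical stand-in for a countermeasure transform (e.g., a Stirmark-style perturbation for images) and samples are first training data of one scene type:

```python
# Build (first, second, third) training triples: the second sample is
# countermeasure data derived from the first, and the third is different
# content of the same scene type.
def build_training_triples(samples, attack):
    triples = []
    for i, first in enumerate(samples):
        second = attack(first)                   # countermeasure data of `first`
        third = samples[(i + 1) % len(samples)]  # different content, same scene type
        triples.append((first, second, third))
    return triples
```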
Illustratively, the data type of the digital content feature is floating point, and the first data is a binary string.
Illustratively, the first data has a data amount that is less than a data amount of the digital content feature.
In one example, fig. 10 shows a schematic block diagram of an apparatus 1000 according to an embodiment of the application, which may include: a processor 1001 and a transceiver/transceiving pin 1002, and optionally, a memory 1003.
The various components of the apparatus 1000 are coupled together by a bus 1004, where the bus 1004 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are all referred to as the bus 1004 in the figure.
Optionally, the memory 1003 may be used to store instructions in the foregoing method embodiments. The processor 1001 is operable to execute instructions in the memory 1003 and to control the receive pin to receive signals and the transmit pin to transmit signals.
The apparatus 1000 may be an electronic device or a chip of an electronic device in the above-described method embodiments.
For all relevant content of each step in the above method embodiments, reference may be made to the functional description of the corresponding functional module, which is not repeated here.
An embodiment of the application also provides a chip, which includes one or more interface circuits and one or more processors. The one or more processors receive or transmit data via the one or more interface circuits; when executed by the one or more processors, the data cause the electronic device to perform the above-described related method steps to implement the identifier generation method in the above embodiments. For example, the interface circuit may be the transceiver/transceiving pin 1002.
The present embodiment also provides a computer-readable storage medium having stored therein computer instructions which, when run on an electronic device, cause the electronic device to perform the above-described related method steps to implement the identifier generation method in the above embodiments.
The present embodiment also provides a computer program product containing computer instructions which, when executed by a computer or processor, cause the computer to perform the above-described related steps to implement the identifier generation method in the above embodiments.
In addition, embodiments of the present application also provide an apparatus, which may be embodied as a chip, component or module, which may include a processor and a memory coupled to each other; the memory is configured to store computer-executable instructions, and when the device is running, the processor may execute the computer-executable instructions stored in the memory, so that the chip executes the identifier generating method in the above method embodiments.
The electronic device, the computer readable storage medium, the computer program product or the chip provided in this embodiment are used to execute the corresponding method provided above, so that the beneficial effects thereof can be referred to the beneficial effects in the corresponding method provided above, and will not be described herein.
It will be appreciated by those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts shown as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Any of the various embodiments of the application, as well as any features within the same embodiment, may be freely combined. Any such combination is within the scope of the application.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such an understanding, the technical solution of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The steps of a method or algorithm described in connection with the present disclosure may be embodied in hardware, or in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer-readable storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

Claims (28)

1. An identification generation method, the method comprising:
acquiring media content;
extracting the characteristics of the media content to obtain the digital content characteristics of the media content;
carrying out quantization processing on the digital content characteristics to obtain first data;
and determining the digital content identification of the media content according to the first data.
2. The method of claim 1, wherein the first data indicates an index of one or more first cluster centers, the one or more first cluster centers being one or more of a plurality of cluster centers for classification of the digital content features.
3. The method of claim 2, wherein the one or more first cluster centers are one or more cluster centers, among the plurality of cluster centers, whose distance from the digital content feature satisfies a preset distance.
4. The method of claim 3, wherein the one or more first cluster centers are the one or more cluster centers of the plurality of cluster centers that are closest to or farthest from the digital content feature.
5. The method according to claim 3 or 4, wherein said quantizing the digital content features to obtain first data comprises:
Classifying the digital content features, determining the one or more first cluster centers;
and determining the first data according to the indexes of the one or more first clustering centers.
6. The method of claim 2, wherein
the first cluster centers are cluster centers, among the plurality of cluster centers, whose distance from the second data satisfies a preset distance, and the second data is data obtained by performing feature transformation on the digital content features.
7. The method of claim 6, wherein said quantizing said digital content features to obtain first data comprises:
performing feature transformation on the digital content features to obtain a plurality of second data;
classifying the plurality of second data to determine the plurality of first cluster centers, wherein the first cluster centers are cluster centers whose distance from the second data satisfies a preset distance;
and determining the first data according to indexes of the plurality of first clustering centers.
8. The method of claim 6 or 7, wherein the digital content features are multi-dimensional data and the second data has a dimension that is smaller than the dimension of the digital content features.
9. The method according to any of claims 2 to 8, wherein the index of different first cluster centers is used to measure the similarity between different media content.
10. The method according to any one of claims 1 to 9, wherein the feature extraction of the media content to obtain digital content features of the media content comprises:
and inputting the media content into a first feature extraction model, and outputting the digital content features of the media content.
11. The method of claim 10, wherein the scene type to which the media content belongs corresponds to the first feature extraction model, the first feature extraction model being one or more of a plurality of feature extraction models, the plurality of feature extraction models corresponding one-to-one to a plurality of scene types.
12. The method according to any one of claims 1 to 11, further comprising:
and classifying the media content and determining the scene type of the media content.
13. The method according to any one of claims 1 to 12, wherein said determining a digital content identification of said media content from said first data comprises:
And encoding the first data to obtain the digital content identification of the media content.
14. The method of any one of claims 1 to 13, wherein the media content comprises at least one of: text, video, graphics, audio, images, or 3D models.
15. The method of any one of claims 1 to 14, wherein the digital content features comprise digital fingerprints.
16. The method according to claim 10 or 11, wherein the first feature extraction model is a trained neural network model, and the training set of first feature extraction models comprises: first training data, second training data, and third training data;
the scene type of the first training data and the scene type of the third training data are the same as the scene type of the media content; the third training data is different from the first training data; the second training data is countermeasure data of the first training data.
17. The method according to any one of claims 1 to 16, wherein,
the data type of the digital content features is a floating-point type, and the first data is a binary character string.
18. The method according to any one of claims 1 to 17, wherein,
the first data has a data amount smaller than a data amount of the digital content feature.
19. A method of querying, the method comprising:
receiving query data, wherein the query data comprises first media content and/or digital content identification of the first media content, and metadata of the first media content is lost;
determining second media content matched with the first media content from the plurality of second media content according to the first data of the first media content and the first data of the plurality of second media content; the first data of the first media content is obtained by carrying out quantization processing on the digital content characteristics of the first media content or decoding the first digital content identifier, and the first data of the second media content is obtained by carrying out quantization processing on the digital content characteristics of the second media content or decoding the digital content identifier of the second media content;
metadata of second media content matching the first media content is output.
20. The method of claim 19, wherein the determining, from the first data of the first media content and the first data of the plurality of second media content, the second media content that matches the first media content from the plurality of second media content comprises:
Determining a hamming distance between the first data of the first media content and the first data of the plurality of second media content;
and determining the second media content with the minimum corresponding Hamming distance as the second media content matched with the first media content.
21. A method of querying, the method comprising:
receiving query data, wherein the query data comprises first media content and/or digital content identification of the first media content;
determining the matching degree between the plurality of second media contents and the first media content according to the first data of the first media content and the first data of the plurality of second media contents; the first data of the first media content is obtained by performing quantization processing on the digital content features of the first media content or decoding the digital content identification of the first media content, and the first data of the second media content is obtained by performing quantization processing on the digital content features of the second media content or decoding the digital content identification of the second media content;
and outputting the first N pieces of second media content with the highest matching degree, wherein N is a positive integer.
22. The method of claim 21, wherein the determining the matching degree between the plurality of second media contents and the first media content according to the first data of the first media content and the first data of the plurality of second media contents comprises:
and taking the Hamming distance between the first data of the plurality of pieces of second media content and the first data of the first media content as the matching degree between the plurality of pieces of second media content and the first media content.
23. An identification generation device, configured to perform the identification generation method of any one of claims 1 to 18.
24. A query device for performing the query method of any of claims 19 to 22.
25. An electronic device, comprising:
a memory and a processor, the memory coupled with the processor;
the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the method of any one of claims 1 to 18.
26. A chip comprising one or more interface circuits and one or more processors; the one or more processors receive or transmit data via the one or more interface circuits; when the data is executed by the one or more processors, the electronic device is caused to perform the method of any one of claims 1 to 18.
27. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which when run on a computer or a processor causes the computer or the processor to perform the method of any one of claims 1 to 18.
28. A computer program product comprising computer instructions which, when executed by a computer or processor, cause the steps of the method of any one of claims 1 to 18 to be performed.
CN202310450602.6A 2023-04-20 2023-04-20 Identification generation method and electronic equipment Pending CN116578867A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310450602.6A CN116578867A (en) 2023-04-20 2023-04-20 Identification generation method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310450602.6A CN116578867A (en) 2023-04-20 2023-04-20 Identification generation method and electronic equipment

Publications (1)

Publication Number Publication Date
CN116578867A true CN116578867A (en) 2023-08-11

Family

ID=87540484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310450602.6A Pending CN116578867A (en) 2023-04-20 2023-04-20 Identification generation method and electronic equipment

Country Status (1)

Country Link
CN (1) CN116578867A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210174401A1 (en) * 2014-03-10 2021-06-10 A9.Com, Inc. Media processing techniques for enhancing content
CN105468781A (en) * 2015-12-21 2016-04-06 小米科技有限责任公司 Video query method and device
CN110795584A (en) * 2019-09-19 2020-02-14 深圳云天励飞技术有限公司 User identifier generation method and device and terminal equipment
CN113495965A (en) * 2020-04-08 2021-10-12 百度在线网络技术(北京)有限公司 Multimedia content retrieval method, device, equipment and storage medium
CN111368133A (en) * 2020-04-16 2020-07-03 腾讯科技(深圳)有限公司 Method and device for establishing index table of video library, server and storage medium
CN114297415A (en) * 2021-12-28 2022-04-08 人民网股份有限公司 Multi-source heterogeneous data storage method and retrieval method for full media data space

Similar Documents

Publication Publication Date Title
Yu et al. Hierarchical deep click feature prediction for fine-grained image recognition
CN105354307B (en) Image content identification method and device
CN110059465B (en) Identity verification method, device and equipment
Wicker et al. Multi-label classification using boolean matrix decomposition
Ahmad et al. Medical image retrieval with compact binary codes generated in frequency domain using highly reactive convolutional features
US20130301935A1 (en) Method and Apparatus of Identifying Similar Images
Liu et al. Uniting keypoints: Local visual information fusion for large-scale image search
CN110647904A (en) Cross-modal retrieval method and system based on unmarked data migration
CN112819023A (en) Sample set acquisition method and device, computer equipment and storage medium
US20200175259A1 (en) Face recognition method and apparatus capable of face search using vector
CN113656547B (en) Text matching method, device, equipment and storage medium
CN113298197B (en) Data clustering method, device, equipment and readable storage medium
CN116049412B (en) Text classification method, model training method, device and electronic equipment
CN114117213A (en) Recommendation model training and recommendation method, device, medium and equipment
Wang et al. Beauty product image retrieval based on multi-feature fusion and feature aggregation
WO2023024408A1 (en) Method for determining feature vector of user, and related device and medium
CN115222443A (en) Client group division method, device, equipment and storage medium
CN115392357A (en) Classification model training and labeled data sample spot inspection method, medium and electronic equipment
CN113569933A (en) Trademark pattern matching method and corresponding device, equipment and medium
CN113254687A (en) Image retrieval and image quantification model training method, device and storage medium
CN113435499A (en) Label classification method and device, electronic equipment and storage medium
CN112364198A (en) Cross-modal Hash retrieval method, terminal device and storage medium
Farhangi et al. Informative visual words construction to improve bag of words image representation
CN104573696B (en) Method and apparatus for handling face characteristic data
CN113723111B (en) Small sample intention recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination