CN111639198A

CN111639198A - Media file identification method and device, readable medium and electronic equipment

Info

Publication number: CN111639198A
Application number: CN202010495559.1A
Authority: CN
Inventors: 黄鑫; 白刚; 董琦; 宋旸; 肖洋
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2020-06-03
Filing date: 2020-06-03
Publication date: 2020-09-08

Abstract

The disclosure relates to a media file identification method, a media file identification device, a readable medium and electronic equipment. The method comprises the following steps: acquiring a feature vector of a media file to be identified as a first feature vector; determining a character string corresponding to the first feature vector as a first fingerprint identifier of the media file to be identified; and determining whether a target media file matched with the media file to be identified exists in the media files stored in the database according to the first fingerprint identification so as to determine whether the media file to be identified belongs to the database. Therefore, whether any media file belongs to the database or not can be identified without manual intervention, the identification efficiency is improved, and the identification accuracy can be ensured due to the fact that identification is carried out based on the characteristics of the media files. In addition, the recognition result can be used for scenes such as copyright recognition, and the original content can be better protected from being abused.

Description

Media file identification method and device, readable medium and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a media file identification method, an apparatus, a readable medium, and an electronic device.

Background

With the development of computer technology, data sharing has become more convenient. For example, content that others have shared online can be copied and pasted for one's own use. Generally, online multimedia content (e.g., images, videos) is authored and shared by users (or platforms), and as the volume of online multimedia content grows, users (or platforms) increasingly need to protect their original content. How to protect the original content of a user (or platform) from being illegally used by others is therefore an important problem to be solved. In the related art, whether a given piece of multimedia content comes from the same source as the original content of a user (or platform) is generally determined by manual review. This process requires a large amount of manpower, is inefficient, and is prone to misjudgment, so its accuracy is low.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In a first aspect, the present disclosure provides a media file identification method, including:

acquiring a feature vector of a media file to be identified as a first feature vector;

determining a character string corresponding to the first feature vector as a first fingerprint identifier of the media file to be identified;

and determining whether a target media file matched with the media file to be identified exists in the media files stored in the database according to the first fingerprint identification so as to determine whether the media file to be identified belongs to the database.

In a second aspect, the present disclosure provides a media file identification apparatus, the apparatus comprising:

the acquisition module is used for acquiring a feature vector of a media file to be identified as a first feature vector;

a first determining module, configured to determine a character string corresponding to the first feature vector as a first fingerprint identifier of the media file to be identified;

and the second determining module is used for determining whether a target media file matched with the media file to be identified exists in the media files stored in the database according to the first fingerprint identification so as to determine whether the media file to be identified belongs to the database.

In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect of the present disclosure.

In a fourth aspect, the present disclosure provides an electronic device comprising:

a storage device having a computer program stored thereon;

processing means for executing the computer program in the storage means to implement the steps of the method of the first aspect of the present disclosure.

According to the technical scheme, for the media file to be recognized, firstly, the feature vector of the media file to be recognized is obtained as the first feature vector, the character string corresponding to the first feature vector is determined to be used as the first fingerprint identification of the media file to be recognized, then, whether the target media file matched with the media file to be recognized exists in the media files stored in the database or not is determined according to the first fingerprint identification, and further, whether the media file to be recognized belongs to the database or not is determined. Therefore, the first fingerprint identification capable of uniquely representing the media file to be identified is obtained by extracting the characteristics of the media file to be identified, and whether the media file to be identified belongs to the database is determined according to the first fingerprint identification. Therefore, whether any media file belongs to the database or not can be identified without manual intervention, the identification efficiency is improved, and the identification accuracy can be ensured due to the fact that identification is carried out based on the characteristics of the media files. In addition, the recognition result can be used for scenes such as copyright recognition, and the original content can be better protected from being abused.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.

In the drawings:

FIG. 1 is a flow diagram of a media file identification method provided in accordance with one embodiment of the present disclosure;

FIG. 2 is an exemplary flowchart of the step of determining a string corresponding to a first feature vector in a media file identification method provided in accordance with the present disclosure;

FIG. 3 is an exemplary flowchart of the step of determining whether a target media file exists according to a first fingerprint according to the media file identification method provided by the present disclosure;

FIG. 4 is a block diagram of a media file identification apparatus provided in accordance with one embodiment of the present disclosure;

FIG. 5 illustrates a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

Fig. 1 is a flowchart of a media file identification method provided according to an embodiment of the present disclosure. As shown in fig. 1, the method may include the following steps.

In step 11, a feature vector of the media file to be identified is obtained as a first feature vector.

The media file to be identified may be an image or a video. It should be noted that, without additional description, all media files mentioned in the present solution are images or videos.

In general, the original content of a user (or platform) can be maintained in its own database, in which a variety of media files are stored. The present disclosure is directed to identifying whether an external media file belongs to a database, that is, identifying whether the media file is identical to a media file stored in the database, and determining whether the media file is from the database.

For example, the media file to be identified may be acquired proactively, so as to proactively identify whether it originates from the database and avoid risk. For instance, the method may interface with the API of a platform and, using an API token and account information, retrieve all media files published under the corresponding account; each of these media files may serve as a media file to be identified in the present disclosure.

For the media file to be identified, its feature vector is acquired as the first feature vector. Since the media file to be identified is an image or a video, its feature vector is a visual feature vector. That is, the unstructured media file to be identified (image or video) is converted into a vector in a specific vector space, which serves as the feature representation of the media file. The dimension of that vector space can be set freely, for example according to empirical values. Illustratively, the feature vector of the media file to be identified may be extracted through a neural network as the first feature vector.

This conversion may be viewed as mapping data from one space to another. Feature vectors obtained through the same conversion lie in the same feature space, and data with similar features have similar embedded vectors in that space. Taking an image as an example, an image is generally represented by two-dimensional arrays of pixel values for the three RGB color channels; for example, a 1024x1024 pixel image can be represented by a 3x1024x1024 high-dimensional vector.
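The 3 x H x W representation mentioned above can be made concrete with a minimal sketch. The function name `image_to_vector` and the nested-list image layout are assumptions made for illustration; they do not come from the disclosure.

```python
from typing import List

Pixel = List[int]  # one pixel as [R, G, B]


def image_to_vector(image: List[List[Pixel]]) -> List[int]:
    """Flatten an H x W RGB image into a channel-major vector of length 3*H*W."""
    height = len(image)
    width = len(image[0]) if height else 0
    vector: List[int] = []
    for channel in range(3):          # R plane, then G plane, then B plane
        for row in range(height):
            for col in range(width):
                vector.append(image[row][col][channel])
    return vector


# A hypothetical 2 x 2 image; a 1024 x 1024 image would give 3 * 1024 * 1024 values.
tiny = [[[255, 0, 0], [0, 255, 0]],
        [[0, 0, 255], [10, 20, 30]]]
vec = image_to_vector(tiny)
print(len(vec))
```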

In step 12, a character string corresponding to the first feature vector is determined as a first fingerprint identifier of the media file to be identified.

After the first feature vector is obtained, it may be further converted into a corresponding character string, which serves as the first fingerprint identifier of the media file to be identified and uniquely represents it. Because the first feature vector represents the visual features of the media file to be identified, the resulting first fingerprint identifier also represents those visual features. Here, converting the first feature vector into the character string may be regarded as mapping data from the vector space of the first feature vector into a space of character strings. The length of the character string (its number of characters) can be set freely, for example according to empirical values.

In step 13, it is determined whether a target media file matching the media file to be recognized exists in the media files stored in the database according to the first fingerprint identifier, so as to determine whether the media file to be recognized belongs to the database.

As described above, the first fingerprint identifier characterizes the visual features of the media file to be recognized, and the visual features of the media files stored in the database can likewise be readily obtained. Based on the first fingerprint identifier, it can therefore be determined whether the media files stored in the database include a target media file matching the media file to be recognized, and thus whether the media file to be recognized belongs to the database. Here, "matching" may mean that the two are identical or that they have a high similarity.

In order to make those skilled in the art understand the technical solutions provided by the embodiments of the present invention, the following detailed descriptions of the corresponding steps and related concepts are provided.

In one possible embodiment, step 11 may comprise the steps of:

at least one image in the media file to be identified is input into an image classification model, and a first feature vector is obtained from the output content of a feature extraction layer of the image classification model.

In one possible embodiment, if the media file to be recognized is an image, the image of the media file to be recognized is directly input into the image classification model.

In another possible embodiment, if the media file to be identified is a video, all images in the media file to be identified (i.e., images of all frames of the video) may be input into the image classification model. Because all images are used during feature extraction, the richness of information contained in the first feature vector can be ensured, and the visual features of the media file to be identified can be reflected more comprehensively.

In another possible embodiment, if the media file to be recognized is a video, frame extraction processing may be performed on the media file to be recognized first, and a plurality of images extracted after the frame extraction processing are input into the image classification model. Therefore, a plurality of representative images are selected through frame extraction processing, and the processing efficiency can be improved on the premise of ensuring the accuracy based on the characteristics of the images. The way of performing the frame extraction processing on the video is common knowledge in the art, and is not described in detail here.
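The disclosure leaves the frame extraction strategy open. As one possible illustration, the sketch below picks evenly spaced frame indices; uniform sampling and the function name `sample_frame_indices` are assumptions, not the method prescribed by the source.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly across a video.

    The video is split into num_samples equal segments and the frame at the
    centre of each segment is chosen. If the video has no more frames than
    requested, every frame index is returned.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        return list(range(total_frames))
    return [int((i + 0.5) * total_frames / num_samples)
            for i in range(num_samples)]


# Hypothetical 160-frame video reduced to 16 representative frames.
indices = sample_frame_indices(160, 16)
print(indices)
```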

The image classification model may be a model obtained by training a convolutional neural network.

Illustratively, the image classification model may be obtained by:

acquiring a plurality of groups of training data, wherein each group of training data comprises a historical media file and an image category corresponding to the historical media file;

and training the convolutional neural network model according to the multiple groups of training data to obtain an image classification model.

In each training process, taking a historical media file in a group of training data as input data, taking an image category corresponding to the historical media file as real output, training a convolutional neural network model, and under the condition that a model training stopping condition is not met, adjusting parameters in the model according to the actual output of the current input and the real output corresponding to the historical media file in the training data of the model until the model training stopping condition is met to obtain the image classification model.
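The shape of this training loop (compare actual output with real output, adjust parameters, stop when a condition is met) can be sketched as follows. To stay self-contained, the sketch swaps the convolutional neural network for a single-layer logistic classifier on hand-made two-dimensional features; the stopping condition (a loss threshold or a maximum number of epochs), the learning rate, and all names are assumptions for illustration.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (input features, real output label 0 or 1)


def predict(weights: List[float], bias: float, features: List[float]) -> float:
    """Actual output of the stand-in model: a probability for class 1."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(data: List[Sample], lr: float = 0.5, max_epochs: int = 1000,
          loss_threshold: float = 0.05) -> Tuple[List[float], float, float]:
    """Adjust parameters until the (assumed) stopping condition is met."""
    n = len(data[0][0])
    weights, bias, loss = [0.0] * n, 0.0, float("inf")
    for _ in range(max_epochs):
        grad_w, grad_b, loss = [0.0] * n, 0.0, 0.0
        for features, label in data:
            p = predict(weights, bias, features)
            # Cross-entropy between actual output p and real output label.
            loss -= label * math.log(p + 1e-12) + (1 - label) * math.log(1 - p + 1e-12)
            error = p - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        loss /= len(data)
        if loss < loss_threshold:      # stopping condition met
            break
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias, loss


# Hypothetical toy training data with two categories.
toy = [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([1.0, 0.9], 1), ([0.9, 1.0], 1)]
weights, bias, final_loss = train(toy)
print(final_loss)
```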

Over successive training iterations, the internal parameters of the model are continuously adjusted so that the resulting image classification model classifies well. Since the classification accuracy of the model depends on the feature extraction performed by each of its layers, the intermediate layers of the resulting image classification model can accurately extract the visual features (i.e., image features) of a media file.

Therefore, in order to obtain the visual features of the media file to be recognized, the media file can be input into the image classification model, and the first feature vector can be obtained from an intermediate layer of the model. Specifically, the first feature vector may be obtained from the output of a feature extraction layer of the image classification model. The feature extraction layer is a layer located before the fully connected layer in the image classification model (for example, the last such layer). It can extract higher-level features, i.e., features that are built on top of low-level features and can be used to recognize and detect objects or object shapes in an image, and it therefore carries richer semantic information. Illustratively, the feature extraction layer may be an average pooling layer of the image classification model.

For example, if the media file to be recognized is an image, the 1536-dimensional output of the penultimate (average pooling) layer of a pretrained Inception-ResNet-v2 network can be used as the feature vector of the image, so that the image is represented effectively in generalization tasks (that is, a deep convolutional network is used to extract the image feature embedding). For another example, if the media file to be recognized is a video, 16 video frames of 112 × 112 pixels may be extracted from the video as input, and the 2048-dimensional output of the penultimate (average pooling) layer of a pretrained ResNeXt-101 network may be used as the feature vector of the video (that is, a 3D convolutional network is used to extract the video feature embedding).
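To show what an average pooling layer contributes, the following sketch reduces a feature map of C channels to a C-dimensional vector by averaging each channel. It is a plain-Python illustration of global average pooling only; the feature map values and the name `global_average_pool` are hypothetical, and no actual network is loaded.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Average every channel over its spatial positions.

    A map with C channels becomes a C-dimensional feature vector, which is
    how a pooling layer with 1536 channels would yield a 1536-dimensional
    output.
    """
    pooled: List[float] = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Hypothetical 2-channel, 2 x 2 feature map.
fmap = [[[1.0, 3.0], [5.0, 7.0]],
        [[0.0, 0.0], [0.0, 4.0]]]
feature_vector = global_average_pool(fmap)
print(feature_vector)
```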

In step 12, a character string corresponding to the first feature vector is determined and used as a first fingerprint identifier of the media file to be identified.

In one possible embodiment, step 12 may include the following steps, as shown in FIG. 2.

In step 21, the first feature vector is vector quantized, and the first feature vector is converted into a second feature vector with a preset dimension.

Wherein the preset dimension is smaller than the dimension of the first feature vector.

Since the first feature vector acquired in step 11 is complex and demands considerable computing power in subsequent computation, vector quantization may first be performed on it to convert it into a second feature vector of lower dimensionality, which reduces subsequent computation and improves data processing speed. That is, the real-valued first feature vector is converted into a discrete-valued second feature vector by a vector quantization technique. Vector quantization methods are common knowledge in the art; only one realizable method is described here by way of example, and other realizable methods are not described in detail.

Illustratively, the preset dimension may be 128. Accordingly, the first feature vector may be vector-quantized by grouped clustering, which maps the high-dimensional first feature vector (a floating-point vector) to the low-dimensional second feature vector (a discrete vector). For example, a 1536-dimensional first feature vector (corresponding to the image in the above example) may be divided into 128 groups of sub-vectors, one group per 12 dimensions, and each group of sub-vectors is clustered to 128 (preset dimension) cluster centers by a K-Means clustering model. As another example, a 2048-dimensional first feature vector (corresponding to the video in the above example) is divided into 128 groups of sub-vectors, one group per 16 dimensions, and each group of sub-vectors is clustered to 128 (preset dimension) cluster centers by a K-Means clustering model. The K-Means clustering model can be obtained by unsupervised training on samples already in the database. With this vector quantization method, the features of any media file can be encoded as a 128-dimensional (preset dimension) vector.
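The grouped quantization described above can be sketched as follows, assuming the per-group cluster centers have already been learned (for example by K-Means, which is not shown here). The names `split_into_groups`, `nearest_center`, and `quantize`, and the toy codebooks, are hypothetical.

```python
from typing import List, Sequence


def split_into_groups(vector: Sequence[float], num_groups: int) -> List[Sequence[float]]:
    """Split a vector into num_groups contiguous sub-vectors of equal length."""
    if num_groups <= 0 or len(vector) % num_groups != 0:
        raise ValueError("vector length must be a multiple of num_groups")
    size = len(vector) // num_groups
    return [vector[i * size:(i + 1) * size] for i in range(num_groups)]


def nearest_center(sub_vector: Sequence[float],
                   centers: Sequence[Sequence[float]]) -> int:
    """Return the index of the cluster center closest in squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for index, center in enumerate(centers):
        dist = sum((a - b) ** 2 for a, b in zip(sub_vector, center))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize(vector: Sequence[float],
             codebooks: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Map a real-valued vector to one discrete code per group."""
    groups = split_into_groups(vector, len(codebooks))
    return [nearest_center(g, cb) for g, cb in zip(groups, codebooks)]


# Toy setting: 8 dimensions, 4 groups of 2, two assumed centers per group.
codebooks = [[[0.0, 0.0], [1.0, 1.0]]] * 4
first_vector = [0.1, 0.0, 0.9, 1.0, 0.2, 0.1, 0.8, 0.7]
second_vector = quantize(first_vector, codebooks)
print(second_vector)
```

With 128 codebooks, the same `quantize` call would turn a 1536-dimensional vector (12 dimensions per group) or a 2048-dimensional vector (16 dimensions per group) into 128 discrete codes.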

In step 22, the second feature vector is transcoded into a character string with a preset number of characters, and the character string is used as the first fingerprint identifier.

After the second feature vector is obtained, it may be transcoded into a character string with a preset number of characters. This character string can uniquely represent the second feature vector, that is, it can uniquely represent the visual features of the media file to be identified. Illustratively, the character string may be a 32-character hexadecimal string. For example, if the second feature vector is a 128-dimensional vector and the character string has 32 characters, every 4 dimensions of the second feature vector may be grouped together (i.e., one character of the string represents 4 dimensions of the second feature vector), so that the 128-dimensional vector is mapped into the space of 32-character strings. Ways of converting a vector into characters are common knowledge in the art, so a more specific description is not given here.
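The disclosure does not fix how 4 dimensions become one character. The sketch below uses one hypothetical mapping purely to make the 128-to-32 grouping concrete: it keeps the lowest bit of each discrete code and packs 4 such bits into one hexadecimal character. This choice is an assumption and is lossy unless the codes are binary.

```python
from typing import Sequence


def encode_to_hex(codes: Sequence[int]) -> str:
    """Encode discrete codes as a hex string, 4 codes per character.

    Assumed mapping (not from the source): the lowest bit of each code is
    kept, and each run of 4 bits forms one hexadecimal digit.
    """
    if len(codes) % 4 != 0:
        raise ValueError("number of codes must be a multiple of 4")
    chars = []
    for start in range(0, len(codes), 4):
        nibble = 0
        for code in codes[start:start + 4]:
            nibble = (nibble << 1) | (code & 1)
        chars.append(format(nibble, "x"))
    return "".join(chars)


# A hypothetical 128-dimensional second feature vector gives 32 characters.
fingerprint = encode_to_hex([3] * 128)
print(fingerprint, len(fingerprint))
```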

In another possible embodiment, the above-mentioned vector quantization step may also be omitted to obtain the first fingerprint identification quickly, i.e.:

performing code conversion on the first feature vector to convert it into a character string with a preset number of characters, and taking the character string as the first fingerprint identifier.

The manner of transcoding is similar to that given above, i.e., the first feature vector is mapped into a string space. This is common knowledge in the art and is not described again here.

In one possible approach, step 13 may include the following steps, as shown in fig. 3.

In step 31, determining whether a target media file exists according to the first fingerprint identification and a second fingerprint identification corresponding to each media file stored in the database;

in step 32, if the target media file exists, determining that the media file to be identified belongs to the database;

in step 33, if the target media file does not exist, it is determined that the media file to be identified does not belong to the database.

Each media file already stored in the database may have a corresponding second fingerprint identifier. The second fingerprint identifier is a character string that corresponds to the stored media file and characterizes its visual features. It is obtained in the same manner as the first fingerprint identifier, namely by acquiring a feature vector of the media file and converting the feature vector into a character string. For details, refer to the explanation of steps 11 and 12 given above, which is not repeated here. The media files already stored in the database may correspond to the original content of the user (or platform) described above.

In one possible embodiment, the media file to be identified and the media files in the database may be fuzzy-matched to improve the fault tolerance of the matching and avoid missing the target media file that is consistent with the media file to be identified. Accordingly, step 31 may comprise the steps of:

respectively calculating the similarity of the first fingerprint identification and each second fingerprint identification;

if, among the calculated similarities, there is a target similarity greater than a preset similarity threshold, determining the stored media file corresponding to the target similarity as the target media file;

and if no target similarity exists among the calculated similarities, determining that the target media file does not exist.

For example, the similarity between the first fingerprint identification and the second fingerprint identification can be determined by calculating the distance between the two fingerprints in the designated space, wherein the farther the two fingerprints are away, the lower the similarity is, and the closer the two fingerprints are, the higher the similarity is. For another example, the similarity between the first fingerprint identifier and the second fingerprint identifier may be calculated directly using a calculation method such as cosine similarity.
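The fuzzy matching steps can be sketched as below. The similarity measure used here, the fraction of character positions on which two equal-length fingerprints agree, is an assumption chosen for simplicity; the source only requires some distance or similarity measure such as cosine similarity. The names and the sample fingerprints are hypothetical.

```python
from typing import Dict, Optional


def fingerprint_similarity(first: str, second: str) -> float:
    """Fraction of character positions on which two fingerprints agree."""
    if not first or len(first) != len(second):
        raise ValueError("fingerprints must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return matches / len(first)


def find_target_fuzzy(first_fp: str, stored: Dict[str, str],
                      threshold: float) -> Optional[str]:
    """Return the id of the most similar stored file whose similarity is
    strictly greater than the threshold, or None if there is none."""
    best_id, best_score = None, threshold
    for media_id, second_fp in stored.items():
        score = fingerprint_similarity(first_fp, second_fp)
        if score > best_score:
            best_id, best_score = media_id, score
    return best_id


# Hypothetical database: media id -> second fingerprint identifier.
stored = {"video_a": "abcd1234", "image_b": "ffff0000"}
print(find_target_fuzzy("abcd1235", stored, threshold=0.8))
print(find_target_fuzzy("00000000", stored, threshold=0.8))
```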

In another possible embodiment, the media file to be identified may be precisely matched with the media files in the database to ensure the matching accuracy and find a target media file consistent with the media file to be identified. Accordingly, step 31 may comprise the steps of:

if a second fingerprint identification identical to the first fingerprint identification exists in the second fingerprint identification, determining the stored media file corresponding to the second fingerprint identification identical to the first fingerprint identification as a target media file;

and if the second fingerprint identification which is the same as the first fingerprint identification does not exist, determining that the target media file does not exist.

For example, the first fingerprint identifier and the second fingerprint identifier may be compared character by character, and if every character of the two strings is the same, the two can be determined to be identical.
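Exact matching can be sketched with a lookup table keyed by fingerprint. Using a dictionary instead of an explicit position-by-position comparison is a design choice of this sketch, not something stated in the source; a dictionary lookup still succeeds only when the whole strings are equal. All names and sample data are hypothetical.

```python
from typing import Dict, List, Optional


def build_index(stored: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each second fingerprint identifier to the ids of the files that have it."""
    index: Dict[str, List[str]] = {}
    for media_id, fingerprint in stored.items():
        index.setdefault(fingerprint, []).append(media_id)
    return index


def find_target_exact(first_fp: str, index: Dict[str, List[str]]) -> Optional[str]:
    """Return the id of a stored file with an identical fingerprint, or None."""
    matches = index.get(first_fp)
    return matches[0] if matches else None


# Hypothetical database: media id -> second fingerprint identifier.
index = build_index({"video_a": "abcd1234", "image_b": "ffff0000"})
print(find_target_exact("ffff0000", index))
print(find_target_exact("ffff0001", index))
```

Building the index once makes each later lookup independent of the number of stored media files, which may matter when the database is large.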

Optionally, the method provided by the present disclosure may further include the steps of:

and if the media file to be identified belongs to the database, outputting prompt information.

The prompt information is used for prompting that the media file to be identified is the same as the target media file.

If the media file to be identified is determined to belong to the database, it is identical to an existing media file in the database, that is, it originates from the target media file. Prompt information can therefore be output to indicate that the media file to be identified is the same as the target media file, so that the media file in question can be located quickly.

Fig. 4 is a block diagram of a media file identification apparatus provided in accordance with one embodiment of the present disclosure. As shown in fig. 4, the apparatus 40 may include:

an obtaining module 41, configured to obtain a feature vector of a media file to be identified as a first feature vector;

a first determining module 42, configured to determine a character string corresponding to the first feature vector as a first fingerprint identifier of the media file to be identified;

a second determining module 43, configured to determine, according to the first fingerprint identifier, whether a target media file matching the to-be-identified media file exists in the media files already stored in the database, so as to determine whether the to-be-identified media file belongs to the database.

Optionally, the obtaining module 41 is configured to input at least one image in the media file to be identified into an image classification model, and obtain the first feature vector from an output content of a feature extraction layer of the image classification model, where the feature extraction layer is a layer of the image classification model located before a fully connected layer.

Optionally, the first determining module 42 includes:

a first conversion sub-module, configured to perform vector quantization on the first feature vector, and convert the first feature vector into a second feature vector with a preset dimension, where the preset dimension is smaller than a dimension of the first feature vector;

and the second conversion sub-module is used for performing coding conversion on the second feature vector so as to convert the second feature vector into a character string with preset digits and take the character string as the first fingerprint identifier.

Optionally, the second determining module 43 includes:

a first determining sub-module, configured to determine whether the target media file exists according to the first fingerprint identifier and a second fingerprint identifier corresponding to each stored media file in a database, where the second fingerprint identifier is a character string corresponding to the stored media file;

the second determining submodule is used for determining that the media file to be identified belongs to the database if the target media file exists;

and the third determining submodule is used for determining that the media file to be identified does not belong to the database if the target media file does not exist.

Optionally, the first determining sub-module is configured to: respectively calculating the similarity of the first fingerprint identification and each second fingerprint identification; if the similarity has a target similarity larger than a preset similarity threshold, determining a stored media file corresponding to the target similarity as the target media file; and if the target similarity does not exist in the similarity, determining that the target media file does not exist.

Optionally, the first determining sub-module is configured to: if a second fingerprint identifier which is the same as the first fingerprint identifier exists in the second fingerprint identifiers, determining the stored media file corresponding to the second fingerprint identifier which is the same as the first fingerprint identifier as the target media file; and if the second fingerprint identification which is the same as the first fingerprint identification does not exist, determining that the target media file does not exist.

Optionally, the apparatus 40 further comprises:

and the output module is used for outputting prompt information if the media file to be identified belongs to the database, wherein the prompt information is used for prompting that the media file to be identified is the same as the target media file.

With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.

Referring now to FIG. 5, a block diagram of an electronic device 600 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), and a vehicle-mounted terminal (e.g., a car navigation terminal), and stationary terminals such as a digital TV and a desktop computer. The electronic device shown in FIG. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 5, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage device 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 608 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. Although FIG. 5 illustrates an electronic device 600 having various devices, it is to be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 609, or may be installed from the storage device 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be included in the electronic device, or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a feature vector of a media file to be identified as a first feature vector; determine a character string corresponding to the first feature vector as a first fingerprint identifier of the media file to be identified; and determine, according to the first fingerprint identifier, whether a target media file matching the media file to be identified exists among the media files stored in the database, so as to determine whether the media file to be identified belongs to the database.

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module does not, in some cases, constitute a limitation on the module itself; for example, the obtaining module may also be described as a "module that obtains a feature vector of a media file to be identified as a first feature vector".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, there is provided a media file identification method, the method including:

According to one or more embodiments of the present disclosure, a media file identification method is provided, where the obtaining a feature vector of a media file to be identified as a first feature vector includes:

inputting at least one image in the media file to be identified into an image classification model, and acquiring the first feature vector from the output content of a feature extraction layer of the image classification model, wherein the feature extraction layer is a layer which is positioned before a full connection layer in the image classification model.
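To illustrate what taking the output of a layer before the fully connected layer means, the toy sketch below models a classifier as an ordered list of layers and stops the forward pass just before the classification layer. It is a pure-Python stand-in under stated assumptions: `linear_layer`, `relu`, and `extract_features` are hypothetical helpers, and in a real image classification model the feature extraction layers would typically be convolution and pooling layers rather than the small linear map used here.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def linear_layer(weights: Sequence[Sequence[float]]) -> Layer:
    """Build a layer computing output[i] = dot(weights[i], input)."""
    def apply(x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) for row in weights]
    return apply


def relu(x: Vector) -> Vector:
    """Element-wise rectifier: negative values become 0.0."""
    return [max(0.0, v) for v in x]


def extract_features(layers: Sequence[Layer], fc_index: int, pixels: Vector) -> Vector:
    """Run only the layers that precede the fully connected layer at fc_index.

    The returned vector plays the role of the first feature vector.
    """
    out = list(pixels)
    for layer in layers[:fc_index]:
        out = layer(out)
    return out


# Toy model: two feature layers followed by a final classification layer.
model = [
    linear_layer([[1.0, -1.0], [0.5, 0.5]]),  # stand-in feature layer
    relu,
    linear_layer([[1.0, 1.0]]),  # stand-in fully connected classifier
]
features = extract_features(model, fc_index=2, pixels=[2.0, 4.0])
```

Here `features` is `[0.0, 3.0]`: the classification layer is never applied, so the vector reflects image content rather than a class label.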

According to one or more embodiments of the present disclosure, a media file identification method is provided, where the determining a character string corresponding to the first feature vector as a first fingerprint identifier of the media file to be identified includes:

performing vector quantization on the first feature vector, and converting the first feature vector into a second feature vector with a preset dimension, wherein the preset dimension is smaller than the dimension of the first feature vector;

and performing code conversion on the second feature vector to convert the second feature vector into a character string with preset digits, and taking the character string as the first fingerprint identifier.
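The two steps above, reducing the dimension by vector quantization and then converting the result into a fixed-length string, can be sketched as follows. The disclosure does not name a quantization scheme or an encoding, so this example assumes a codebook-based quantizer that maps each sub-vector to the index of its nearest codeword and a hexadecimal encoding of one digit per index; `nearest_codeword`, `quantize`, `encode`, and the codebook values are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(sub: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to sub."""
    def dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(sub, c))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def quantize(first: Vector, codebook: Sequence[Vector]) -> List[int]:
    """Split `first` into codeword-sized chunks and replace each by a codeword index.

    The result (the second feature vector) has len(first) / width entries.
    """
    width = len(codebook[0])
    if len(first) % width != 0:
        raise ValueError("vector length must be a multiple of the codeword length")
    return [
        nearest_codeword(first[i:i + width], codebook)
        for i in range(0, len(first), width)
    ]


def encode(second: Sequence[int]) -> str:
    """Encode each codeword index (0 to 15) as one hexadecimal digit."""
    if any(not 0 <= v < 16 for v in second):
        raise ValueError("each index must fit in one hexadecimal digit")
    return "".join(format(v, "x") for v in second)


codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # assumed, normally learned offline
first_vector = [0.1, 0.2, 0.9, 1.1, 0.0, 0.8]
second_vector = quantize(first_vector, codebook)
fingerprint = encode(second_vector)
```

In this run the six-dimensional input becomes the three-dimensional vector `[0, 1, 2]` and the fingerprint `"012"`, so the string length is fixed by the preset dimension regardless of the input values.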

According to one or more embodiments of the present disclosure, a media file identification method is provided, where determining, according to the first fingerprint identifier, whether a target media file matching the media file to be identified exists in media files already stored in the database to determine whether the media file to be identified belongs to the database includes:

determining whether the target media file exists according to the first fingerprint identifier and a second fingerprint identifier corresponding to each stored media file in the database, wherein the second fingerprint identifier is a character string corresponding to the stored media file;

if the target media file exists, determining that the media file to be identified belongs to the database;

and if the target media file does not exist, determining that the media file to be identified does not belong to the database.

According to one or more embodiments of the present disclosure, a media file identification method is provided, where determining whether a target media file matching the media file to be identified exists according to the first fingerprint identifier and a second fingerprint identifier corresponding to each of the media files already stored in the database includes:

calculating the similarity between the first fingerprint identifier and each second fingerprint identifier and, if any of the calculated similarities is a target similarity greater than a preset similarity threshold, determining the stored media file corresponding to the target similarity as the target media file;

or, if one of the second fingerprint identifiers is identical to the first fingerprint identifier, determining the stored media file corresponding to that second fingerprint identifier as the target media file.

According to one or more embodiments of the present disclosure, there is provided a media file identification method, the method further including:

and if the media file to be identified belongs to the database, outputting prompt information, wherein the prompt information is used for prompting that the media file to be identified is the same as the target media file.
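The membership decision and the prompt can be tied together in a few lines. This sketch assumes the matching step is passed in as a callable so that either matching strategy can be used; `exact_matcher`, `identify`, and the wording of the prompt text are hypothetical, and a plain dictionary stands in for the database.

```python
from typing import Callable, Dict, Optional

# A matcher takes the first fingerprint and the stored fingerprints
# and returns the id of the target media file, or None if there is none.
Matcher = Callable[[str, Dict[str, str]], Optional[str]]


def exact_matcher(first_fp: str, stored: Dict[str, str]) -> Optional[str]:
    """Return the id of the first stored file whose fingerprint equals first_fp."""
    for file_id, second_fp in stored.items():
        if second_fp == first_fp:
            return file_id
    return None


def identify(
    first_fp: str, stored: Dict[str, str], matcher: Matcher = exact_matcher
) -> Optional[str]:
    """Return prompt text if the file belongs to the database, otherwise None."""
    target = matcher(first_fp, stored)
    if target is None:
        # No target media file exists, so the file does not belong to the database.
        return None
    return f"The media file to be identified is the same as stored media file {target!r}."
```

Calling `identify("0f3c", {"a": "0f3c"})` returns a prompt naming file `'a'`, while a fingerprint with no match returns `None` and no prompt is produced.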

According to one or more embodiments of the present disclosure, there is provided a media file identification apparatus, the apparatus including:

According to one or more embodiments of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, which when executed by a processing device, implements the steps of the media file identification method provided by any of the embodiments of the present disclosure.

According to one or more embodiments of the present disclosure, there is provided an electronic device including:

a storage device having a computer program stored thereon;

and the processing device is used for executing the computer program in the storage device to realize the steps of the media file identification method provided by any embodiment of the disclosure.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions disclosed in (but not limited to) this disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A media file identification method, the method comprising:

2. The method according to claim 1, wherein the obtaining the feature vector of the media file to be identified as the first feature vector comprises:

3. The method according to claim 1, wherein the determining a character string corresponding to the first feature vector as the first fingerprint identifier of the media file to be identified comprises:

4. The method of claim 1, wherein the determining whether a target media file matching the media file to be identified exists in the media files stored in the database according to the first fingerprint identifier to determine whether the media file to be identified belongs to the database comprises:

5. The method according to claim 4, wherein the determining whether there is a target media file matching the media file to be identified according to the first fingerprint identifier and a second fingerprint identifier corresponding to each of the media files stored in the database comprises:

6. The method according to claim 4, wherein the determining whether there is a target media file matching the media file to be identified according to the first fingerprint identifier and a second fingerprint identifier corresponding to each of the media files stored in the database comprises:

7. The method according to any one of claims 1-6, further comprising:

8. An apparatus for media file identification, the apparatus comprising:

9. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1 to 7.

10. An electronic device, comprising:

a storage device having a computer program stored thereon;

processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 7.