
CN110532404B - Source multimedia determining method, device, equipment and storage medium

Info

Publication number: CN110532404B
Application number: CN201910828971.8A
Authority: CN
Inventors: 张晓寒; 任可欣; 冯知凡; 张扬; 朱勇
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-09-03
Filing date: 2019-09-03
Publication date: 2023-08-04
Anticipated expiration: 2039-09-03
Also published as: CN110532404A

Abstract

The application discloses a method, a device, equipment and a storage medium for determining source multimedia, and relates to the technical field of intelligent search. The specific implementation scheme is as follows: acquiring at least one piece of multimedia association information corresponding to a multimedia clip, wherein the multimedia association information comprises at least one of descriptive text of the multimedia clip, a media element identification result and user-input tag information; extracting the entities of each piece of multimedia association information, and determining at least one associated multimedia corresponding to the multimedia clip according to the extracted entities and a preset knowledge graph; and determining the source multimedia corresponding to the multimedia clip from the associated multimedia. By combining the entities corresponding to the multimedia association information with the preset knowledge graph to determine the associated multimedia, the technical scheme narrows the range within which the source multimedia is searched; by determining the source multimedia from the associated multimedia, it reduces interference from non-genuine source multimedia and improves the efficiency and accuracy of source multimedia determination.

Description

Source multimedia determining method, device, equipment and storage medium

Technical Field

The application relates to the technical field of data processing, in particular to the technical field of intelligent searching.

Background

With the growth of multimedia resources (e.g., audio and video data) on the internet, there are more and more multimedia-related applications, such as multimedia data recommendation and multimedia data search. When such applications are used in scenarios such as accurate recommendation, search recommendation and accurate search, the source data of the multimedia needs to be located.

The prior art generally uses fingerprint identification technology: fingerprints are extracted from a multimedia clip and from a plurality of multimedia data and then compared, and the source multimedia corresponding to the multimedia clip is determined from the plurality of multimedia data.

However, fingerprint extraction and fingerprint comparison are time-consuming, which makes the overall source multimedia determination process time-consuming. In addition, a method that relies only on fingerprint extraction and comparison is easily interfered with by non-genuine source data, such as entertainment news into which the multimedia clip has been inserted; the entertainment news may be misjudged as the source multimedia, which reduces the accuracy of the determined source multimedia.

Disclosure of Invention

The embodiment of the application provides a method, a device, equipment and a storage medium for determining source multimedia, so as to improve the determination efficiency of the source multimedia and the accuracy of a determination result.

In a first aspect, an embodiment of the present application provides a method for determining a source multimedia, including:

acquiring at least one piece of multimedia association information corresponding to the multimedia fragment; wherein the multimedia association information comprises at least one of descriptive text of the multimedia fragment, a media element identification result and user input tag information;

extracting the entity of each piece of multimedia associated information, and determining at least one associated multimedia corresponding to the multimedia segment according to each extracted entity and a preset knowledge graph;

and determining the source multimedia corresponding to the multimedia fragments from the associated multimedia.

According to the embodiment of the application, the multimedia association information corresponding to the multimedia clip is acquired, the entities of the multimedia association information are extracted, the associated multimedia is screened according to the extracted entities and the preset knowledge graph, and the source multimedia corresponding to the multimedia clip is then determined from the associated multimedia. This solves the problems that the source multimedia determination process is time-consuming and that the accuracy of the determined source multimedia is low. By combining the entities corresponding to multimedia association information of different dimensions with the preset knowledge graph to determine the associated multimedia, the technical scheme narrows the range within which the source multimedia is searched; by determining the source multimedia from the associated multimedia, it reduces interference from non-genuine source multimedia and improves the efficiency and accuracy of source multimedia determination.

Optionally, determining at least one associated multimedia corresponding to the multimedia segment according to the extracted entities and the preset knowledge graph, including:

determining at least one multimedia entity according to the triplet information corresponding to each extracted entity and the knowledge graph, and taking the multimedia data corresponding to each multimedia entity as associated multimedia.

In one embodiment of the above application, multimedia entities are determined through the triplet information corresponding to each extracted entity and the knowledge graph. The extracted entities are thereby expanded through the knowledge graph, which further mines the entity information corresponding to the multimedia clip, and non-multimedia entities among the expanded entities are eliminated, so that the number of determined associated multimedia is further reduced and the degree of correlation between the associated multimedia and the source multimedia corresponding to the multimedia clip is improved.

Optionally, after determining at least one multimedia entity, before taking the multimedia data corresponding to each multimedia entity as the associated multimedia, the method further comprises:

determining the confidence coefficient of each multimedia entity according to the determined frequency of each multimedia entity and/or the category of the multimedia associated information corresponding to each multimedia entity;

screening, from the multimedia entities, the multimedia entities whose confidence meets a set condition;

correspondingly, taking the multimedia data corresponding to each multimedia entity as associated multimedia comprises the following steps:

and taking the multimedia data corresponding to each screened multimedia entity as associated multimedia.

In one embodiment of the above application, the confidence of each multimedia entity is determined according to the determination frequency of each multimedia entity and/or the category of the multimedia association information corresponding to each multimedia entity, and the multimedia entities are screened according to the determined confidence. This reduces the number of multimedia entities, further reduces the number of determined associated multimedia, and indirectly improves the correlation between the associated multimedia and the source multimedia corresponding to the multimedia clip.

Optionally, determining the confidence coefficient of each multimedia entity according to the determined frequency of each multimedia entity and the category of the multimedia association information corresponding to each multimedia entity includes:

weighting the determined frequency of the multimedia entities according to the confidence weights corresponding to the multimedia association information of different categories aiming at each multimedia entity;

and determining the confidence corresponding to the multimedia entity according to the weighted frequency of the multimedia entity.

According to one embodiment of the application, the confidence determination is refined to use both the determination frequency of each multimedia entity and the category of the multimedia association information corresponding to each multimedia entity, which completes the confidence determination mechanism. Because the confidence is determined from these two dimensions, it is more closely tied to the multimedia clip: the higher the confidence of a multimedia entity, the higher the probability that the multimedia data corresponding to that multimedia entity is the source multimedia corresponding to the multimedia clip. The correlation between the determined associated multimedia and the multimedia clip is thereby indirectly improved.

Optionally, determining the source multimedia corresponding to the multimedia segment from each associated multimedia includes:

respectively determining the similarity between the fingerprint information of each associated multimedia and the fingerprint information of the multimedia fragment;

and determining the source multimedia corresponding to the multimedia fragment from the associated multimedia according to the similarity.

In one embodiment of the application, the associated multimedia and the multimedia clip are compared by determining the similarity of their fingerprint information, and the source multimedia corresponding to the multimedia clip is determined from the associated multimedia accordingly. This completes the source multimedia determination mechanism and further improves the accuracy of the source multimedia determination result.

Optionally, before determining the similarity between the fingerprint information of each associated multimedia and the fingerprint information of the multimedia clip, the method further includes:

capturing multimedia data corresponding to the multimedia entity by utilizing a play link corresponding to the multimedia entity in the knowledge graph, and extracting fingerprint information of the multimedia data;

and storing the fingerprint information of the multimedia data and the multimedia entity in an associated mode to form a multimedia fingerprint library.

Optionally, determining the similarity between the fingerprint information of each associated multimedia and the fingerprint information of the multimedia segment includes:

searching fingerprint information corresponding to each associated multimedia in the multimedia fingerprint library according to the multimedia entity of each associated multimedia;

extracting fingerprint information of the multimedia fragments, and respectively determining the similarity between the fingerprint information of each associated multimedia and the fingerprint information of the multimedia fragments.

According to the embodiment of the application, the multimedia fingerprint library is constructed in advance, so that the fingerprint information of the associated multimedia does not need to be extracted during source multimedia determination. This reduces the amount of computation in the source multimedia determination process and further improves the efficiency of source multimedia determination.

In a second aspect, embodiments of the present application further provide a source multimedia determining apparatus, including:

the associated information acquisition module is used for acquiring at least one piece of multimedia associated information corresponding to the multimedia fragment; wherein the multimedia association information comprises at least one of descriptive text of the multimedia fragment, a media element identification result and user input tag information;

the associated multimedia determining module is used for extracting the entity of each piece of multimedia associated information and determining at least one associated multimedia corresponding to the multimedia segment according to each extracted entity and a preset knowledge graph;

and the source multimedia determining module is used for determining the source multimedia corresponding to the multimedia fragments from the associated multimedia.

In a third aspect, an embodiment of the present application further provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a source multimedia determining method provided by an embodiment of the first aspect.

In a fourth aspect, embodiments of the present application also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a source multimedia determining method provided by the embodiments of the first aspect.

Other effects of the above alternative will be described below in connection with specific embodiments.

Drawings

The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:

Fig. 1 is a flowchart of a source multimedia determining method in a first embodiment of the present application;

Fig. 2 is a flowchart of a source multimedia determining method in a second embodiment of the present application;

Fig. 3 is a block diagram of a source multimedia determining apparatus in a third embodiment of the present application;

Fig. 4 is a block diagram of an electronic device for implementing a source multimedia determining method of an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Example 1

Fig. 1 is a flowchart of a source multimedia determining method according to a first embodiment of the present application. The method is applicable to determining the data source of a multimedia clip contained in a multimedia carrier such as an application or a web page. The method is performed by a source multimedia determining device, which is implemented in software and/or hardware and is configured in an electronic device having a certain data computing capability.

As shown in Fig. 1, the method for determining source multimedia comprises:

S101, acquiring at least one piece of multimedia association information corresponding to a multimedia clip; wherein the multimedia association information includes at least one of descriptive text of the multimedia clip, a media element identification result, and user-input tag information.

Illustratively, the multimedia clip may be a video clip; accordingly, the multimedia association information may be at least one of descriptive text of the video clip, an image recognition result of a media element included in the video, tag information input by the user for the video clip, and the like. The image of the media element may be a character image, a prop image, or the like.

Illustratively, the multimedia clip may also be an audio clip; accordingly, the multimedia association information may be at least one of descriptive text of the audio clip, a timbre recognition result for the person producing the sound in the audio, tag information input by the user for the audio clip, and the like.

Optionally, the multimedia association information of the multimedia clip may be stored in advance in the local electronic device, in another storage device associated with the electronic device, or in the cloud, and acquired when needed according to the identification information of the multimedia clip.

Alternatively, the multimedia association information of the multimedia clip may be input directly by the user through an input means of the electronic device and determined according to the input information.

The following description takes a video clip as an example of the multimedia clip.

If the multimedia association information includes descriptive text of the multimedia clip, the descriptive text can be looked up by means of the multimedia clip from the local electronic device, another storage device associated with the electronic device, or the cloud. Optionally, voice information input by the user can be acquired, and the text information converted from the voice information is used as the descriptive text. Alternatively, picture information input by the user may be acquired, and the text information extracted from the picture information may be used as the descriptive text.

If the multimedia association information includes a media element identification result, the media element identification result can be looked up by means of the multimedia clip from the local electronic device, another storage device associated with the electronic device, or the cloud. Optionally, a face image may be looked up by means of the multimedia clip from the local electronic device, another storage device associated with the electronic device, or the cloud, or a face image input by the user may be received directly; the actor name or role name corresponding to the face image is then determined through face recognition technology and used as the media element identification result. Optionally, a prop image may likewise be looked up by means of the multimedia clip from the local electronic device, another storage device associated with the electronic device, or the cloud, or a prop image input by the user may be received directly; the prop name corresponding to the prop image is then determined through pattern recognition technology and used as the media element identification result.

If the multimedia association information includes tag information input by a user, text information manually input by the user may be acquired directly as the tag information; or voice information input by the user may be acquired, and the text information converted from the voice information used as the tag information; or picture information input by the user may be acquired, and the text information extracted from the picture information used as the tag information.

S102, extracting the entities of the multimedia associated information, and determining at least one associated multimedia corresponding to the multimedia segment according to the extracted entities and a preset knowledge graph.

An entity represents something that is distinct and independent, for example a word that can refer to a person, a thing, or an action.

The source multimedia represents the source data corresponding to the multimedia clip. For example, for a video clip, the associated multimedia may be the source video of a film, a television episode or the like from which the video clip may have been derived; for an audio clip, the associated multimedia may be the source audio of a song or of a film or television soundtrack from which the audio clip may have been derived.

For example, an entity database may be constructed in advance. Each piece of multimedia association information is segmented into words, and each word in the segmentation result is looked up in the entity database; if a matching word is found, that word is used as an extracted entity.
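The dictionary lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes word segmentation has already been done by some upstream tool, and the function name `extract_entities` and the in-memory set used as the entity database are hypothetical.

```python
def extract_entities(tokens, entity_db):
    """Return the segmented words that are present in the entity database.

    tokens    -- word segmentation result of one piece of association information
    entity_db -- set of known entity names (hypothetical stand-in for the
                 pre-constructed entity database)
    Repeated occurrences are kept, so later steps can count them if needed.
    """
    return [tok for tok in tokens if tok in entity_db]


# Illustrative data only: names taken from the example in this description.
ENTITY_DB = {"Hu Gaofeng", "Shan Wei", "final emperor legend"}
tag_tokens = ["Hu Gaofeng", "funny", "Shan Wei", "clip"]
entities = extract_entities(tag_tokens, ENTITY_DB)
```

A production system would more likely use a trained named-entity recognizer; the exact-match lookup here only mirrors the matching step the text describes.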

The knowledge graph describes various entities or concepts and the relationships among them, forming a huge semantic network graph in which nodes represent entities or concepts and edges are formed by attributes or relationships. The triplet is a general representation of the knowledge graph, and its basic forms mainly include (entity 1 - relationship - entity 2) and (entity - attribute - attribute value). Each entity may be represented by a globally unique identifier, each attribute and attribute value pair may be used to characterize an intrinsic property of the entity, and a relationship may be used to connect two entities and characterize the association between them.

In an optional implementation of the embodiment of the present application, determining the associated multimedia corresponding to the multimedia clip according to the extracted entities and the preset knowledge graph may be: determining, according to the triplet information of the preset knowledge graph, the multimedia entities that have an edge relationship with each extracted entity; and taking the multimedia data corresponding to each multimedia entity as associated multimedia.

For example, the extracted entity may be "entity 1", and the multimedia entity "entity 2" having an edge relationship with "entity 1" may be determined through the triplet information "entity 1 - relationship - entity 2" stored in advance in the knowledge graph. For example, "Hu Gaofeng" and "final emperor legend" are connected by a television show relationship; when the extracted entity is "Hu Gaofeng", the entity "final emperor legend", which has a television show relationship with "Hu Gaofeng", can be determined as a multimedia entity.

The extracted entity may also be an "attribute value", and the multimedia entity "entity 3" having an edge relationship with the "attribute value" may be determined through the triplet information "entity 3 - attribute - attribute value" stored in advance in the knowledge graph. For example, "Hu Gaofeng" is an attribute value of the actor attribute of "final emperor legend"; when the extracted entity is "Hu Gaofeng", the entity "final emperor legend", whose actor attribute has the value "Hu Gaofeng", can be determined as a multimedia entity.
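The two lookups just described (through a relationship triplet and through an attribute triplet) can be sketched as follows. This is an illustrative sketch under stated assumptions: the `Triple` type, the in-memory triplet list, the `MULTIMEDIA_ENTITIES` set used to tell multimedia entities from other entities, and the function name are all hypothetical, and a real knowledge graph would be queried through a graph store rather than a Python list.

```python
from collections import namedtuple

# Hypothetical triplet representation: (head, predicate, tail).
Triple = namedtuple("Triple", ["head", "predicate", "tail"])

TRIPLES = [
    # entity 1 - relationship - entity 2
    Triple("Hu Gaofeng", "television show", "final emperor legend"),
    # entity - attribute - attribute value
    Triple("final emperor legend", "actor", "Hu Gaofeng"),
    # a triplet that does not lead to a multimedia entity
    Triple("Hu Gaofeng", "occupation", "actor"),
]

# Assumption: the graph can tell us which entities are multimedia entities.
MULTIMEDIA_ENTITIES = {"final emperor legend"}


def find_multimedia_entities(extracted, triples, multimedia_entities):
    """Collect multimedia entities that share an edge with an extracted entity.

    Each matching triplet contributes one hit, so an entity reached through
    several triplets appears several times in the result.
    """
    hits = []
    for t in triples:
        if t.head in extracted and t.tail in multimedia_entities:
            hits.append(t.tail)
        elif t.tail in extracted and t.head in multimedia_entities:
            hits.append(t.head)
    return hits


hits = find_multimedia_entities({"Hu Gaofeng"}, TRIPLES, MULTIMEDIA_ENTITIES)
```

Keeping one hit per matching triplet is a design choice of this sketch; it makes it straightforward to count how often each multimedia entity was determined.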

It can be understood that, in order to reduce the number of associated multimedia and ensure the association degree between the associated multimedia and the multimedia fragments, the multimedia entities corresponding to the associated multimedia may be screened to retain the multimedia entities with higher association degree with the multimedia fragments, and reject the multimedia entities with lower association degree.

For example, before the multimedia data corresponding to each multimedia entity is taken as associated multimedia, the degree of relationship between a multimedia entity and an extracted entity can be determined according to the number of relationship edges between them, and the multimedia entities whose degree of relationship is greater than a set threshold are eliminated to update the set of multimedia entities. The set threshold may be set by a technician according to requirements or empirical values, or determined through repeated experiments.

S103, determining source multimedia corresponding to the multimedia fragments from the associated multimedia.

Optionally, the data feature information of the multimedia data corresponding to each associated multimedia can be extracted, and the similarity between the data feature information corresponding to each associated multimedia and the data feature information corresponding to the multimedia clip determined. The associated multimedia whose similarity is greater than a set similarity threshold is determined as the source multimedia corresponding to the multimedia clip; or the associated multimedia with the highest similarity is determined as the source multimedia corresponding to the multimedia clip. The similarity threshold may be set by a technician according to requirements or empirical values.
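The two selection rules above (similarity above a threshold, or highest similarity) can be sketched as follows. The function name `select_source` and the dictionary of precomputed similarity scores are assumptions made for illustration; how the similarity itself is computed is left open here.

```python
def select_source(similarities, threshold=None):
    """Pick source multimedia from the associated multimedia.

    similarities -- dict mapping an associated-multimedia id to its
                    similarity with the clip (hypothetical input format)
    threshold    -- if given, return every candidate whose similarity is
                    greater than it; otherwise return the single best one
    """
    if not similarities:
        return []
    if threshold is not None:
        return [m for m, s in similarities.items() if s > threshold]
    return [max(similarities, key=similarities.get)]


scores = {"episode 1": 0.20, "episode 2": 0.90, "episode 3": 0.85}
best = select_source(scores)
above = select_source(scores, threshold=0.8)
```

The threshold rule can return several candidates or none, while the highest-similarity rule always returns exactly one when any candidate exists; which behaviour is preferable depends on the application.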

Illustratively, the data characteristic information may be fingerprint information, i.e.: respectively determining the similarity between the fingerprint information corresponding to each associated multimedia and the fingerprint information of the multimedia fragment; and determining the source multimedia corresponding to the multimedia fragment from the associated multimedia according to the similarity.

It can be understood that, in order to reduce the amount of computation in the source multimedia determination process, the multimedia fingerprint library can be constructed in advance; when the fingerprint information corresponding to multimedia data needs to be acquired, it can be looked up directly in the multimedia fingerprint library according to the multimedia entity of the multimedia data.

Specifically, before the similarity between the fingerprint information of each associated multimedia and the fingerprint information of the multimedia segment is respectively determined, capturing multimedia data corresponding to a multimedia entity by using a play link corresponding to the multimedia entity in the knowledge graph, and extracting the fingerprint information of the multimedia data; and storing the fingerprint information of the multimedia data and the multimedia entity in an associated mode to form a multimedia fingerprint library.

Correspondingly, determining the similarity between the fingerprint information of each associated multimedia and the fingerprint information of the multimedia clip may be: looking up the fingerprint information corresponding to each associated multimedia in the multimedia fingerprint library according to the multimedia entity of each associated multimedia; and extracting the fingerprint information of the multimedia clip and determining the similarity between the fingerprint information of each associated multimedia and the fingerprint information of the multimedia clip. The fingerprint information may be extracted in any of the ways known in the prior art, which are not described here.
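The pre-built fingerprint library and the lookup-then-compare step can be sketched as follows. This is only a toy illustration: the fingerprint here is a set of hashes of sliding byte windows, which stands in for a real audio or video fingerprinting algorithm, and the similarity is the fraction of the clip's fingerprint found in the candidate. All function names are hypothetical, and the step of fetching multimedia data through a play link is replaced by a dictionary of already-fetched bytes.

```python
import hashlib


def toy_fingerprint(data: bytes, window: int = 4) -> frozenset:
    """Toy stand-in for a real fingerprint: hashes of sliding byte windows."""
    return frozenset(
        hashlib.sha1(data[i:i + window]).hexdigest()
        for i in range(len(data) - window + 1)
    )


def build_fingerprint_library(entity_to_data):
    """Offline step: store each multimedia entity with its fingerprint."""
    return {entity: toy_fingerprint(data) for entity, data in entity_to_data.items()}


def similarity(clip_fp, source_fp):
    """Fraction of the clip's fingerprint that also occurs in the source."""
    if not clip_fp:
        return 0.0
    return len(clip_fp & source_fp) / len(clip_fp)


def compare_with_library(clip_data, associated_entities, library):
    """Online step: fingerprint only the clip, look up the rest in the library."""
    clip_fp = toy_fingerprint(clip_data)
    return {
        entity: similarity(clip_fp, library[entity])
        for entity in associated_entities
        if entity in library
    }


library = build_fingerprint_library({
    "episode 1": b"abcdefghijklmnop",
    "episode 2": b"0123456789012345",
})
scores = compare_with_library(b"efghijkl", ["episode 1", "episode 2"], library)
```

Only the clip is fingerprinted at query time; the candidates' fingerprints come from the library, which is the saving the text describes.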

For example, suppose the multimedia association information includes tag information input by a user ("Hu Gaofeng", "war sheet", "fire wind", "French" and "Shan Wei") and recognition results corresponding to face images ("Shan Wei", "fire wind" and "Hu Gaofeng").

According to the extracted entities and the preset knowledge graph, the associated multimedia corresponding to the multimedia clip is determined to be the episodes of the "Country" television series.

The similarity between the fingerprint information of each episode of the television series "final emperor legend" and the fingerprint information of the multimedia clip is then determined, and the target source video is determined to be episode X of "final emperor legend".

According to the embodiment of the application, the multimedia association information corresponding to the multimedia clip is acquired, the entities of the multimedia association information are extracted, the associated multimedia is screened according to the extracted entities and the preset knowledge graph, and the source multimedia corresponding to the multimedia clip is then determined from the associated multimedia. This solves the problems that the source multimedia determination process is time-consuming and that the accuracy of the determined source multimedia is low. By combining the entities corresponding to multimedia association information of different dimensions with the preset knowledge graph to determine the associated multimedia, the technical scheme narrows the range within which the source multimedia is searched; by determining the source multimedia from the associated multimedia, it reduces interference from non-genuine source multimedia, such as entertainment news containing the multimedia clip, and improves the efficiency and accuracy of source multimedia determination.

Example 2

Fig. 2 is a flowchart of a source multimedia determining method in a second embodiment of the present application, where the embodiments of the present application are optimized and improved based on the technical solutions of the foregoing embodiments.

Further, the operation of determining at least one associated multimedia corresponding to the multimedia clip according to the extracted entities and the preset knowledge graph is refined into determining at least one multimedia entity according to the triplet information corresponding to the extracted entities and the knowledge graph, and taking the multimedia data corresponding to each multimedia entity as associated multimedia, so that the number of determined associated multimedia is reduced and the degree of correlation between the associated multimedia and the source multimedia corresponding to the multimedia clip is improved.

As shown in Fig. 2, the source multimedia determining method comprises:

S201, acquiring at least one piece of multimedia association information corresponding to a multimedia clip; wherein the multimedia association information includes at least one of descriptive text of the multimedia clip, a media element identification result, and user-input tag information.

S202, extracting the entity of each multimedia association information, determining at least one multimedia entity according to the extracted entity and the triplet information corresponding to the knowledge graph, and taking the multimedia data corresponding to each multimedia entity as associated multimedia.

Determining the associated multimedia corresponding to the multimedia clip according to the extracted entities and the preset knowledge graph may be: determining, according to the triplet information of the preset knowledge graph, the multimedia entities that have an edge relationship with each extracted entity; and taking the multimedia data corresponding to each multimedia entity as associated multimedia.

In an optional implementation manner of the embodiment of the present application, in order to reduce the number of associated multimedia and ensure the degree of association between the associated multimedia and the multimedia segments, after determining at least one multimedia entity, before taking the multimedia data corresponding to each multimedia entity as the associated multimedia, the confidence degree of each multimedia entity may be determined according to the determined frequency of each multimedia entity and/or the category of the multimedia associated information corresponding to each multimedia entity; and screening the multimedia entities with confidence degrees meeting the set conditions from the multimedia entities. Correspondingly, taking the multimedia data corresponding to each multimedia entity as associated multimedia comprises the following steps: and taking the multimedia data corresponding to each screened multimedia entity as associated multimedia.

It should be noted that the confidence indirectly reflects the degree of association between the multimedia entity and the multimedia clip: the higher the confidence of a multimedia entity, the greater the degree of association between that multimedia entity and the multimedia clip.

Optionally, determining the confidence of each multimedia entity according to its determined frequency may be: counting the number of times each multimedia entity is determined from the extracted entities, i.e., its determined frequency; and either using the determined frequency of each multimedia entity directly as its confidence, or converting the determined frequency into a confidence through a monotonically increasing function.
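A minimal sketch of the frequency-based confidence follows. The function name `frequency_confidence` and the choice of `math.log1p` as the monotonically increasing function are illustrative assumptions; the application does not name a particular function.

```python
import math
from collections import Counter


def frequency_confidence(determined, transform=None):
    """Map each multimedia entity to a confidence derived from how often it was determined.

    determined: list of multimedia entity names, one entry per time the entity
                was reached from an extracted entity.
    transform:  optional monotonically increasing function applied to the count.
    """
    counts = Counter(determined)
    if transform is None:
        # Use the determined frequency directly as the confidence.
        return dict(counts)
    return {entity: transform(n) for entity, n in counts.items()}


# Hypothetical example: "Drama X" was reached from two extracted entities.
determined = ["Drama X", "Drama Y", "Drama X"]
raw = frequency_confidence(determined)
squashed = frequency_confidence(determined, transform=math.log1p)
```

Because the transform is monotonically increasing, it changes the scale of the confidence values but not the ranking of the multimedia entities.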

Optionally, determining the confidence of each multimedia entity according to its determined frequency and the category of the multimedia association information corresponding to it may be: for each multimedia entity, weighting the determined frequency of the multimedia entity according to the confidence weights corresponding to the different categories of multimedia association information; and determining the confidence of the multimedia entity according to its weighted frequency. The confidence weights corresponding to the different categories may be set by a technician according to requirements or empirical values. Typically, the different categories of multimedia association information have different emphases: the description text reflects the overall content of the multimedia fragment most comprehensively, the media element identification result associates with the multimedia fragment at the level of content detail, and the tag information classifies the multimedia fragment at the level of the attribute category to which the multimedia information belongs. The confidence weight corresponding to the description text is therefore usually set to be the largest, and the confidence weight corresponding to the tag information the smallest.
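The category weighting can be sketched as a weighted count. The weight values below are hypothetical; the text only states that the description text usually receives the largest weight and the tag information the smallest. The category keys and the function name `weighted_confidence` are likewise assumptions for the example.

```python
from collections import defaultdict

# Hypothetical confidence weights per category of multimedia association information.
CATEGORY_WEIGHTS = {
    "description": 1.0,    # description text: largest weight
    "media_element": 0.6,  # media element identification result
    "tag": 0.3,            # user input tag information: smallest weight
}


def weighted_confidence(hits, weights=CATEGORY_WEIGHTS):
    """Sum the category weight of every hit for each multimedia entity.

    hits: iterable of (multimedia entity, category) pairs, one pair per time the
          entity was determined from association information of that category.
    """
    scores = defaultdict(float)
    for entity, category in hits:
        scores[entity] += weights[category]
    return dict(scores)


# Both entities are determined twice, but from different categories.
hits = [
    ("Drama X", "description"),
    ("Drama X", "tag"),
    ("Drama Y", "media_element"),
    ("Drama Y", "tag"),
]
scores = weighted_confidence(hits)
```

Here both entities have the same determined frequency, yet "Drama X" ends up with the higher confidence because one of its hits comes from the description text.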

Optionally, screening out the multimedia entities whose confidence meets the set condition may be: selecting the multimedia entities whose confidence is greater than a set confidence threshold; and/or ranking the multimedia entities by confidence and selecting a set number of the highest-ranked multimedia entities. The confidence threshold may be set by a technician according to requirements or empirical values, or may be determined through repeated experiments.
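The two screening conditions, used alone or together, can be sketched as below. The function name `screen_entities` and the sample confidence values are assumptions for illustration only.

```python
def screen_entities(confidence, threshold=None, top_k=None):
    """Screen multimedia entities by confidence.

    confidence: dict mapping multimedia entity -> confidence.
    threshold:  if given, keep only entities whose confidence is greater than it.
    top_k:      if given, keep only the top_k highest-confidence entities.
    """
    # Rank entities from highest to lowest confidence.
    ranked = sorted(confidence.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(entity, c) for entity, c in ranked if c > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [entity for entity, _ in ranked]


# Hypothetical confidence values.
conf = {"Drama X": 1.3, "Drama Y": 0.9, "Drama Z": 0.3}
by_threshold = screen_entities(conf, threshold=0.5)
by_rank = screen_entities(conf, top_k=1)
by_both = screen_entities(conf, threshold=0.5, top_k=1)
```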

S203, determining the source multimedia corresponding to the multimedia segment from the associated multimedia.

This embodiment of the application refines the operation of determining the associated multimedia. Multimedia entities are determined from each extracted entity and the triplet information corresponding to the knowledge graph, so that the extracted entities are expanded through the knowledge graph and the entity information corresponding to the multimedia fragment is mined further. Non-multimedia entities among the expanded entities are eliminated, which further reduces the number of determined associated multimedia and improves the degree of correlation between the associated multimedia and the source multimedia corresponding to the multimedia fragment.

Example III

Fig. 3 is a block diagram of a source multimedia determining apparatus according to a third embodiment of the present application, where the embodiment of the present application is applicable to determining a data source of a multimedia segment included in a multimedia carrier such as an application program or a web page, and the apparatus is implemented by software and/or hardware, and is specifically configured in an electronic device having a certain data computing capability.

A source multimedia determining apparatus 300 as shown in fig. 3, comprising: an associated information acquisition module 301, an associated multimedia determination module 302 and a source multimedia determination module 303.

The associated information obtaining module 301 is configured to obtain at least one multimedia associated information corresponding to a multimedia segment; wherein the multimedia association information comprises at least one of descriptive text of the multimedia fragment, a media element identification result and user input tag information;

the associated multimedia determining module 302 is configured to perform entity extraction on each piece of multimedia associated information, and determine at least one associated multimedia corresponding to the multimedia segment according to each extracted entity and a preset knowledge graph;

a source multimedia determining module 303, configured to determine, from each of the associated multimedia, a source multimedia corresponding to the multimedia segment.
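The cooperation of the three modules can be sketched as a simple pipeline. This is only a structural illustration: the class name `SourceMultimediaDeterminer`, its field names, and the stub callables are hypothetical, and real implementations of each module would replace the lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SourceMultimediaDeterminer:
    # Role of the associated information acquisition module 301.
    acquire_info: Callable[[str], List[str]]
    # Role of the associated multimedia determining module 302.
    find_associated: Callable[[List[str]], List[str]]
    # Role of the source multimedia determining module 303.
    pick_source: Callable[[str, List[str]], Optional[str]]

    def run(self, clip_id: str) -> Optional[str]:
        info = self.acquire_info(clip_id)
        associated = self.find_associated(info)
        return self.pick_source(clip_id, associated)


# Stub callables standing in for the real modules (hypothetical data).
determiner = SourceMultimediaDeterminer(
    acquire_info=lambda clip_id: ["Actor A in a palace scene"],
    find_associated=lambda info: ["Drama X", "Drama Y"],
    pick_source=lambda clip_id, associated: associated[0] if associated else None,
)
result = determiner.run("clip-001")
```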

In one embodiment of the application, the associated information acquisition module acquires the multimedia associated information corresponding to the multimedia fragments, the associated multimedia determining module extracts the entity of the multimedia associated information, and the associated multimedia is screened according to the extracted entity and the preset knowledge graph, so that the source multimedia corresponding to the multimedia fragments is determined from the associated multimedia by the source multimedia determining module, and the problems that the time consumed in the process of determining the source multimedia is long and the accuracy of determining the source multimedia is low are solved. According to the technical scheme, the related multimedia is determined by combining the entities corresponding to the multimedia related information under different dimensions and the preset knowledge graph, the determination range of the source multimedia is reduced, the source multimedia is determined from the related multimedia, the interference of non-real source data such as entertainment news containing multimedia fragments is reduced, and the determination efficiency and accuracy of the source multimedia are improved.

Further, the associated multimedia determining module 302, when executing determining at least one associated multimedia corresponding to the multimedia segment according to the extracted entities and the preset knowledge-graph, includes:

an associated multimedia determining unit, configured to determine at least one multimedia entity according to each extracted entity and the triplet information corresponding to the knowledge graph, and to take the multimedia data corresponding to each multimedia entity as associated multimedia.

Further, the associated multimedia determining module 302 further includes a multimedia entity screening unit, specifically configured to:

after determining at least one multimedia entity, before taking the multimedia data corresponding to each multimedia entity as associated multimedia, determining the confidence level of each multimedia entity according to the determination frequency of each multimedia entity and/or the category of the multimedia associated information corresponding to each multimedia entity;

correspondingly, the associated multimedia determining unit, when taking the multimedia data corresponding to each multimedia entity as associated multimedia, is specifically configured to:

take the multimedia data corresponding to each screened multimedia entity as associated multimedia.

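Further, the multimedia entity screening unit, when determining the confidence of each multimedia entity according to the determined frequency of each multimedia entity and the category of the multimedia association information corresponding to each multimedia entity, is specifically configured to:

for each multimedia entity, weight the determined frequency of the multimedia entity according to the confidence weights corresponding to the different categories of multimedia association information; and determine the confidence of the multimedia entity according to its weighted frequency.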

Further, the source multimedia determining module 303 includes:

a similarity determining unit, configured to determine a similarity between fingerprint information of each associated multimedia and fingerprint information of the multimedia segment;

and the source multimedia determining unit is used for determining the source multimedia corresponding to the multimedia segment from the associated multimedia according to the similarity.
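One possible way to select the source by fingerprint similarity is sketched below. The fingerprint representation (a sequence of per-frame integer hashes), the sliding-window similarity measure, the `min_similarity` cut-off, and the function names are all assumptions for illustration; this section does not specify how the similarity is computed.

```python
def clip_similarity(clip_fp, media_fp):
    """Best fraction of matching positions when sliding the clip over the media."""
    n = len(clip_fp)
    if n == 0 or n > len(media_fp):
        return 0.0
    best = 0.0
    for start in range(len(media_fp) - n + 1):
        window = media_fp[start:start + n]
        matches = sum(a == b for a, b in zip(clip_fp, window))
        best = max(best, matches / n)
    return best


def pick_source(clip_fp, candidates, min_similarity=0.8):
    """Return the associated multimedia most similar to the clip, or None.

    candidates: dict mapping associated multimedia name -> fingerprint sequence.
    """
    best_name, best_score = None, 0.0
    for name, media_fp in candidates.items():
        score = clip_similarity(clip_fp, media_fp)
        if score > best_score:
            best_name, best_score = name, score
    # Assumption: reject every candidate if none is similar enough.
    return best_name if best_score >= min_similarity else None


# Hypothetical fingerprints: one integer hash per sampled frame.
candidates = {
    "Drama X": [1, 2, 3, 4, 5, 6],
    "Drama Y": [9, 9, 3, 4, 9, 9],
}
source = pick_source([3, 4, 5], candidates)
```

Because only the associated multimedia are compared, the number of fingerprint comparisons is bounded by the size of the screened candidate set rather than by the whole multimedia library.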

Further, the device also comprises a multimedia fingerprint database determining module for:

before the similarity between the fingerprint information of each associated multimedia and the fingerprint information of the multimedia segment is respectively determined, capturing multimedia data corresponding to the multimedia entity by utilizing a playing link corresponding to the multimedia entity in the knowledge graph, and extracting the fingerprint information of the multimedia data;

Further, the similarity determining unit is specifically configured to:

The source multimedia determining device can execute the source multimedia determining method provided by any embodiment of the application, and has the corresponding functional modules and beneficial effects of executing the source multimedia determining method.

Example IV

According to embodiments of the present application, an electronic device and a readable storage medium are also provided.

Fig. 4 is a block diagram of an electronic device for implementing the source multimedia determining method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.

As shown in fig. 4, the electronic device includes: one or more processors 401, memory 402, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 401 is taken as an example in fig. 4.

Memory 402 is the non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the source multimedia determining method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the source multimedia determining method provided by the present application.

The memory 402, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the source multimedia determining method in the embodiments of the present application (e.g., the associated information acquisition module 301, the associated multimedia determining module 302, and the source multimedia determining module 303 of the source multimedia determining apparatus 300 shown in fig. 3). By running the non-transitory software programs, instructions, and modules stored in the memory 402, the processor 401 executes the various functional applications and data processing of the server, i.e., implements the source multimedia determining method in the above method embodiments.

Memory 402 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device performing the source multimedia determining method, and the like. In addition, memory 402 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 402 may optionally include memory remotely located with respect to processor 401, which may be connected via a network to an electronic device performing the source multimedia determining method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device performing the source multimedia determining method may further include: an input device 403 and an output device 404. The processor 401, memory 402, input device 403, and output device 404 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 4.

The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device performing the source multimedia determining method. Examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, and a joystick. The output device 404 may include a display apparatus, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical scheme of the embodiment of the application, the multimedia associated information corresponding to the multimedia fragments is acquired, the entity of the multimedia associated information is extracted, the associated multimedia is screened according to the extracted entity and the preset knowledge graph, and then the source multimedia corresponding to the multimedia fragments is determined from the associated multimedia, so that the problems that the time consumption of the source multimedia determining process is long and the accuracy of the source multimedia determining is low are solved. According to the technical scheme, the entity corresponding to the multimedia association information is combined with the preset knowledge graph, so that the associated multimedia is determined, the determination range of the source multimedia is reduced, the source multimedia is determined from the associated multimedia, the interference of non-real source multimedia such as entertainment news containing multimedia fragments is reduced, and the determination efficiency and accuracy of the source multimedia are improved.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims

1. A method for determining source multimedia, comprising:

acquiring at least one piece of multimedia association information corresponding to the multimedia fragment; the multimedia associated information comprises at least one of descriptive text of the multimedia fragment, a media element identification result and user input label information, wherein the media element identification result comprises an image identification result or a tone identification result;

The multimedia associated information is determined through the local electronic equipment according to user input information, and the user input information is input through an input device of the local electronic equipment by a user;

when the multimedia segment comprises a video segment, if the multimedia association information comprises a description text of the multimedia segment, the user input information comprises voice information or picture information; if the multimedia association information comprises the media element identification result, the user input information comprises a face image or a prop image; if the multimedia associated information comprises user input label information, the user input information comprises text information, voice information or picture information;

extracting the entity of each multimedia associated information, and determining at least one multimedia entity according to the extracted entity and the triplet information corresponding to the preset knowledge graph;

counting the determined frequency of the multimedia entity determined according to the extracted entity; the determined frequency corresponding to each multimedia entity is used as the confidence level of the multimedia entity; or the determined frequency corresponding to each multimedia entity is converted into confidence coefficient through monotonically increasing function operation;

taking the multimedia data corresponding to each screened multimedia entity as associated multimedia;

determining source multimedia corresponding to the multimedia fragments from each associated multimedia;

wherein, the multimedia associated information of different categories has corresponding confidence weights respectively; the description text of the multimedia fragment can reflect the whole content of the multimedia fragment, the media element identification result can be associated with the multimedia fragment from the content detail, the tag information is used for classifying the multimedia fragment from the attribute home layer of the multimedia information, the confidence weight corresponding to the description text is set to be the largest, and the confidence weight corresponding to the tag information is set to be the smallest.

2. The method of claim 1, wherein determining the confidence level of each multimedia entity based on the determined frequency of each multimedia entity and the category of the multimedia association information corresponding to each multimedia entity comprises:

3. The method according to any one of claims 1-2, wherein determining a source multimedia corresponding to the multimedia segment from each of the associated multimedia comprises:

4. A method according to claim 3, wherein prior to determining the similarity of the fingerprint information of each associated multimedia to the fingerprint information of the multimedia clip, respectively, the method further comprises:

5. The method of claim 4, wherein determining the similarity of the fingerprint information of each associated multimedia to the fingerprint information of the multimedia clip, respectively, comprises:

6. A source multimedia determining apparatus, comprising:

the associated information acquisition module is used for acquiring at least one piece of multimedia associated information corresponding to the multimedia fragment; the multimedia associated information comprises at least one of descriptive text of the multimedia fragment, a media element identification result and user input label information, wherein the media element identification result comprises an image identification result or a tone identification result;

The associated multimedia determining unit is used for extracting the entities of the multimedia associated information and determining at least one multimedia entity according to the extracted entities and the triplet information corresponding to the preset knowledge graph;

a multimedia entity screening unit for determining the confidence coefficient of each multimedia entity according to the determined frequency of each multimedia entity and/or the category of the multimedia associated information corresponding to each multimedia entity;

the associated multimedia determining unit takes the multimedia data corresponding to each screened multimedia entity as associated multimedia;

a source multimedia determining module, configured to determine, from each of the associated multimedia, a source multimedia corresponding to the multimedia segment;

7. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a source multimedia determining method according to any one of claims 1-5.

8. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform a source multimedia determining method according to any one of claims 1-5.