WO2023169159A1 - Method for establishing an event logic graph and related apparatus - Google Patents

Method for establishing an event logic graph and related apparatus

Info

Publication number
WO2023169159A1
WO2023169159A1 · PCT/CN2023/075917 · CN2023075917W
Authority
WO
WIPO (PCT)
Prior art keywords
video
event
data
sets
events
Prior art date
Application number
PCT/CN2023/075917
Other languages
English (en)
French (fr)
Inventor
李忠阳
李明磊
郑毅
怀宝兴
袁晶
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202210726908.5A external-priority patent/CN116775892A/zh
Application filed by 华为云计算技术有限公司
Publication of WO2023169159A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri

Definitions

  • The present application relates to the field of event logic graphs, and specifically to a method for establishing an event logic graph and related apparatus.
  • A knowledge graph is built with relationships between entities at its core, and has been widely used in artificial intelligence tasks such as search, recommendation, and question answering.
  • However, the knowledge graph has representational limitations: it can only represent relationships between entities, not relationships between events. In practical applications, after learning about an event, people usually want to know its cause, development, results, and lessons learned, and may even want to obtain events related or similar to it. The existing knowledge graph cannot meet this market demand.
  • This application provides a method for establishing an event logic graph and related apparatus.
  • The method described in this application is used to establish an event logic graph.
  • The data on each node in the event logic graph includes one or more of video, text, voice, and image.
  • The data on each node represents a type of event, and the edges between nodes represent the logical relationships between events.
  • The established event logic graph can be applied in a variety of scenarios and has wide applicability.
  • In a first aspect, the present application provides a method for establishing an event logic graph, which includes: acquiring first data, the first data including any one or more of video, image, text, and voice; and dividing the first data into m sets.
  • The data of each set in the m sets represents a type of event.
  • A type of event includes at least one event, and m is any positive integer. The method further determines the logical relationships between the m types of events represented by the data in the m sets.
  • With each type of event in the m types of events as a node, and the logical relationships between the m types of events as the edges between nodes, an event logic graph is established.
  • In summary, this application provides a method for establishing an event logic graph: the first data is divided into m sets, each set representing a type of event; the logical relationships between the m types of events represented by the data in the m sets are determined; and with each type of event as a node and the logical relationships as edges, an event logic graph is established.
  • The data on each node of the event logic graph represents a type of event, and the edges between nodes represent the logical relationships between events, which satisfies the market demand described above.
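As an illustrative sketch (not the patented implementation), the nodes-plus-edges structure described above can be represented with a small Python class. The event names, attached data, and the "causal" relation below are hypothetical examples introduced for illustration only.

```python
# Illustrative sketch of the event-logic-graph structure described above.
# Node names, file names, and relations are hypothetical examples.

class EventLogicGraph:
    def __init__(self):
        self.nodes = {}   # event name -> attached data (video/image/text/voice)
        self.edges = []   # (source event, target event, relation)

    def add_event(self, name, data):
        self.nodes[name] = data

    def add_relation(self, src, dst, relation):
        # relation is e.g. "causal" or "temporal"
        self.edges.append((src, dst, relation))

graph = EventLogicGraph()
graph.add_event("fire breaks out", {"video": ["clip_001.mp4"]})
graph.add_event("firefighters dispatched", {"text": ["news article"]})
graph.add_relation("fire breaks out", "firefighters dispatched", "causal")
print(len(graph.nodes), len(graph.edges))  # -> 2 1
```

Because each node carries a dictionary of modalities, a node can mix video, image, text, and voice data, which is the multimodal property the application emphasizes.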
  • the videos in the first data include any one or more of film and television program videos, news report videos, advertising videos, and video recordings.
  • the videos in the first data can include videos of various types and aspects.
  • the method described in this application has strong applicability and wide application range.
  • In a possible implementation, the data in each node includes any one or more of video, image, text, and voice.
  • That is, the data on each node in the event logic graph includes any one or more of video, image, text, and voice.
  • The data on each node can thus include multiple modalities, whereas a traditional knowledge graph is built from text alone, so the data on each node in a knowledge graph includes only text.
  • the logical relationship includes one or more of a causal relationship and a temporal relationship.
  • In a possible implementation, determining the logical relationships between the m types of events represented by the data in the m sets includes: determining, based on a trained logical relationship prediction model, the logical relationships between the m types of events represented by the data in the m sets.
  • In a possible implementation, the first data is at least one video, and a type of event includes an event.
  • Dividing the first data into m sets includes one or more of the following:
  • determining, from the subtitle content or voice content carried in each video, which video frames describe one event, and dividing the video frames that describe one event into one set;
  • dividing video frames containing one specified person's identity into one set, or dividing video frames containing multiple specified persons' identities into one set.
  • In a possible implementation, dividing the first data into m sets includes:
  • first dividing the first data into n sets, each of the n sets representing an event, and then fusing similar events among the n events represented by the n sets to obtain the m sets.
  • Dividing the first data into n sets includes: for each video in the at least one video, segmenting the at least one video according to one or more of (iv) to (vi) to obtain the n sets, where:
  • the subtitle content or voice content carried in each video determines which video frames describe one event, and the video frames that describe one event are divided into one set;
  • video frames containing one specified person's identity are divided into one set, or video frames containing multiple specified persons' identities are divided into one set.
  • Fusing similar events among the n events represented by the data of the n sets includes: fusing similar events among the n events according to any one or more of conditions (vii) to (x).
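The two-stage procedure above (first n fine-grained event sets, then fusing similar ones into m sets) can be sketched as follows. The title-overlap similarity used here is a placeholder assumption, since the actual fusion conditions (vii) to (x) are specified elsewhere in the application.

```python
# Sketch of fusing similar events: n per-video sets are merged into m sets
# whenever two sets are judged to represent similar events. The word-overlap
# similarity below is a placeholder, not the patent's conditions (vii)-(x).

def fuse_similar_events(event_sets, similar):
    """Greedily merge sets whose events are judged similar."""
    fused = []
    for current in event_sets:
        for existing in fused:
            if similar(existing, current):
                existing.extend(current)   # merge into an existing event set
                break
        else:
            fused.append(list(current))    # keep as a new event set
    return fused

# Toy example: each set is a list of clip titles; sets sharing a word are similar.
def share_a_word(a, b):
    words = lambda s: set(w for title in s for w in title.split())
    return bool(words(a) & words(b))

n_sets = [["fire downtown"], ["downtown fire report"], ["election results"]]
m_sets = fuse_similar_events(n_sets, share_a_word)
print(len(m_sets))  # -> 2
```

Note that the greedy merge guarantees m ≤ n, matching the later statement that n is greater than or equal to m.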
  • In a possible implementation, the method further includes: recommending relevant content to the user based on the user's operation and the established event logic graph.
  • In a second aspect, this application provides a device for establishing an event logic graph, including:
  • an acquisition module configured to acquire first data, the first data including any one or more of video, image, text, and voice;
  • a segmentation module configured to divide the first data into m sets.
  • The data of each set in the m sets represents a type of event.
  • A type of event includes at least one event, and m is any positive integer;
  • a determination module configured to determine the logical relationships between the m types of events represented by the data in the m sets;
  • an event logic graph creation module configured to establish an event logic graph with each type of event in the m types of events as a node, and the logical relationships between the m types of events as the edges between nodes.
  • the videos in the first data include any one or more of film and television program videos, news report videos, advertising videos, and video recordings.
  • the data in the node includes any one or more of video, image, text, and voice.
  • the logical relationship includes one or more of a causal relationship and a temporal relationship.
  • the determination module is configured to: determine logical relationships between m types of events represented by data in the m sets according to a trained logical relationship prediction model.
  • the segmentation module is used to:
  • the subtitle content or voice content carried in each video determines which video frames describe one event, and the video frames that describe one event are divided into one set;
  • video frames containing one specified person's identity are divided into one set, or video frames containing multiple specified persons' identities are divided into one set.
  • Alternatively, the segmentation module is configured to:
  • divide the first data into n sets, the data of each of the n sets representing an event, where n is greater than or equal to m;
  • the device also includes a fusion module configured to fuse similar events among the n events represented by the data of the n sets, to obtain the m sets.
  • The segmentation module is configured to: for each video in the at least one video, segment the at least one video according to one or more of (iv) to (vi) to obtain the n sets, where:
  • the subtitle content or voice content carried in each video determines which video frames describe one event, and the video frames that describe one event are divided into one set;
  • video frames containing one specified person's identity are divided into one set, or video frames containing multiple specified persons' identities are divided into one set.
  • The fusion module is configured to fuse similar events among the n events according to any one or more of conditions (vii) to (x).
  • In a possible implementation, the device further includes: a recommendation module, configured to recommend relevant content to the user based on the user's operation and the established event logic graph.
  • Each functional module in the second aspect is specifically used to implement the method described in the first aspect or any possible implementation of the first aspect.
  • The present application further provides a computing device cluster including at least one computing device.
  • Each computing device in the at least one computing device includes a memory and a processor.
  • The memory is used to store instructions.
  • The processor is used to execute the instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the method described in the first aspect or any possible implementation of the first aspect.
  • The present application further provides a computer-readable storage medium including computer program instructions which, when run on a computing device cluster, cause the computing device cluster to execute the method described in the first aspect or any possible implementation of the first aspect.
  • The present application further provides a computer program product containing instructions which, when run on a computing device cluster, cause the computing device cluster to execute the method described in the first aspect or any possible implementation of the first aspect.
  • Figure 1 is a schematic flow chart of a method for establishing an event logic graph provided by this application.
  • Figure 2 is a schematic diagram of a scenario provided by this application.
  • Figure 3 is a schematic diagram of a directed cyclic event logic graph provided by this application.
  • Figure 4 is a schematic diagram of a directed acyclic event logic graph provided by this application.
  • Figure 5 is a schematic flow chart of a method for establishing an event logic graph provided by this application.
  • Figure 6 is a schematic diagram of the training and prediction structure of a logical relationship prediction model provided by this application.
  • Figure 7 is a schematic structural diagram of a device for establishing an event logic graph provided by this application.
  • Figure 8 is a schematic structural diagram of a computing device provided by this application.
  • Figure 9 is a schematic structural diagram of a computing device cluster provided by this application.
  • Figure 10 is a schematic structural diagram of yet another computing device provided by this application.
  • An event logic graph refers to a graph built with events as nodes and the relationships between events as edges.
  • Figure 1 is a schematic flow chart of a method for establishing an event logic graph provided by this application. The method includes, but is not limited to, the following description.
  • the first data includes any one or more of video, image, text, and voice.
  • the first data may include video, and the video may include any one or more of film and television program videos, news report videos, advertising videos, and video recordings.
  • The film and television program video may include, for example, one or more of a TV series clip, a movie, a movie clip, a variety show, a variety show clip, etc. Any of these may be presented in the form of animation, or in the form of real people and objects; this application is not limited in this respect.
  • The videos in the first data can also include news report videos.
  • A news report video refers to broadcasting news in the form of video.
  • News report videos can be played on TV, or on terminal devices through the Internet.
  • The terminal device can be a mobile phone, desktop computer, notebook, tablet, display screen, electronic watch, or other electronic device.
  • This application does not limit the playback carrier of the news report video.
  • A news report video can report on recent events, or on historical events that people are concerned about; this application is not limited in this respect.
  • News report videos can be about any aspect, for example military, political, historical, financial, or daily life; this application is not limited in this respect.
  • The videos in the first data can also include advertising videos.
  • Advertising videos here refer to advertisements in the form of video. An advertising video can be about any aspect; it can be about certain physical objects, for example certain daily necessities in people's lives.
  • An advertising video can also promote an application or software, or promote values or life concepts; this application does not limit the specific content of the advertising video.
  • The advertising video can come from television, or from advertising videos presented on terminal devices through the network; this application does not limit the source of advertising videos.
  • Recorded videos may include videos captured by cameras, video cameras, infrared sensors, and other devices through photography or video recording.
  • A recorded video may include video captured or recorded by a camera or a terminal device to capture environmental information, where the environmental information may include people, objects, scenery, etc.
  • The terminal device may be, for example, a mobile phone, a computer, or another electronic device with a camera or recording function; this application does not specifically limit the recorded content.
  • The recorded video can be obtained with the front camera or with the rear camera; this is not limited in this application.
  • the recorded video may also include video captured by the driving recorder on the vehicle.
  • Recorded videos may also include surveillance videos.
  • Surveillance videos refer to videos obtained by surveillance equipment, such as cameras.
  • A recorded video can be a video recorded by the user through a camera or recording device, and can be about any content.
  • For example, it can be a video about food, beauty, travel, or entertainment and comedy.
  • The recorded video may also be a video about a game.
  • For example, the recording function on a terminal device is turned on and the entire game is recorded from the user's perspective; or, while a user plays a game on one terminal device, another terminal device is used to record the entire process.
  • This process can be broadcast in real time through a live network broadcast, or in a non-real-time manner.
  • The recorded video can also include a call video.
  • A call video can be a video recorded when multiple people communicate through mobile phones, computers, or other electronic devices.
  • The communication can be carried out using specific software on the electronic devices; the specific software can be, for example, social networking software or conferencing software. The call can also be carried out through a subscriber identity module (SIM) card or other means, which is not limited by this application. This application does not specifically limit the content involved in the call video.
  • The videos in the first data can also include other forms of video, such as videos commenting on certain news on online platforms; videos analyzing the causes, consequences, and impact of an event; videos analyzing and commenting on film and television programs; and videos analyzing financial markets, financial trends, and so on. This application does not specifically limit the source of the video.
  • The first data may also include images, and the images may be images related to film and television programs, for example any video frame or frames from a TV series clip, a movie, a movie clip, a variety show, or a variety show clip.
  • the image may also be an image related to a news report, for example, it may be any one video frame or any multiple video frames in the above news report video.
  • the image may also be an image related to an advertisement, for example, it may be any one video frame or any plurality of video frames in the above advertisement video.
  • the image may also be an image obtained by collecting environmental information by an image acquisition device. The environmental information includes people, objects, scenery, etc. This application does not specifically limit the content of the image, and does not specifically limit the acquisition form of the image.
  • the first data may also include text.
  • This application does not specifically limit the content and source of the text.
  • The text may be part or all of a paper, journal, magazine, article, newspaper, etc., in any field; it can also be news text.
  • News text refers to news recorded or disseminated through journals, magazines, newspapers, etc. The text can also be lines from film and television programs, drama lines, advertising slogans, etc.
  • the first data may also include speech.
  • This application does not specifically limit the expression form of the speech.
  • the speech may be expressed in Mandarin, a foreign language, or a dialect.
  • This application does not specifically limit the content of the voice.
  • The content of the voice can be a news broadcast, where a news broadcast refers to broadcasting news through a radio station or other means; it can also be a conversation between multiple people.
  • The content of the voice can also be lyrics, lines from film and television programs, drama lines, advertising slogans, etc.
  • the first data may also include log data.
  • The log data may be, for example, the user's activity track data recorded by a smart watch, or the user's heart rate, step count, and other data recorded by a wearable device.
  • This application does not specifically limit the log data.
  • Before segmentation, the first data can be preprocessed.
  • For example, images can be denoised and enhanced.
  • When the first data includes surveillance videos, invalid frames in the surveillance videos can be deleted; an invalid frame can be, for example, a frame in which no person appears in the monitored area.
  • When the first data includes speech, speech enhancement technology can be used to remove noise contained in the speech and improve speech quality.
  • This application does not specifically limit the preprocessing operation.
  • Each video in the at least one video is segmented separately to obtain a total of m sets, where the data in each of the m sets represents an event.
  • For example, first determine whether video frame 1 and video frame 2 describe one event; if so, divide them into one set, and if not, divide them into different sets (for example, video frame 1 into set 1 and video frame 2 into set 2). Then determine whether video frame 2 and video frame 3 describe one event; if so, divide them into one set, and if not, divide them into different sets (for example, video frame 2 into set 2 and video frame 3 into set 3). In this way, video frame 1, video frame 2, video frame 3, ..., video frame k in the video are divided into one or more sets.
  • Determining whether adjacent video frames describe one event based on the picture similarity between them can work as follows: if the picture similarity between adjacent video frames is greater than or equal to a first threshold, it is determined that the adjacent frames describe one event; if the picture similarity is less than the first threshold, it is determined that they do not.
  • For example, if the picture similarity between video frame 1 and video frame 2 is greater than the first threshold, and the picture similarity between video frame 2 and video frame 3 is greater than the first threshold, then video frame 1, video frame 2, and video frame 3 describe one event. If the similarity between frames 1 and 2 is greater than the first threshold but the similarity between frames 2 and 3 is less than it, then video frames 1 and 2 describe one event, and video frame 3 does not describe the same event as them. If the similarity between frames 1 and 2 is less than the first threshold while the similarity between frames 2 and 3 is greater than it, then video frames 2 and 3 describe one event, and video frame 1 describes a different event.
  • The first threshold can be set according to specific circumstances and is not limited in this application.
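The adjacent-frame rule above can be sketched in a few lines: frames whose pairwise similarity meets the first threshold fall into the same set, and a drop below the threshold starts a new set. The similarity function here (absolute difference on toy 1-D "frames") is an assumption; the application does not prescribe a specific similarity measure.

```python
# Sketch of the threshold-based segmentation described above: adjacent frames
# whose similarity is >= the first threshold fall into the same event set.
# The similarity measure is a placeholder assumption.

def segment_by_similarity(frames, similarity, threshold):
    """Group a list of frames into sets of consecutive, similar frames."""
    if not frames:
        return []
    sets = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if similarity(prev, cur) >= threshold:
            sets[-1].append(cur)      # same event as the previous frame
        else:
            sets.append([cur])        # start a new event set
    return sets

# Toy example with 1-D "frames" and absolute-difference similarity.
frames = [0.10, 0.12, 0.11, 0.80, 0.82]
sim = lambda a, b: 1.0 - abs(a - b)
print(segment_by_similarity(frames, sim, threshold=0.9))
# -> [[0.1, 0.12, 0.11], [0.8, 0.82]]
```

In practice the per-frame similarity would come from, for example, histogram comparison or learned embeddings, and the first threshold would be tuned for the data.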
  • If the video carries subtitles, it can be determined from the subtitle content which video frames describe one event; the video frames that describe one event are divided into one set, and frames describing different events are divided into different sets.
  • Subtitles are essentially text, so natural language processing technology can be used to determine, from the subtitle content, which video frames in the video describe one event.
  • Similarly, if the video carries voice, which video frames describe one event can be determined from the voice content.
  • Each video in the at least one video can also be segmented according to the identities of the persons appearing in it.
  • Video frames containing one specified person's identity can be divided into one set; video frames containing multiple specified persons' identities can also be divided into one set.
  • One set represents an event, and the specified person's identity can be one designated person, several designated people, or even certain designated objects.
  • The identity of a designated person can be the name of a character in a film or television program, the actor of a role in a film or television program, or a designated reporter or host in a news report; this application does not limit the designated person's identity.
  • For example, a news report video includes two hosts, host A and host B. The video frames containing host A can be divided into one set, and the video frames containing host B into another set.
  • As another example, the video includes multiple episodes of a TV series featuring character A, character B, character C, character D, character E, and other characters. The video frames containing character A can be divided into one set, and the video frames containing characters B, C, D, E, and others into another set; alternatively, the video frames containing character A and character B can be divided into one set, and the video frames containing characters C, D, E, and others into another set.
  • In a specific implementation, a feature extraction method can be used to extract features of the specified person's identity, and the extracted features can be used to divide the video frames containing that identity into one set.
  • This application does not restrict how the video frames containing the specified person's identity are divided into a set.
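The identity-based segmentation described above can be sketched as follows. The `identities_in` callable stands in for the unspecified feature-extraction and recognition step; the host names are hypothetical examples.

```python
# Sketch of identity-based segmentation: frames containing a specified person's
# identity are divided into one set. `identities_in` is a placeholder for a
# face-recognition / feature-extraction step, which the application leaves open.

def split_by_identity(frames, identities_in, targets):
    """Collect the frames in which any of the target identities appears."""
    matched, rest = [], []
    for frame in frames:
        if identities_in(frame) & targets:
            matched.append(frame)
        else:
            rest.append(frame)
    return matched, rest

# Toy example: a "frame" is just the set of people visible in it.
frames = [{"host A"}, {"host B"}, {"host A", "host B"}, set()]
set_a, others = split_by_identity(frames, lambda f: f, {"host A"})
print(len(set_a))  # -> 2
```

Passing `targets={"host A", "host B"}` instead would collect the frames containing either host, which corresponds to the "multiple specified persons' identities" variant.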
  • Each video in the at least one video can be segmented according to any one of the above possible implementations, or according to any combination of them.
  • Any of the possible implementations introduced above, or a combination of them, can be implemented through a model.
  • For example, a model is trained, and the at least one video is input into the trained model to obtain the m sets, where each of the m sets represents an event.
  • When the first data is a large number of images, the images are divided into m sets, where each of the m sets represents an event and m is any positive integer.
  • In a possible implementation, the images are first classified, for example into landscape images, people images, and object images.
  • Then each type of image is divided into multiple sets.
  • For people images, the images can be divided into multiple sets based on the person's identity.
  • For example, images of the same character in a movie or TV series can be divided into one set, or all images from the film and television programs an actor has appeared in can be divided into one set; as another example, images related to fire or ignition can be divided into one set, and images related to rear-end collisions into another set.
  • In another possible implementation, the images are divided according to the videos.
  • Specifically, the video can be segmented first into multiple sets, where the data in each set represents an event.
  • For each image, the similarity between the image and any one or more video frames in one of the sets is calculated. If the similarity is greater than or equal to the first threshold, the image is divided into that set; if the similarity is less than the first threshold, the similarity between the image and any one or more video frames in another set is calculated, and so on, until the image is assigned to a set or all sets have been tried.
  • If an image matches no set, the image is divided into a separate set of its own.
  • The similarity between the images in these separate sets can then be calculated; images whose mutual similarity is greater than or equal to the first threshold are merged into one set, and those with similarity less than the first threshold remain unchanged.
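The assignment loop just described (try each video-derived set in turn, fall back to a singleton set) can be sketched as follows; as before, the 1-D "frames" and absolute-difference similarity are toy assumptions.

```python
# Sketch of assigning loose images to existing video-derived event sets:
# an image joins the first set containing a sufficiently similar frame,
# otherwise it becomes a separate set of its own. Similarity is a placeholder.

def assign_images(image_list, event_sets, similarity, threshold):
    singles = []
    for img in image_list:
        for s in event_sets:
            if any(similarity(img, frame) >= threshold for frame in s):
                s.append(img)          # image joins this event set
                break
        else:
            singles.append([img])      # image matches no set: separate set
    return event_sets + singles

# Toy example with 1-D frames/images.
sets_ = [[0.1, 0.12], [0.8]]
sim = lambda a, b: 1.0 - abs(a - b)
result = assign_images([0.11, 0.5], sets_, sim, threshold=0.95)
print(len(result))  # -> 3
```

A follow-up pass over the singleton sets, merging those whose mutual similarity meets the first threshold, would complete the procedure described above.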
  • In yet another possible implementation, the video is first segmented into multiple sets; then the identity of the person in each image is determined, and each image is divided into the set that includes that person's identity.
  • In yet another possible implementation, videos and images are processed separately: the video is divided into multiple sets, the images are divided into multiple other sets, the similarity between any video frames in a video set and any images in an image set is calculated, and video sets are merged with image sets based on similarity.
  • When the first data is text, natural language processing technology can be used to determine which texts describe one event based on the text content, and the texts describing one event are divided into one set, thereby dividing all texts into m sets.
  • When the first data includes both video and text, the video and text can be processed separately: the video is divided into multiple sets, and the text is divided into multiple other sets. Then any one set is selected from the video sets and any one set from the text sets, and the similarity between the subtitle content or voice content carried in the video set and the text in the text set is calculated. If the similarity is greater than or equal to a second threshold, the two sets are merged; otherwise they are not. This is repeated until all sets and all possible combinations have been traversed.
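The cross-modal merge just described (compare a video set's subtitle content against a text set, merge when the second threshold is met) can be sketched as follows. Jaccard word overlap stands in for the unspecified similarity measure, and the event titles are hypothetical.

```python
# Sketch of merging a video-derived set with a text-derived set when the
# subtitle content and the text are similar enough (the "second threshold").
# Jaccard word overlap is a placeholder for the unspecified similarity.

def jaccard(a, b):
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def merge_video_text_sets(video_sets, text_sets, threshold):
    """Pair each video set (by its subtitle) with a similar text set, if any."""
    merged, leftover_texts = [], list(text_sets)
    for subtitle, clips in video_sets:
        for i, texts in enumerate(leftover_texts):
            if jaccard(subtitle, texts[0]) >= threshold:
                merged.append((clips, leftover_texts.pop(i)))
                break
        else:
            merged.append((clips, []))   # no text set was similar enough
    return merged, leftover_texts

video_sets = [("fire breaks out downtown", ["clip1.mp4"])]
text_sets = [["fire breaks out downtown today"], ["election results announced"]]
merged, rest = merge_video_text_sets(video_sets, text_sets, threshold=0.5)
print(len(rest))  # -> 1
```

Text sets left in `rest` remain as standalone event sets, so no data is discarded by the merge.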
  • When the first data is speech, natural language processing technology can be used to determine which speech segments describe one event based on the speech content, and the speech describing one event is divided into one set, thereby dividing all speech into m sets.
  • When the first data includes both video and voice, the video and voice can likewise be processed separately, in a manner similar to the video-and-text case above.
  • The above methods of dividing the first data into m sets are only examples and do not constitute a limitation; this application does not limit the specific division method.
  • the data in each set includes one or more of video, image, text, and voice.
  • For example, the first data includes multiple videos, multiple images, multiple texts, and multiple speech segments.
  • The first data is segmented as follows: each of the multiple videos is segmented into one or more smaller videos; each of the multiple images is divided into different sets; each of the multiple texts is segmented and the segmented texts are divided into different sets; and each segment of speech is segmented and the resulting smaller segments are divided into different sets.
  • After segmentation, the data in each set includes one or more of video, image, text, and voice.
  • Figure 2 is only used for explanation and does not constitute any limitation on the present application.
  • The data in each set represents an event. First the title of each event is determined, and then the logical relationships between the m events are determined based on the titles.
  • The following describes how to determine the title of each event, and how to determine the logical relationships between events based on those titles.
  • a collection includes videos, and text is used to describe the content of each video frame in the video. This step is similar to using text to describe the content of each image, and then based on the text description of each video frame, an algorithm is used Calculations are performed to determine the title of this video, that is, the title of the event represented by this collection.
  • using text to describe the content of each video frame or the content of each image can be implemented through a model.
  • the model can be trained based on a large number of video frames or images together with labels, where the labels include the text description of each video frame or image. Inputting a video into the trained model yields the text description of each video frame.
  • the model can be implemented using a convolutional neural network, a recurrent neural network, etc.; this application does not limit the specific implementation or training method of the model. For the case where a set includes multiple images, this method can also be used to determine the title of the event represented by those images.
  • a set includes a video that carries subtitles or speech; based on the subtitle content or speech content, natural language processing technology can be used to determine the title of the video, that is, the title of the event represented by the set. For sets that include text or speech, the same method can be used to determine the title of the event represented by the text or by the speech.
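The title-determination steps above can be sketched in code. The following is a minimal illustration, assuming a simple word-frequency heuristic over per-frame text descriptions; the function name, the stopword list, and the example captions are all hypothetical, and the patent does not fix any particular title algorithm:

```python
from collections import Counter

def event_title(frame_descriptions, top_k=3):
    """Pick the most frequent content words across the per-frame text
    descriptions as a crude event title (illustrative heuristic only)."""
    stopwords = {"a", "an", "the", "is", "in", "on", "of", "and", "from"}
    words = [
        w.lower()
        for text in frame_descriptions
        for w in text.split()
        if w.lower() not in stopwords
    ]
    return " ".join(w for w, _ in Counter(words).most_common(top_k))

# hypothetical per-frame descriptions produced by a captioning model
frames = [
    "a car crashes on the highway",
    "the car is on fire on the highway",
    "smoke rises from the car",
]
print(event_title(frames))  # most frequent content words, led by "car"
```

In practice the per-frame descriptions would come from a trained captioning model as described above, and the title algorithm could be considerably more sophisticated.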
  • the logical relationship prediction model can be trained based on the titles of a large number of events, or it can be obtained based on a large number of events and labels.
  • the labels include the logical relationships between events; this application does not limit the training method of the logical relationship prediction model.
  • determining the title of each event and determining the logical relationship between the m events based on those titles can be implemented through a model. That is, the data in each set are input into the model, where the data in each set includes one or more of video, image, text, speech, etc., and the model outputs the logical relationship between the sets, that is, the logical relationship between the events.
  • logical relationships include one or more of causal relationships and temporal relationships.
  • causality or causation refers to the relationship between cause and effect.
  • the causal relationship between events refers to one or more events leading to one or more other events, where the former are called cause events, the latter are called result events, and the cause events lead to the result events.
  • an event may be caused by one cause or by multiple causes, that is, a result event may correspond to one cause event or to multiple cause events; likewise, a cause may lead to one result or to multiple results, that is, a cause event may correspond to one result event or to multiple result events.
  • this application does not limit the correspondence between cause events and result events. For example, event 1 is a car accident, and event 2 is dialing 110; then there is a causal relationship between event 1 and event 2, where event 1 is the cause and event 2 is the effect: because "a car accident happened", "110 was dialed".
  • the temporal relationship refers to the sequence relationship of time.
  • the temporal relationship between events refers to the fact that multiple events only occur at different times, and there is no obvious causal relationship between events.
  • for example, event 1 is washing vegetables, event 2 is chopping vegetables, and event 3 is cooking rice; the three events simply occur one after another in time.
  • logical relationships can also include other relationships, such as concessive relationships, adversative (contrast) relationships, etc.
  • the logical relationships between events can be set by users according to the specific situation and specific needs. For example, when training the logical relationship prediction model, the user can set which logical relationships appear between the events in the samples and how those logical relationships are defined, so that the trained logical relationship prediction model predicts the logical relationships between events accordingly; this is not limited by this application.
  • taking each event as a node and the logical relationships between events as the edges between nodes, an event graph is established, in which the data on each node is the data of the corresponding set, including one or more of video, image, text, and voice.
  • the edges between events are directed edges, and the direction of an edge represents the logical relationship between events: from a cause event to its result event, or from the event that occurs earlier in time to the event that occurs later (the events merely occur at different times, with an order of occurrence but no causal relationship).
  • the established event graph can be a directed cyclic graph or a directed acyclic graph.
  • the first data includes data of multiple people or multiple things, and the multiple events are events of those multiple people or things; that is to say, the events happen to multiple people or things, but at different times.
  • the event graph established according to the temporal order of occurrence may also form a cycle.
  • Figure 3 is a schematic diagram of a directed cyclic event graph provided by the present application.
  • "a car accident occurred" has causal relationships with "dial 120" and "dial 110": because "a car accident occurred", "dial 120" and "dial 110" happened;
  • "a car accident occurred" and "vehicle explosion" are causally related: "a car accident occurred" led to "vehicle explosion";
  • "vehicle explosion" also has causal relationships with "dial 120" and "dial 110": because "the vehicle exploded", "dial 120" and "dial 110" happened;
  • "dial 120" and "doctors treat the injured" are causally related: because "dial 120" happened, the event of "doctors treating the injured" occurred.
  • Figure 4 is a schematic diagram of a directed acyclic event graph provided by the present application.
  • the event graph is established based on the order in which the events occur; all events in the graph have temporal relationships, and there is no causal relationship between them. For example, "taking the bus" and "eating hot pot" simply happened at different times; similarly, there is no causal relationship between "watching a movie" and "eating popcorn" or "drinking Coke", and no causal relationship between "drinking Coke" and "playing at the beach".
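The construction described in this section, with events as nodes and directed edges from cause to effect or from earlier to later events, can be sketched with a plain adjacency structure. The `EventGraph` class below is a hypothetical minimal sketch; the example edges follow the car-accident example of Figure 3:

```python
class EventGraph:
    """Directed graph: nodes are events (whose data can be video, image,
    text or speech); edges point from cause to effect, or from the
    earlier event to the later one for temporal relations."""

    def __init__(self):
        self.edges = {}  # event -> set of successor events

    def add_event(self, event):
        self.edges.setdefault(event, set())

    def add_relation(self, src, dst):
        """Add a directed edge from `src` to `dst`."""
        self.add_event(src)
        self.add_event(dst)
        self.edges[src].add(dst)

    def successors(self, event):
        return sorted(self.edges.get(event, ()))

g = EventGraph()
g.add_relation("car accident", "dial 110")
g.add_relation("car accident", "dial 120")
g.add_relation("car accident", "vehicle explosion")
g.add_relation("vehicle explosion", "dial 120")
g.add_relation("dial 120", "doctors treat the injured")
print(g.successors("car accident"))
```

Nothing in this structure forbids a directed cycle, which matches the observation above that the established event graph may be either cyclic or acyclic.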
  • the established event graph can be applied to any terminal device.
  • the terminal device can be a mobile phone, a desktop computer, a notebook, a tablet, a wearable device, etc.
  • the established event graph is installed on the terminal device.
  • the terminal device provides the user with a query function.
  • the user can query the target content by inputting keywords.
  • the terminal device can also recommend content related to the target content to the user based on the target content.
  • for example, if the target content queried by the user is a cause event, the terminal device can recommend the corresponding result event to the user based on the event graph; if the target content queried by the user is a result event, the terminal device can recommend the corresponding cause event to the user based on the event graph; or the terminal device can recommend events that have a temporal relationship with the target content to the user, and so on.
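Assuming the event graph is stored as an adjacency mapping from each event to its successor events, the recommendation behaviour described above might look like the following sketch (the `recommend` helper and the example data are hypothetical):

```python
def recommend(edges, query):
    """Given a queried event, recommend its result events (successors)
    and its cause events (predecessors) from the event graph."""
    results = sorted(edges.get(query, ()))
    causes = sorted(src for src, dsts in edges.items() if query in dsts)
    return {"results": results, "causes": causes}

# adjacency mapping: event -> set of result events (illustrative data)
edges = {
    "car accident": {"dial 110", "dial 120"},
    "dial 120": {"doctors treat the injured"},
}
print(recommend(edges, "dial 120"))
```

For temporal relations the same traversal applies: the successors are simply the events that occur later, and the predecessors the events that occur earlier.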
  • the established event graph can also be used in monitoring equipment.
  • one or more alarm events are set in the event graph.
  • each alarm event can be located on one node of the event graph or span multiple nodes.
  • the data corresponding to each alarm event includes one or more of video and image.
  • when the monitoring equipment monitors the monitoring area, it calculates the similarity between the captured video frames of the monitoring area and the video frames or images of each alarm event; if the similarity is greater than or equal to the first threshold, it is determined that an alarm event has occurred in the monitoring area.
  • the monitoring equipment triggers an alarm operation.
  • the alarm operation can be, for example, the monitoring equipment making a whistle sound, or the monitoring equipment sending prompt information to relevant personnel.
  • the prompt information is used to indicate that an alarm event has occurred in the monitoring area, and so on; if the similarity is less than the first threshold, no processing is performed.
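The alarm logic above can be sketched as follows. This is a minimal illustration, assuming video frames are compared as feature vectors under cosine similarity; the names `check_alarm` and `first_threshold`, the choice of similarity measure, and the example vectors are all assumptions:

```python
def check_alarm(frame, alarm_frames, first_threshold, similarity):
    """Compare a captured monitoring frame against each alarm event's
    reference frames; return True (trigger the alarm operation) when
    any similarity reaches the first threshold."""
    return any(similarity(frame, ref) >= first_threshold
               for ref in alarm_frames)

def cosine(a, b):
    """Toy cosine similarity over feature vectors (illustrative only)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

captured = [0.9, 0.1, 0.0]                       # frame feature vector
alarm_refs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # alarm-event frames
print(check_alarm(captured, alarm_refs, 0.9, cosine))
```

If no similarity reaches the first threshold, the function returns False and, as described above, no processing is performed.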
  • Figure 5 is a schematic flowchart of yet another method for establishing an event graph provided by this application; the method includes but is not limited to the following steps.
  • the first data includes any one or more of video, image, text, and voice.
  • for steps S201 and S202, reference may be made to the descriptions of steps S101 and S102 in the method embodiment of Figure 1, respectively; for brevity, they are not repeated here.
  • Fusion of n sets yields m sets.
  • Each set in the m sets represents a type of event.
  • One type of event may include one event or multiple events.
  • a type of event that includes multiple events refers to events obtained by fusing two or more of the n events, where n is greater than or equal to m.
  • the data in each collection of n collections represents an event.
  • the title of each event is first determined, and then the events are fused based on the similarity of the titles of each event.
  • title similarity can be understood as whether the number of identical words contained in the titles of two events is greater than or equal to a third threshold; if so, the events are fused, and otherwise they are not.
  • for the method of determining the title, reference may be made to the description of determining event titles in step S103 of the method embodiment in Figure 1.
  • multiple keywords are used to describe the event represented by the data in each set, and the n events are fused according to the number of identical keywords they contain. For example, when two events contain a number of identical keywords greater than or equal to the third threshold, the two events can be merged, that is, their corresponding sets are merged; if the number of identical keywords in the two events is less than the third threshold, they are not merged. The third threshold can be set according to the specific situation.
  • the method of determining the keywords of the events represented by the data in each collection is similar to the method of determining the title of each event in step S103 of the method embodiment in Figure 1.
  • for videos/images, text is used to describe the content of each video frame/image, and then an algorithm is applied to the text descriptions to determine the keywords involved in the video/image; for videos that carry subtitles or speech, and for texts and speech, natural language processing technology is used to identify the keywords.
  • the keywords of the events represented by the data in each collection can also be determined through other methods, which are not limited by this application.
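The keyword-based fusion described above can be sketched with a greedy merge; the helper names, the example keywords, and the concrete value of the third threshold are hypothetical:

```python
def fuse_events(events, third_threshold):
    """Greedily merge event keyword lists that share at least
    `third_threshold` identical keywords; n events in, m <= n sets out
    (illustrative strategy only)."""
    fused = []
    for kws in events:
        for group in fused:
            # merge into the first group with enough keyword overlap
            if len(group & set(kws)) >= third_threshold:
                group.update(kws)
                break
        else:
            fused.append(set(kws))
    return fused

events = [
    ["car", "accident", "highway"],
    ["car", "accident", "explosion"],
    ["beach", "holiday"],
]
merged = fuse_events(events, third_threshold=2)
print(len(merged))  # 2 sets after fusion
```

The same shape of check works for title similarity: replace the keyword lists with the word lists of the two titles and compare the overlap against the third threshold.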
  • the n events can also be fused based on the similarity of the subtitle content or speech content carried in the videos of the sets. For example, calculate the similarity between the subtitle content carried by the video in set 1 and that carried by the video in set 2; if the similarity is greater than or equal to a second threshold, set 1 and set 2 are merged, and if it is less than the second threshold, they are not merged.
  • two sets can also be merged according to whether their videos contain the same video frames: if the two sets contain the same video frames, they are merged; otherwise, they are not merged.
  • for example, if the video in set 1 contains video frames of person A and the video in set 2 also contains video frames of person A, the two sets can be merged; similarly, if the video in set 1 contains video frames of object B and the video in set 2 also contains video frames of object B, the two sets can be merged. If the videos in the two sets do not contain the same people or objects, they are not merged.
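Merging sets whose videos contain the same video frames, the same person, or the same object can be sketched with a union-find pass. The helper below is a hypothetical illustration in which each set is represented by the items (e.g. person identities or frame hashes) appearing in its video:

```python
def merge_by_shared_items(collections, items_of):
    """Union-find merge: collections whose videos contain the same item
    (video frame, person, or object) end up in one merged group."""
    parent = list(range(len(collections)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    seen = {}  # item -> index of the first collection containing it
    for idx, coll in enumerate(collections):
        for item in items_of(coll):
            if item in seen:
                parent[find(idx)] = find(seen[item])  # union
            else:
                seen[item] = idx

    groups = {}
    for idx in range(len(collections)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())

# collections identified by the persons appearing in their frames
collections = [{"person A", "person C"}, {"person B"}, {"person A"}]
print(merge_by_shared_items(collections, lambda c: c))
```

Here sets 0 and 2 share person A and are merged, while set 1 stands alone; in practice `items_of` would extract frame fingerprints or recognized identities from the actual videos.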
  • FIG. 6 is a schematic diagram of the training and prediction structure of a logical relationship prediction model provided by an embodiment of the present application.
  • the data acquisition device 560 is used to acquire training data.
  • the training data may include one or more of video, image, text, voice, etc. used to represent the event.
  • the training data may also include labels, where the labels include the logical relationships between events, such as causal relationships, temporal relationships, etc.
  • after acquiring the training data, the data acquisition device 560 stores the training data in the database 530, and the database 530 maintains the training data.
  • the training device 520 can perform training based on the training data in the database 530 to obtain the trained logical relationship prediction model 513, and transplant the trained logical relationship prediction model 513 to the execution device 510.
  • the training device 520 may exist independently of the execution device 510, or may be integrated inside the execution device 510.
  • the user can input the data to be predicted through the input/output (I/O) interface 512 of the execution device 510, for example, the data of m sets, where the data in each set represents a type of event; alternatively, the user can use the data acquisition device 560 to input the data of the m sets into the database 530, and the execution device 510 then obtains the data of the m sets from the database 530.
  • the logical relationship prediction model 513 performs logical relationship prediction on the input m sets, determines the logical relationships between the sets, and outputs those logical relationships through the I/O interface 512.
  • the training data maintained in the database 530 may not necessarily come from the data acquisition device 560, but may also be obtained from other devices.
  • the training device 520 does not necessarily train the logical relationship prediction model 513 entirely based on the training data maintained in the database 530; it may also obtain training data from other devices for model training. The above description should not be construed as limiting the embodiments of this application.
  • the logical relationship prediction model 513 can be applied in the method embodiment shown in Figure 1 or Figure 5 of this application.
  • when the execution device 510 processes input data, or when the calculation module 511 of the execution device 510 performs calculations and other related processing, the execution device 510 can call data, code, etc. in the data storage system 550 for the corresponding processing, and can also store the data, instructions, etc. obtained from that processing in the data storage system 550.
  • the training device 520 can generate corresponding logical relationship prediction models 513 based on different training data for different goals.
  • the corresponding logical relationship prediction models 513 can be used to achieve the above goals, thereby providing users with the required results.
  • Figure 7 is a schematic structural diagram of a device 700 for establishing an event graph provided by this application.
  • the device 700 includes:
  • the acquisition module 701 is used to acquire first data, which includes any one or more of video, image, text, and voice;
  • the segmentation module 702 is used to segment the first data into m sets.
  • the data of each set in the m sets represents a type of event.
  • a type of event includes at least one event, and m is any positive integer;
  • the determination module 703 is used to determine the logical relationship between m types of events represented by the data in the m sets;
  • the event graph establishment module 704 is used to establish an event graph using each type of event in the m types of events as a node and the logical relationships between the m types of events as edges of the nodes.
  • the videos in the first data include any one or more of film and television program videos, news report videos, advertising videos, and video recordings.
  • the data in the node includes any one or more of video, image, text, and voice.
  • logical relationships include one or more of causal relationships and temporal relationships.
  • the determination module 703 is configured to determine logical relationships between m types of events represented by data in the m sets based on the trained logical relationship prediction model.
  • the segmentation module 702 is used to:
  • for each video in the at least one video, segment the at least one video according to one or more of (i) to (iii) to obtain the m sets, where,
  • (i) the similarity between adjacent video frames in each video determines whether the adjacent video frames describe one event; if so, the adjacent video frames are segmented into one set, and if not, into different sets,
  • (ii) the subtitle content or speech content carried in each video determines the video frames that describe one event, and the video frames describing one event are segmented into one set,
  • (iii) according to the identities of the persons in each video, video frames containing one specified person's identity are segmented into one set, or video frames containing multiple specified persons' identities are segmented into one set.
  • the segmentation module 702 is used to segment the first data into n sets, where the data of each of the n sets represents an event and n is greater than or equal to m; the device also includes a fusion module 705, which is used to fuse similar events among the n events represented by the data in the n sets to obtain the m sets.
  • the segmentation module 702 is used to:
  • video frames containing one specified person's identity are segmented into a set, or video frames containing multiple specified person's identities are segmented into one set;
  • the fusion module 705 is used to: fuse similar events among the n events according to any one or more of the conditions (vii) to (x), where,
  • the device 700 also includes: a recommendation module 706, configured to recommend relevant content to the user based on the user's operation and the established event map.
  • the acquisition module 701, the segmentation module 702, the determination module 703, the event map establishment module 704, the fusion module 705 and the recommendation module 706 can all be implemented by software, or can be implemented by hardware.
  • the following takes the segmentation module 702 as an example to introduce the implementation of the segmentation module 702 .
  • the implementation of the acquisition module 701, the determination module 703, the event map creation module 704, the fusion module 705 and the recommendation module 706 can refer to the implementation of the segmentation module 702.
  • the segmentation module 702 may include code running on a computing device.
  • the computing device may be a computing device in a cloud service, where the computing device may be, for example, a bare metal server or a virtual machine.
  • the computing device may be one or more.
  • the segmentation module 702 may include code running on multiple computing devices. It should be noted that the multiple computing devices used to run the code can be distributed in the same region or in different regions; further, they can be distributed in the same availability zone (AZ) or in different AZs, where each AZ includes one data center or multiple geographically close data centers. Usually, one region can include multiple AZs.
  • multiple computing devices used to run the code can be distributed in the same virtual private cloud (VPC) or across multiple VPCs.
  • for communication between two VPCs in the same region, and for cross-region communication between VPCs in different regions, a communication gateway is required in each VPC, and the interconnection between VPCs is realized through the communication gateways.
  • the segmentation module 702 may include at least one computing device, such as a server, computer, mobile phone, etc.
  • the above module can also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the above-mentioned PLD can be a complex programmable logical device (CPLD), a field-programmable gate array (field-programmable gate array, FPGA), a general array logic (generic array logic, GAL), or any combination thereof.
  • the multiple computing devices included in the segmentation module 702 may be distributed in the same region or in different regions, and may be distributed in the same AZ or in different AZs; similarly, they may be distributed in the same VPC or in multiple VPCs.
  • the plurality of computing devices may be any combination of computing devices such as servers, virtual machines, ASICs, PLDs, CPLDs, FPGAs, and GALs.
  • the segmentation module 702 can be used to perform any steps in the method for establishing an event graph
  • the acquisition module 701, the determination module 703, the event graph establishment module 704, the fusion module 705 and the recommendation module 706 can likewise be used to perform any steps in the method for establishing an event graph.
  • the steps that the acquisition module 701, the segmentation module 702, the determination module 703, the event graph establishment module 704, the fusion module 705 and the recommendation module 706 are responsible for can be specified as needed.
  • the acquisition module 701, the segmentation module 702, the determination module 703, the event graph establishment module 704, the fusion module 705 and the recommendation module 706 respectively implement different steps in the method for establishing an event graph, thereby realizing all the functions of the device 700 for establishing an event graph.
  • FIG 8 is a schematic structural diagram of a computing device 800 provided by the present application.
  • the computing device 800 is, for example, a bare metal server or a virtual machine.
  • the computing device 800 can be configured as the device for establishing an event graph.
  • the device for establishing an event graph can be a mobile phone, a computer, a tablet, or a server. The computing device 800 includes a bus 802, a processor 804, a memory 806, and a communication interface 808.
  • the processor 804, the memory 806 and the communication interface 808 communicate through the bus 802.
  • Computing device 800 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 800.
  • the bus 802 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one line is used in Figure 8, but it does not mean that there is only one bus or one type of bus.
  • Bus 802 may include a path that carries information between various components of computing device 800 (eg, memory 806, processor 804, communications interface 808).
  • the processor 804 may include a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP) or a digital signal processor (DSP). any one or more of them.
  • Memory 806 may include volatile memory, such as random access memory (RAM).
  • the memory 806 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 806 stores executable program code
  • the processor 804 executes the executable program code to implement the functions of the aforementioned acquisition module 701, segmentation module 702, determination module 703, event graph establishment module 704, fusion module 705 and recommendation module 706, thereby realizing the method for establishing an event graph; that is, the memory 806 stores instructions for executing the method for establishing an event graph.
  • the communication interface 808 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between the computing device 800 and other devices or communication networks.
  • An embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device may be a mobile phone, a computer, a notebook, a tablet, or a server.
  • the server may be, for example, a central server, an edge server, or a local server in a local data center.
  • Figure 9 is a schematic structural diagram of a computing device cluster provided by this application.
  • the computing device cluster includes at least one computing device 800.
  • the memories 806 of one or more computing devices 800 in the computing device cluster may store the same instructions for executing the method for establishing an event graph.
  • the memories 806 of one or more computing devices 800 in the computing device cluster may also each store part of the instructions for executing the method for establishing an event graph.
  • a combination of one or more computing devices 800 may jointly execute the instructions of the method for establishing an event graph.
  • the memories 806 in different computing devices 800 in the computing device cluster can store different instructions, which are respectively used to execute part of the functions of the method for establishing an event graph; that is, the instructions stored in the memories 806 of different computing devices 800 can implement the functions of one or more of the acquisition module 701, the segmentation module 702, the determination module 703, the event graph establishment module 704, the fusion module 705 and the recommendation module 706.
  • one or more computing devices in a cluster of computing devices may be connected through a network.
  • the network may be a wide area network or a local area network, etc.
  • Figure 10 shows a possible implementation.
  • two computing devices 800A and 800B are connected through a network.
  • the connection to the network is made through a communication interface in each computing device.
  • the memory 806 in the computing device 800A stores instructions for executing the functions of the acquisition module 701, the segmentation module 702, and the fusion module 705.
  • the memory 806 in the computing device 800B stores instructions for executing the functions of the determination module 703, the event graph establishment module 704, and the recommendation module 706.
  • the computing device 800A is used to obtain the first data, segment or fuse the first data, and send the segmented or fused data to the computing device 800B through the network.
  • the computing device 800B determines the logical relationships between the processed data, uses those logical relationships to establish an event graph, and recommends relevant content to users based on the established event graph.
  • the functions of the computing device 800A shown in FIG. 10 can also be completed by multiple computing devices 800, or the cloud service platform includes multiple computing devices with the same functions as the computing device 800A.
  • the functions of the computing device 800B can also be completed by multiple computing devices 800, or the cloud service platform includes multiple computing devices that have the same functions as the computing device 800B.
  • the embodiment of the present application also provides another computing device cluster.
  • the connection relationship between the computing devices in the computing device cluster can be similar to the connection method of the computing device cluster described in FIG. 9 and FIG. 10 .
  • the memories 806 of one or more computing devices 800 in the computing device cluster may store different instructions for executing the method for establishing an event graph.
  • the memories 806 of one or more computing devices 800 in the computing device cluster may also each store part of the instructions for executing the method for establishing an event graph.
  • a combination of one or more computing devices 800 may jointly execute the instructions for performing the method for establishing an event graph.
  • An embodiment of the present application also provides a computer program product containing instructions.
  • the computer program product may be a software or program product containing instructions capable of running on a computing device or stored in any available medium.
  • when the computer program product is run on at least one computing device, the at least one computing device is caused to execute the method for establishing an event graph.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that a computing device can store, or a data storage device, such as a data center, that contains one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, tape), optical media (eg, DVD), or semiconductor media (eg, solid state drive), etc.
  • the computer-readable storage medium includes instructions that instruct a computing device or a cluster of computing devices to execute the method for establishing an event graph.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application provides a method for establishing an event graph and a related device. The method includes: acquiring first data, where the first data includes any one or more of video, image, text, and speech; segmenting the first data into m sets, where the data of each of the m sets represents a type of event, a type of event includes at least one event, and m is any positive integer; determining the logical relationships between the m types of events represented by the data in the m sets; and establishing an event graph with each of the m types of events as a node and the logical relationships between the m types of events as the edges of the nodes. With the method of this application, an event graph can be established with events as nodes and the logical relationships between events as the edges of the nodes, meeting the needs of the market.

Description

一种事理图谱建立方法及相关装置
本申请要求于2022年03月11日提交中国专利局、申请号为202210239563.0、申请名称为“一种数据处理方法和计算机”的中国专利申请的优先权,以及要求于2022年06月24日提交中国专利局、申请号为202210726908.5、申请名称为“一种事理图谱建立方法及相关装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及事理图谱领域,具体涉及一种事理图谱建立方法及相关装置。
背景技术
知识图谱是以实体与实体之间的关系为核心建立的,它在搜索、推荐、问答等人工智能任务上获得了广泛的应用。
知识图谱具有一定的表征局限性,它只能表征实体与实体之间的关系,未能对事件与事件之间的关系进行表征。实际应用中,人们在了解到一个事件后,通常更希望对该事件的起因、发展状况、结果和经验教训等有更多的了解,甚至想要获取与该事件相关的或类似的事件,但现有的知识图谱不能满足市场的需求。
发明内容
本申请提供了一种事理图谱建立方法及相关装置,采用本申请所述的方法建立事理图谱,事理图谱中每个节点上的数据包括视频、文本、语音、图像中的一种或多种,每个节点上的数据代表一类事件,节点之间的边代表事件之间的逻辑关系,建立好的事理图谱可应用于多种场景下,应用范围广、适用性强。
第一方面,本申请提供了一种事理图谱建立方法,包括:获取第一数据,所述第一数据中包括视频、图像、文本、语音中的任意一种或者多种;将所述第一数据切分为m个集合,所述m个集合中的每个集合的数据代表一类事件,所述一类事件包括至少一个事件,m为任意正整数;确定所述m个集合中的数据代表的m类事件之间的逻辑关系;以所述m类事件中的每类事件分别为节点,所述m类事件之间的逻辑关系为所述节点的边,建立事理图谱。
可以看到,本申请提供了一种事理图谱建立方法,将第一数据切分为m个集合,每个集合代表一类事件,确定m个集合中的数据代表的m类事件之间的逻辑关系,以m类事件中的每类事件分别为节点,以m类事件之间的逻辑关系为节点的边,建立事理图谱。采用本申请所述的方法建立的事理图谱,事理图谱每个节点上的数据代表一类事件,节点之间的边代表事件之间的逻辑关系,表征了事件与事件之间的逻辑关系,满足了市场的需求。
基于第一方面,在可能的实现方式中,所述第一数据中的视频包括影视节目视频、新闻报道视频、广告视频、摄录视频中的任意一种或多种。
可以看到,本申请中,第一数据中的视频可以包括各种类型、各个方面的视频,本申请所述的方法适用性强,应用范围广。
基于第一方面,在可能的实现方式中,所述节点中的数据包括视频、图像、文本、语音中的任意一种或者多种。
可以看到，事理图谱中每个节点上的数据包括视频、图像、文本、语音中的任意一种或多种，换句话说，每个节点上的数据可以包括多种类型，而传统的知识图谱是基于文本建立的，知识图谱中每个节点上的数据只包括文本。
基于第一方面,在可能的实现方式中,所述逻辑关系包括因果关系、时序关系的一种或多种。
基于第一方面,在可能的实现方式中,所述确定所述m个集合中的数据代表的m类事件之间的逻辑关系,包括:根据训练好的逻辑关系预测模型,确定所述m个集合中的数据代表的m类事件之间的逻辑关系。
基于第一方面,在可能的实现方式中,在所述第一数据为至少一个视频的情况下,且所述一类事件包括一个事件的情况下,所述将所述第一数据切分为m个集合,包括:
对于所述至少一个视频中的每个视频,根据(i)至(iii)中的一者或多者,对所述至少一个视频进行切分获得所述m个集合,其中,
(i)所述每个视频中相邻视频帧之间的相似度确定所述相邻视频帧描述的是否为一个事件,若是,将所述相邻视频帧切分至一个集合中,若否,将所述相邻视频帧切分至不同集合中,
(ii)所述每个视频中携带的字幕内容或语音内容确定描述为一个事件的视频帧,将所述描述为一个事件的视频帧切分至一个集合中,
(iii)所述每个视频中的人物身份,将包含一个指定人物身份的视频帧切分至一个集合中,或者,将包含多个指定人物身份的视频帧切分至一个集合中。
基于第一方面,在可能的实现方式中,在所述第一数据中为至少一个视频的情况下,且所述一类事件包括至少一个事件的情况下,所述将所述第一数据切分为m个集合,包括:
将所述第一数据切分为n个集合,所述n个集合中的每个集合的数据代表一个事件,其中,n大于等于m;
对所述n个集合的数据所代表的n个事件中的相似事件进行融合,获得所述m个集合。
可以理解,可以先将第一数据划分为n个集合,n个集合中的每个集合代表一个事件,再将n个集合代表的n个事件中的相似事件进行融合,获得m个集合。
基于第一方面,在可能的实现方式中,所述将所述第一数据切分为n个集合,包括:对于所述至少一个视频中的每个视频,根据(iv)至(vi)中的一者或多者,对所述至少一个视频进行切分获得所述n个集合,其中,
(iv)所述每个视频中相邻视频帧之间的相似度确定所述相邻视频帧描述的是否为一个事件,若是,将所述相邻视频帧切分至一个集合中,若否,将所述相邻视频帧切分至不同集合中,
(v)所述每个视频中携带的字幕内容或语音内容确定描述为一个事件的视频帧,将所述描述为一个事件的视频帧切分至一个集合中,
(vi)所述每个视频中的人物身份,将包含一个指定人物身份的视频帧切分至一个集合中,或者,将包含多个指定人物身份的视频帧切分至一个集合中;
所述对所述n个集合的数据所代表的n个事件中的相似事件进行融合,包括:根据条件(vii)至(x)中的任意一者或多者,对所述n个事件中的相似事件进行融合,其中,
(vii)根据各个事件的标题相似度或关键词相似度,
(viii)根据各个集合的视频中携带的字幕内容或语音内容的相似度,
(ix)根据各个集合的视频中是否包含相同的视频帧,
(x)根据各个集合的视频中是否存在相同的人物身份。
基于第一方面，在可能的实现方式中，所述方法还包括：根据用户的操作，基于建立好的事理图谱，向用户推荐相关内容。
第二方面,本申请提供了一种建立事理图谱的装置,包括:
获取模块,用于获取第一数据,所述第一数据中包括视频、图像、文本、语音中的任意一种或者多种;
切分模块,用于将所述第一数据切分为m个集合,所述m个集合中的每个集合的数据代表一类事件,所述一类事件包括至少一个事件,m为任意正整数;
确定模块,用于确定所述m个集合中的数据代表的m类事件之间的逻辑关系;
事理图谱建立模块,用于以所述m类事件中的每类事件分别为节点,所述m类事件之间的逻辑关系为所述节点的边,建立事理图谱。
基于第二方面,在可能的实现方式中,所述第一数据中的视频包括影视节目视频、新闻报道视频、广告视频、摄录视频中的任意一种或多种。
基于第二方面,在可能的实现方式中,所述节点中的数据包括视频、图像、文本、语音中的任意一种或者多种。
基于第二方面,在可能的实现方式中,所述逻辑关系包括因果关系、时序关系的一种或多种。
基于第二方面,在可能的实现方式中,所述确定模块用于:根据训练好的逻辑关系预测模型,确定所述m个集合中的数据代表的m类事件之间的逻辑关系。
基于第二方面,在可能的实现方式中,在所述第一数据为至少一个视频的情况下,且所述一类事件包括一个事件的情况下,所述切分模块用于:
对于所述至少一个视频中的每个视频,根据(i)至(iii)中的一者或多者,对所述至少一个视频进行切分获得所述m个集合,其中,
(i)所述每个视频中相邻视频帧之间的相似度确定所述相邻视频帧描述的是否为一个事件,若是,将所述相邻视频帧切分至一个集合中,若否,将所述相邻视频帧切分至不同集合中,
(ii)所述每个视频中携带的字幕内容或语音内容确定描述为一个事件的视频帧,将所述描述为一个事件的视频帧切分至一个集合中,
(iii)所述每个视频中的人物身份,将包含一个指定人物身份的视频帧切分至一个集合中,或者,将包含多个指定人物身份的视频帧切分至一个集合中。
基于第二方面,在可能的实现方式中,在所述第一数据中为至少一个视频的情况下,且所述一类事件包括至少一个事件的情况下,所述切分模块用于,将所述第一数据切分为n个集合,所述n个集合中的每个集合的数据代表一个事件,其中,n大于等于m;所述装置还包括融合模块,所述融合模块用于,对所述n个集合的数据所代表的n个事件中的相似事件进行融合,获得所述m个集合。
基于第二方面,在可能的实现方式中,所述切分模块用于:对于所述至少一个视频中的每个视频,根据(iv)至(vi)中的一者或多者,对所述至少一个视频进行切分获得所述n个集合,其中,
(iv)所述每个视频中相邻视频帧之间的相似度确定所述相邻视频帧描述的是否为一个事件,若是,将所述相邻视频帧切分至一个集合中,若否,将所述相邻视频帧切分至不同集合中,
(v)所述每个视频中携带的字幕内容或语音内容确定描述为一个事件的视频帧,将所述描述为一个事件的视频帧切分至一个集合中,
(vi)所述每个视频中的人物身份,将包含一个指定人物身份的视频帧切分至一个集合中,或者,将包含多个指定人物身份的视频帧切分至一个集合中;
所述融合模块用于:根据条件(vii)至(x)中的任意一者或多者,对所述n个事件中的相似事件进行融合,其中,
(vii)根据各个事件的标题相似度或关键词相似度,
(viii)根据各个集合的视频中携带的字幕内容或语音内容的相似度,
(ix)根据各个集合的视频中是否包含相同的视频帧,
(x)根据各个集合的视频中是否存在相同的人物身份。
基于第二方面,在可能的实现方式中,所述装置还包括:推荐模块,用于根据用户的操作,基于建立好的事理图谱,向用户推荐相关内容。
第二方面中的各个功能模块具体用于实现上述第一方面或第一方面的任意一种可能的实现方式所述的方法。
第三方面,本申请提供了一种计算设备集群,包括至少一个计算设备,所述至少一个计算设备中的每个计算设备包括存储器和处理器,所述存储器用于存储指令,所述处理器用于运行所述至少一个计算设备的存储器中存储的指令,以使所述计算设备集群执行上述第一方面或第一方面的任意一种可能的实现方式所述的方法。
第四方面，本申请提供了一种计算机可读存储介质，包括计算机程序指令，当所述计算机程序指令在计算设备集群上运行时，使得所述计算设备集群执行上述第一方面或第一方面的任意一种可能的实现方式所述的方法。
第五方面，本申请提供了一种包含指令的计算机程序产品，当所述指令被计算设备集群执行时，使得所述计算设备集群执行上述第一方面或第一方面的任意一种可能的实现方式所述的方法。
附图说明
图1为本申请提供的一种事理图谱建立方法的流程示意图;
图2为本申请提供的一种场景示意图;
图3为本申请提供的一种有向有环的事理图谱示意图;
图4为本申请提供的一种有向无环事理图谱示意图;
图5为本申请提供的一种事理图谱建立方法的流程示意图;
图6为本申请提供的一种逻辑关系预测模型的训练预测结构示意图;
图7为本申请提供的一种建立事理图谱的装置结构示意图;
图8为本申请提供的一种计算设备的结构示意图;
图9为本申请提供的一种计算设备集群的结构示意图;
图10为本申请提供的又一种计算设备的结构示意图。
具体实施方式
事理图谱指的是以事件为节点,以事件与事件之间的关系为边建立的图谱。
本申请提供了一种事理图谱建立方法,参见图1所示,图1为本申请提供的一种事理图谱建立方法的流程示意图,所述方法包括但不限于以下内容的描述。
S101、获取第一数据,第一数据中包括视频、图像、文本、语音中的任意一种或者多种。
第一数据中可以包括视频,视频包括影视节目视频、新闻报道视频、广告视频、摄录视频中的任意一种或多种。
其中,影视节目视频例如可以包括某个电视剧片段、某个电影、某个电影片段、某期综艺节目、某期综艺节目片段等中的一种或多种,其中,某个电视剧片段、某个电影、某个电影片段、某期综艺节目、某期综艺节目片段等可以以动画的形式呈现,当然还可以以人、物等形式呈现,本申请不做限定。
视频还可以包括新闻报道视频,这里,新闻报道视频指的是将新闻通过视频的形式播报出来,其中新闻报道视频可以通过电视播放出来,也可以通过网络在终端设备上播放出来,其中终端设备可以是手机、台式电脑、笔记本、平板、显示屏、电子手表或其他电子设备等,本申请对新闻报道视频的播放载体不做限定,新闻报道视频可以是对最近发生的事情进行报道,也可以是对人们比较关注的历史事件进行报道,本申请不做限定,新闻报道视频可以是关于任何方面的,例如可以是军事方面的,也可以是政治方面的,也可以是历史方面的,也可以是财经方面的,也可以是日常生活方面的等等,本申请不做限定。
视频中还可以包括广告视频,这里广告视频指的是广告以视频的形式存在,广告视频可以是关于任何方面的,可以是关于某实物的广告视频,例如可以是关于人们日常生活中某生活用品的广告视频,也可以是对某开发软件的应用进行宣传,也可以是对价值观念或生活理念的宣传,等等,本申请对广告视频涉及的具体内容不做限定,广告视频可以来源于电视上呈现的广告视频,也可以来源于通过网络在终端设备上呈现的广告视频,本申请对广告视频的来源不做限定。
摄录视频可以包括摄像机、照相机、红外传感器等设备通过摄像或录像获取到的视频。例如,摄录视频可以包括,摄像机或照相机或终端设备对环境信息进行摄像或录制获得的视频,其中环境信息中可以包括人或物或景色等,终端设备例如可以是手机、电脑,甚至可以是具有摄像或录制功能的其他电子设备等,本申请对摄像内容不做具体限定。摄录视频可以是通过前置摄像头摄像或录制获得,也可以是通过后置摄像头摄像或录制获得,本申请不做限定。摄录视频还可以包括车辆上的行车记录仪采集到的视频。摄录视频还可以包括监控视频,监控视频指的是监控设备获取到的视频,监控设备例如摄像机。
在一种场景中,摄录视频可以是用户自己通过摄像设备或录制设备摄像录制的视频,该摄录视频可以是关于任何内容的视频,例如,摄录视频可以是关于美食的视频,也可以是关于美妆的视频,也可以是关于旅游的视频,也可以是关于娱乐搞笑的视频,等等。
在一种场景中,摄录视频可以是关于游戏的视频,例如,用户在终端设备上打游戏时,开启了终端设备上的录制功能,录制了用户视角下该局游戏的所有过程;又例如,用户在一终端设备上打游戏时,利用另一终端设备录制了该用户打游戏的全部过程,该过程可以通过网络直播的形式实时传播出去,也可以非实时传播。
在一种场景中,摄录视频还可以包括电话视频,电话视频可以是多个人通过手机或电脑或其他电子设备进行通信时,以自身视角录制的视频,其中,通信可以是通过手机或电脑或其他电子设备上的特定软件进行的,其中,特定软件例如可以是社交软件,也可以是用于开会的会议软件,也可以是通过用户身份识别卡(subscriber identity module,SIM)进行的,也可以是通过其他方式进行的,本申请不做限定。本申请对电话视频中涉及的内容不做具体限定。
第一数据中的视频还可以包括一些其他形式的视频,例如网络平台上,对某个新闻相关的评论、看法的视频等;对某个事件前因后果以及该事件产生的影响进行解析的视频;对影视节目视频的解析、评论的视频;对金融市场、财经行情等方面进行解析的视频;等等。本申请对视频的来源不做具体限定。
第一数据中还可以包括图像,图像可以是与影视节目相关的图像,例如,可以是某个电视剧片段、某个电影、某个电影片段、某期综艺节目、某期综艺节目片段中的任意一个或多个视频帧。图像也可以是与新闻报道相关的图像,例如可以是上述新闻报道视频中的任意一个视频帧或任意多个视频帧。图像也可以是与广告相关的图像,例如,可以是上述广告视频中的任意一个视频帧或任意多个视频帧。图像也可以是图像采集设备对环境信息进行采集获得的图像,环境信息中包括人或物或景色等,本申请对图像中的内容不做具体限定,对图像的获取形式不做具体限定。
第一数据中还可以包括文本,本申请对文本中的内容及文本来源不做具体限定,例如,文本可以是关于任何方向的论文、期刊、杂志、文章、报纸等中的部分或全部文本,也可以是新闻文本,新闻文本指的是将新闻通过期刊、杂志、报纸等形式进行记录或传播,也可以影视节目中的台词、话剧台词、戏剧台词、广告词等。
第一数据中还可以包括语音,本申请对语音的表达形式不做具体限定,例如,语音可以用普通话的形式表达,也可以用外语的形式表达,也可以以方言的形式表达。本申请对语音中的内容不做具体限定,例如,语音的内容可以是新闻播报,新闻播报指的是将新闻通过电台或其他形式播报出来,也可以是多个人之间的对话,语音的内容也可以是歌词、影视节目中的台词、话剧台词、戏剧台词、广告词等。
可选的,第一数据中还可以包括日志数据,日志数据例如可以是智能手表中记录的用户的活动轨迹数据,还可以是穿戴设备中记录的用户的心率、步数等数据,本申请对日志数据不做具体限定。可选的,在获取到第一数据之后,可以对第一数据进行预处理。例如,第一数据中包括图像的情况下,可以对图像进行去噪、图像增强等,第一数据中包括监控视频的情况下,可以将监控视频中的无效帧删除掉,无效帧例如可以是监控区域内未出现人的视频帧,第一数据中包括语音的情况下,可以采用语音增强技术去除语音中包含的噪音,提高语音质量,等等,本申请对预处理操作不做具体限定。
S102、将第一数据切分为m个集合,m个集合中的每个集合的数据代表一个事件。
在第一数据为至少一个视频的情况下,对至少一个视频中的每个视频分别进行切分,一共获得m个集合,m个集合中的每个集合中的数据代表一个事件。关于如何对至少一个视频中的每个视频进行切分,下面介绍几种可能的实现方式。
在一种可能的实现方式中,根据每个视频中相邻视频帧之间的画面相似度确定相邻视频帧描述的是否为一个事件,在确定相邻视频帧描述的是一个事件的情况下,将相邻视频帧切分至一个集合中,在确定相邻视频帧描述的不是一个事件的情况,将相邻视频帧切分至不同的集合中。例如,至少一个视频中的某个视频包括视频帧1、视频帧2、视频帧3…视频帧k,其中邻接关系为视频帧1-视频帧2-视频帧3-…-视频帧k,首先,确定视频帧1与视频帧2描述的是否为一个事件,若视频帧1与视频帧2描述的是一个事件,则将视频帧1与视频帧2切分至一个集合中,若视频帧1与视频帧2描述的不是一个事件,则将视频帧1与视频帧2切分至不同集合中,例如视频帧1切分至集合1中,视频帧2切分至集合2中;然后,确定视频帧2与视频帧3描述的是否是一个事件,若是,则切分至一个集合中,若不是,则切分至不同集合中,例如视频帧2切分至集合2中,视频帧3切分至集合3中;…如此,将该视频中的视频帧1、视频帧2、视频帧3…视频帧k切分至一个或多个集合中。
根据相邻视频帧之间的画面相似度确定相邻视频帧描述的是否为一个事件的方法可以是，若相邻视频帧之间的画面相似度大于或等于第一阈值，则确定相邻视频帧描述的是一个事件，若相邻视频帧之间的画面相似度小于第一阈值，则确定相邻视频帧描述的不是一个事件。例如，若视频帧1与视频帧2之间的画面相似度大于第一阈值，视频帧2与视频帧3之间的画面相似度大于第一阈值，则确定视频帧1、视频帧2和视频帧3描述的是一个事件；若视频帧1与视频帧2之间的画面相似度大于第一阈值，视频帧2与视频帧3之间的画面相似度小于第一阈值，则确定视频帧1与视频帧2描述的是一个事件，视频帧3与视频帧1、视频帧2描述的不是同一个事件；若视频帧1与视频帧2之间的画面相似度小于第一阈值，视频帧2与视频帧3之间的画面相似度大于第一阈值，则确定视频帧2与视频帧3描述的是一个事件，视频帧1描述的是与视频帧2、视频帧3不同的事件；等等。其中，第一阈值可以根据具体情况具体设置，本申请不做限定。
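上述按相邻视频帧画面相似度进行切分的逻辑，可以用如下Python代码示意（仅为示意性实现，假设每个视频帧已被表示为特征向量，余弦相似度与第一阈值的取值均为举例，并非本申请限定的实现方式）：

```python
def cosine_sim(a, b):
    # Cosine similarity between two frame feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def split_by_adjacent_similarity(frames, threshold):
    # Adjacent frames with similarity >= threshold are treated as
    # describing the same event and kept in the same set; otherwise
    # a new set is started.
    if not frames:
        return []
    sets = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if cosine_sim(prev, cur) >= threshold:
            sets[-1].append(cur)
        else:
            sets.append([cur])
    return sets
```

例如，对特征向量序列 [[1,0],[0.9,0.1],[0,1],[0.1,0.9]]，在阈值0.8下会切分为两个集合。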
在一种可能的实现方式中,视频中携带有字幕,则可根据视频中携带的字幕内容确定哪些视频帧描述的是一个事件,将描述为一个事件的视频帧切分至一个集合中,描述为不同事件的视频帧切分至不同集合中。可选的,字幕实质是文本,可以根据字幕内容,利用自然语言处理技术,确定视频中的哪些视频帧描述的是一个事件。
在一种可能的实现方式中,视频中携带有语音,则可根据视频中携带的语音内容确定哪些视频帧描述的是一个事件,将描述为一个事件的视频帧切分至一个集合中,描述为不同事件的视频帧切分至不同集合中。可选的,可以利用语音识别技术将语音转化为文字,再利用自然语言处理技术,根据视频中的语音内容确定哪些视频帧描述的是一个事件,或者,也可以直接根据利用语音处理技术,对语音进行处理,确定视频中的哪些视频帧描述的是一个事件,将描述为一个事件的视频帧切分至一个集合中。
在一种可能的实现方式中,可以根据每个视频中人物身份,对至少一个视频中的每个视频进行切分。可选的,可以将包含一个指定人物身份的视频帧切分至一个集合中;也可以将包含多个指定人物身份的视频帧切分至一个集合中,一个集合代表一个事件,指定人物身份可以是指定的某个人,或指定的某些人,或指定的某个物,或指定的某些物等。例如指定人物身份可以是影视节目中指定的角色名称,也可以是影视节目中指定角色的扮演者,还可以是新闻报道中指定的报道者或指定的主持人,等等,本申请对指定人物身份不做限定。
例如,在一种场景中,一则新闻报道视频中包括两个主持人,两个主持人分别为主持人A和主持人B,可以将包含主持人A的视频帧切分至一个集合中,将包含主持人B的视频帧切分至另一个集合中。
又例如,在一种场景中,视频中包括多集电视剧,多集电视剧中包括角色A、角色B、角色C、角色D、角色E以及其他角色,在一种示例中,可以将包含角色A的视频帧切分至一个集合中,将包含角色B、角色C、角色D、角色E以及其他角色的视频帧切分至另一个集合中;在又一种示例中,可以将同时包含角色A和角色B的视频帧切分至一个集合中,将包含角色C、角色D、角色E以及其他角色的视频帧切分至另一个集合中。
可选的,可以利用提取特征的方式,提取指定人物身份的特征,利用提取出的特征,将包含指定人物身份的视频帧切分至一个集合中,本申请对如何将指定人物身份的视频帧切分至一个集合中不做限定。
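按人物身份进行切分的过程可以粗略示意如下（仅为示意性实现，假设已通过人脸识别等特征提取方式得到每个视频帧中出现的人物身份列表，其中"t"、"ids"等字段名为举例）：

```python
def split_by_identity(frames, target_ids):
    # frames: list of dicts like {"t": frame_index, "ids": [identities present]}.
    # Frames containing ALL specified identities form one set (one event);
    # the remaining frames form another set.
    target = set(target_ids)
    matched, rest = [], []
    for frame in frames:
        (matched if target <= set(frame["ids"]) else rest).append(frame)
    return matched, rest
```

传入单个指定人物身份即对应"将包含一个指定人物身份的视频帧切分至一个集合中"，传入多个身份即对应"将包含多个指定人物身份的视频帧切分至一个集合中"。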
可选的,可以根据上述任意一种可能的实现方式对至少一个视频中的每个视频进行切分,也可以根据上述任意可能的实现方式的组合对至少一个视频中的每个视频进行切分。
可选的,上述介绍的任意一种可能的实现方式或多种可能的实现方式的组合,均可以通过模型来实现,对模型进行训练后,获得训练好的模型,将至少一个视频输入训练好的模型中,获得m个集合,其中m个集合中的一个集合代表一个事件。
在第一数据为很多图像的情况下,将很多图像划分至m个集合,其中m个集合中的一个 集合代表一个事件,m为任意正整数。例如,首先,将图像进行分类,例如,将图像分为风景图像、人图像、物体图像等。然后,将每一类图像划分至多个集合中,例如,对于人图像,可以根据人的身份,将人图像划分至多个集合中,比如,将一部电影或一部电视剧中同一个角色的图像划分至一个集合中,或者,将一个扮演者涉及到的所有影视节目中的图像划分至一个集合中;又例如,将与火灾或着火相关图像划分至一个集合中,将与追尾事故相关的图像划分至一个集合中,等等。最终将所有图像划分至m个集合中。又例如,对于一个图像,可以用文本描述该图像中的内容,例如,一个人在海上冲浪的风景图像,文本描述可以为“一个人在海上冲浪”,一条狗的图像可以用文本描述为“一条狗”,等等。根据文本描述,将很多图像划分至m个集合中,比如,文本描述中有相同的词语或关键词相同,可以划分至一个集合中。其中,用文本描述图像的内容,可以通过模型实现,模型可以基于大量图像和标签训练得到,标签为每个图像对应的文本描述。本申请对图像的具体划分方式不做限定,可以根据具体情况具体划分。
在第一数据既包括视频,又包括图像的情况下,根据视频对图像进行划分。比如,在一种实现方式中,可以先对视频进行切分,将视频切分至多个集合中,每个集合中的数据代表一个事件;对于每个图像来说,计算该图像与多个集合中任意一个集合中的任意一个视频帧或任意多个视频帧之间的相似度,若相似度大于或等于第一阈值,则将该图像划分至该集合中,若相似度小于第一阈值,则将该图像与另一集合中的任意一个视频帧或任意多个视频帧之间的相似度,若相似度大于或等于第一阈值,则将该图像划分至此集合中,否则再计算该图像与另一集合中的任意一个视频帧或任意多个视频帧之间的相似度…如此遍历,直至找到该图像所归属的集合,若该图像不属于多个集合中的任一集合,则将该图像划分至一个独立的集合中。对于被划分至独立的集合中的图像,可以将所有独立的集合的图像之间进行相似度的计算,若相似度大于或等于第一阈值,则将这些相似度大于或等于第一阈值的集合(图像)合并为一个集合,相似度小于第一阈值的,保持不变。
又比如,在一种实现方式中,先对视频进行切分,将视频切分至多个集合中;再确定图像中包括的人物身份,将图像划分至包括该人物身份的集合中。又比如,分别对视频和图像进行处理,即,将视频进行切分至多个集合中,再将图像划分至另外的多个集合中,分别计算包含视频的集合中任一视频帧或任意多个视频帧与包含图像的集合中的任一图像或任意多个图像之间的相似度,根据相似度对包含视频的集合与包含图像的集合进行合并,等等。
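上述将图像逐一与已有集合中的视频帧计算相似度、再决定归入已有集合或独立集合的流程，可以示意如下（仅为示意性实现，相似度函数 sim 与第一阈值由调用方给定，均为举例）：

```python
def assign_images(video_sets, images, threshold, sim):
    # For each image, find the first existing set containing a frame
    # similar enough to the image (sim >= threshold) and add the image
    # to that set; if no set matches, open an independent set.
    for img in images:
        for s in video_sets:
            if any(sim(img, frame) >= threshold for frame in s):
                s.append(img)
                break
        else:
            video_sets.append([img])
    return video_sets
```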
在第一数据为文本的情况下,可以利用自然语言处理技术,根据文本内容确定哪些文本描述的是一个事件,将描述为一个事件的文本切分至一个集合中,从而将所有文本切分至m个集合中。
在第一数据既包括视频,又包括文本的情况下,可以分别对视频和文本进行处理,将视频切分至多个集合中,将文本切分至另外的多个集合中;从包含视频的多个集合中任取一个集合,从包含文本的多个集合中任取一个集合,计算包含视频的集合中携带的字幕内容或语音内容,与包含文本的集合中的文本的相似度,若相似度大于或等于第二阈值,则将这两个集合合并,否则不合并;再从包含视频的多个集合中任取一个集合,从包含文本的多个集合中任取一个集合,进行相同的计算…直至遍历所有集合和所有可能的组合。
在第一数据为语音的情况下,可以利用自然语言处理技术,根据语音内容确定哪些语音描述的是一个事件,将描述为一个事件的语音切分至一个集合中,从而将所有语音切分至m个集合中。
在第一数据既包括视频,又包括语音的情况下,可以分别对视频和语音进行处理,将视 频切分至多个集合中,将语音切分至另外的多个集合中;从包含视频的多个集合中任取一个集合,从包含语音的多个集合中任取一个集合,计算包含视频的集合中携带的字幕内容或语音内容,与包含语音的集合中的语音内容的相似度,若相似度大于或等于第二阈值,则将这两个集合合并,否则不合并;再从包含视频的多个集合中任取一个集合,从包含语音的多个集合中任取一个集合,进行相同的计算…直至遍历所有集合和所有可能的组合。
对于第一数据中包括视频、图像、文本、语音中的至少两种的情况,将第一数据切分为m个集合的方法,上述仅仅用于举例,并不构成限定,本申请对具体划分方法不限定。
需要说明的是,切分后的m个集合中,每个集合中的数据包括视频、图像、文本、语音中的一种或多种。例如,参见图2所示的示例图,图2中,第一数据中包括多个视频、多个图像、多个文本、多段语音,对第一数据进行切分,即,对多个视频中的每一个视频进行切分,将每一个视频切分为一个或多个较小的视频,对多个图像进行切分,将各个图像划分至不同的集合中,对多个文本中的每个文本进行切分,将切分后的文本划分至不同的集合中,对多段语音中的每段语音进行切分,将切分后较小段的语音划分至不同的集合中,每个集合中的数据包括视频、图像、文本、语音中的一种或多种。图2仅仅用于解释说明,并不构成对本申请的任何限定。
S103、确定m个集合中的数据代表的m个事件之间的逻辑关系。
可选的,每个集合中的数据代表一个事件,先确定每个事件的标题,根据每个事件的标题确定m个事件之间的逻辑关系。
下面介绍一下如何确定每个事件的标题，以及如何根据各个事件的标题确定事件之间的逻辑关系。
在一种实现方式中,一个集合中包括视频,用文本描述视频中每个视频帧的内容,该步骤与用文本描述每个图像的内容类似,再根据每个视频帧的文本描述,利用算法进行计算,确定该视频的标题,即该集合代表的事件的标题。可选的,用文本描述每个视频帧的内容或每个图像的内容可以通过模型来实现,模型可基于大量视频帧或大量图像、标签训练得到,标签中包括每个视频帧或每个图像的文本描述。将视频输入训练好的模型中,可获得各个视频帧的文本描述。其中,模型可通过卷积神经网络、循环神经网络等方式实现,本申请对模型的具体实现方式和训练方式不做限定。对于集合中包括多个图像的情况,同样可以利用该方法确定多个图像代表的事件的标题。
在一种实现方式中,一个集合中包括视频,视频中携带有字幕或语音,可以根据字幕内容或语音内容,利用自然语言处理技术,确定出每个视频的标题,即该集合代表的事件的标题。对于集合中包括文本或语音的情况,可利用同样的方法确定文本代表的事件的标题或语音代表的事件的标题。
在确定出每个事件的标题后,将各个事件的标题输入逻辑关系预测模型中,获得各个事件之间的逻辑关系。其中逻辑关系预测模型可以基于大量事件的标题训练得到,可选的,也可以基于大量事件和标签得到,标签包括各个事件之间的逻辑关系,本申请对逻辑关系预测模型的训练方式不做限定。
可选的,确定每个事件的标题,根据每个事件的标题确定m个事件之间的逻辑关系,可以通过一个模型来实现。即,将每个集合中的数据均输入模型中,其中每个集合中的数据包括视频、图像、文本、语音等中的一种或多种,模型输出为各个集合之间的逻辑关系,即各个事件之间的逻辑关系。
其中,逻辑关系包括因果关系、时序关系中的一种或多种。其中,因果关系(causality 或causation)指的是原因和结果之间的关联关系。事件之间的因果关系指的是因为一个或多个事件导致另一个或多个事件,其中前一个或多个事件称为原因事件,后一个或多个事件称为结果事件,原因事件导致结果事件。通常来说,一个事件发生可能是一个原因造成的,有可能是多个原因造成的,即一个结果事件可能对应着一个原因事件,也可能对应着多个原因事件;一个原因可能导致一个结果,也可能导致多个结果,即一个原因事件可能对应着一个结果事件,有可能对应着多个结果事件。本申请对原因事件和结果事件之间的对应关系不做限定。例如,事件1为发生车祸,事件2为拨打110,则事件1与事件2之间是因果关系,其中事件1为因,事件2为果,因为“发生车祸”所以“拨打110”。
时序关系,指的是时间的先后顺序关系,事件之间的时序关系指的是多个事件只是在不同的时间发生的,事件之间没有明显的因果关系。例如,事件1为洗菜,事件2为切菜,事件3为煮米饭,事件1、事件2、事件3之间没有明显的因果关系(因为一个事件导致另一个事件),可以认为事件1、事件2、事件3之间是时序关系。
可选的,逻辑关系还可以包括其他关系,比如,让步关系、转折关系等等,事件之间的逻辑关系可以由用户根据具体情况、具体需求自己设置,比如,用户在训练逻辑关系预测模型时,可以在样本中设置事件与事件之间的逻辑关系有哪些,以及如何定义该逻辑关系等,从而根据训练好的逻辑关系预测模型预测事件之间的逻辑关系,本申请不做限定。
S104、以m个事件中的每个事件分别为节点,m个事件之间的逻辑关系为节点的边,建立事理图谱。
以每个事件为节点,各个事件之间的逻辑关系为节点的边,建立事理图谱,其中每个节点上的数据即每个集合中的数据,包括视频、图像、文本、语音等中的一种或多种,各个事件之间的边为有向边,边的方向表示的是事件之间的逻辑关系,由原因事件指向结果事件,或者,由时间上先发生的事件指向后发生的事件(只是发生的时间不同,在时间上有先发生和后发生的顺序,没有因果关系)。
可以理解,因为存在因果关系的事件之间的对应关系多样化,可能是一个原因事件对应着多个结果事件,也可能是多个原因事件对应着一个结果事件,也可能是一个原因事件对应着一个结果事件,因此,建立的事理图谱中可能存在环状,也可能不存在环状,也就是说,建立好的事理图谱可以为有向有环图,也可以为有向无环图。另外,第一数据中包括多个人物的数据或多个事物的数据,多个事件表示的是多个人物或多个事物的事件,也就是说事件是发生在多个人物或多个事物上的,只是发生的时间先后不同,按照时间发生的先后顺序(即时序关系)建立的事理图谱也可能构成环状。
参见图3,图3为本申请提供的一种有向有环的事理图谱示意图。图3中,“开车”与“发生车祸”之间没有必然的因果关系,只是事件发生的时间不同,属于时序关系;“发生车祸”与“拨打120”、“拨打110”是因果关系,因为“发生车祸”,所以“拨打120”、“拨打110”;“发生车祸”与“车辆爆炸”是因果关系,“发生车祸”导致了“车辆爆炸”;而“车辆爆炸”与“拨打120”、“拨打110”也属于因果关系,因为“车辆爆炸”,所以“拨打120”、“拨打110”;“拨打120”与“医生救治伤员”属于因果关系,因为“拨打120”,所以才会有“医生救治伤员”事件。
参见图4，图4为本申请提供的一种有向无环事理图谱示意图，图4中，事理图谱是根据事件发生的先后顺序建立的，该事理图谱中的所有事件均是时序关系，事件之间并无因果关系，例如，"坐公交车"事件与"吃火锅"事件只是在不同的时间发生而已，同理，"看电影"与"吃爆米花"、"喝可乐"之间也无因果关系，"喝可乐"与"在海边玩耍"也无因果关系。
上述示例仅仅用于举例,并不构成对本申请的任何限定。
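以图3中的部分事件和逻辑关系为例，事理图谱的邻接结构可以用如下代码示意（仅为数据结构层面的示意性实现，实际每个节点上还可挂载视频、图像、文本、语音等数据）：

```python
def build_event_graph(relations):
    # relations: iterable of (earlier/cause event, later/result event, type).
    # Returns an adjacency map: node -> list of (successor, relation type).
    # Cycles are allowed, so the result may be a directed cyclic graph.
    graph = {}
    for src, dst, rel in relations:
        graph.setdefault(src, []).append((dst, rel))
        graph.setdefault(dst, [])
    return graph

relations = [
    ("开车", "发生车祸", "时序关系"),
    ("发生车祸", "拨打110", "因果关系"),
    ("发生车祸", "拨打120", "因果关系"),
    ("发生车祸", "车辆爆炸", "因果关系"),
    ("拨打120", "医生救治伤员", "因果关系"),
]
graph = build_event_graph(relations)
```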
建立好的事理图谱，可以应用于任何终端设备上，例如终端设备可以是手机、台式电脑、笔记本、平板、穿戴设备等。可选的，将建立好的事理图谱安装在终端设备上，终端设备为用户提供查询功能，用户通过输入关键词，查询到目标内容，另外，基于建立好的事理图谱，终端设备还可以根据目标内容向用户推荐与目标内容相关的内容，例如，若用户查询的目标内容是原因事件，终端设备可基于事理图谱向用户推荐结果事件，若用户查询的目标内容是结果事件，终端设备可基于事理图谱向用户推荐原因事件，或者，终端设备向用户推荐与目标内容具有时序关系的事件，等等。
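基于建立好的事理图谱向用户推荐原因事件/结果事件的查询逻辑，可以示意如下（仅为示意性实现，假设事理图谱以"节点 -> （后继节点, 关系类型）列表"的邻接表形式存储，返回值的键名为举例）：

```python
def recommend_related(graph, event):
    # graph: adjacency map, node -> list of (successor, relation type).
    # Successors correspond to result/later events; predecessors
    # correspond to cause/earlier events of the queried event.
    successors = [dst for dst, _ in graph.get(event, [])]
    predecessors = [src for src, edges in graph.items()
                    if any(dst == event for dst, _ in edges)]
    return {"推荐的结果事件": successors, "推荐的原因事件": predecessors}
```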
建立好的事理图谱可以应用于监控设备中,其中,事理图谱中设置了一个或多个报警事件,一个或多个报警事件可以位于事理图谱的一个节点上,也可以位于多个节点上,每个报警事件对应的数据包括视频、图像中的一种或多种。监控设备在对监控区域进行监控时,将获取到的监控区域的视频帧与每个报警事件中的视频帧或图像进行相似度的计算,若相似度大于或等于第一阈值,则确定监控区域发生了报警事件,监控设备触发报警操作,报警操作例如可以是监控设备发出鸣笛声,还可以是监控设备向相关人员发送提示信息,提示信息用于提示监控区域内发生了报警事件,等等。若相似度小于第一阈值,则不做处理。
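监控设备将监控区域的视频帧与各报警事件对应的视频帧/图像进行相似度比对、从而决定是否触发报警的判断逻辑，可以示意如下（仅为示意性实现，sim 为任意帧相似度函数，阈值与报警事件名称均为举例）：

```python
def check_alarm(frame, alarm_events, threshold, sim):
    # alarm_events: mapping alarm-event name -> list of reference
    # frames/images stored on the corresponding graph node(s).
    # Returns the name of the first alarm event whose reference data
    # matches the monitored frame (sim >= threshold), else None,
    # in which case no action is taken.
    for name, refs in alarm_events.items():
        if any(sim(frame, ref) >= threshold for ref in refs):
            return name
    return None
```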
参见图5,图5为本申请提供的又一种事理图谱建立方法的流程示意图,所述方法包括但不限定于以下内容的描述。
S201、获取第一数据,第一数据中包括视频、图像、文本、语音中的任意一种或者多种。
S202、将第一数据切分为n个集合,n个集合中的每个集合的数据代表一个事件。
步骤S201、S202可分别参考图1方法实施例中步骤S101、S102中相关内容的描述,为了说明书的简洁,在此不再赘述。
S203、对n个集合所代表的n个事件中的相似事件进行融合,获得m个集合,m个集合中的每个集合代表一类事件。
对n个集合进行融合,获得m个集合,m个集合中的每个集合中代表一类事件,一类事件可以包括一个事件,也可以包括多个事件,一类事件包括多个事件,指的是由n个事件中的两个或两个以上的事件融合得到的多个事件,其中n大于或等于m。
下面介绍一下如何进行融合。
在一种可能的实现方式中,n个集合中的每个集合中的数据代表一个事件,先确定每个事件的标题,然后根据各个事件的标题的相似度,对事件进行融合。其中,标题的相似度可以理解为,两个事件的标题中包含相同文本的数量是否大于或等于第三阈值,若大于或等于第三阈值,则进行融合,否则不融合。关于确定标题的方法,可参考图1方法实施例步骤S103中确定每个事件的标题相关内容的描述。
在一种可能的实现方式中,用多个关键词描述每个集合中的数据代表的事件,根据各个事件之间包含相同关键词的数量,对n个事件进行融合。例如,在两个事件包含相同关键词的数量大于或等于第三阈值的情况下,可以将两个事件进行合并,即将两个事件对应的集合合并,若包含相同关键词的数量小于第三阈值的情况下,则不合并,第三阈值可根据具体情况具体设置。
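按"包含相同关键词的数量是否达到第三阈值"对事件进行融合的过程，可以示意如下（示意性的贪心实现，示例中的关键词与阈值取值均为举例，实际融合策略本申请不做限定）：

```python
def merge_by_keywords(event_keywords, min_shared):
    # event_keywords: one keyword set per event. An event is merged into
    # the first existing event class sharing at least `min_shared`
    # keywords (the third threshold); otherwise it starts a new class.
    classes = []
    for kw in event_keywords:
        kw = set(kw)
        for cls in classes:
            if len(cls & kw) >= min_shared:
                cls |= kw
                break
        else:
            classes.append(kw)
    return classes
```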
可选的，确定每个集合中的数据代表的事件的关键词的方法，与图1方法实施例步骤S103中确定每个事件的标题的方法类似，对于视频/图像来说，用文本描述视频中每个视频帧/图像的内容，再根据每个视频帧/图像的文本描述，利用算法进行计算，确定该视频/图像涉及的关键词；对于包括字幕或语音的视频、文本、语音来说，利用自然语言处理技术，确定关键词。确定每个集合中的数据代表的事件的关键词还可以通过其他方法，本申请不做限定。
在一种可能的实现方式中,可以根据各个集合的视频中携带的字幕内容或语音内容的相似度,对n个事件进行融合。例如,计算集合1中视频携带的字幕内容与集合2中视频携带的字幕内容的相似度,若相似度大于或等于第二阈值,则将集合1与集合2进行合并,若相似度小于第二阈值,则不合并。
在一种可能的实现方式中,可以根据各个集合的视频中是否包含相同的视频帧,若两个集合中包含相同的视频帧,则可以将两个集合进行合并,若不包含相同的视频帧,则不合并。
在一种可能的实现方式中,若两个集合的视频中存在相同的人或相同的物,则可以将两个集合合并,例如,集合1视频中包含有角色A的视频帧,集合2视频中也包含有角色A的视频帧,则两个集合可以合并;又例如,集合1视频中包含解说者B的视频帧,集合2视频中也包含解说者B的视频帧,则两个集合可以合并。若两个集合的视频中不包含相同的人或相同的物,则不合并。
S204、确定m个集合中的数据代表的m类事件之间的逻辑关系。
可选的,可以先确定m类事件中每类事件的标题,根据每类事件的标题确定m类事件之间的逻辑关系。确定每类事件的标题与确定每个事件的标题的方法类似,具体可参见图1方法实施例步骤S103中相关内容的描述。
S205、以m类事件中的每类事件分别为节点,m类事件之间的逻辑关系为节点的边,建立事理图谱。
关于本实施例中,未详尽的步骤可参考图1方法实施例中相关内容的描述,为了说明书的简洁,在此不再赘述。
下面以逻辑关系预测模型为例，介绍一下模型的训练、预测过程。参见图6，图6为本申请实施例提供的一种逻辑关系预测模型的训练、预测结构示意图。如图6所示，数据获取设备560用于获取训练数据，训练数据可以包括用于代表事件的视频、图像、文本、语音等中的一种或多种，训练数据还可以包括标签，标签中包括各个事件之间的逻辑关系，其中逻辑关系包括因果关系、时序关系等。
在获取到训练数据之后,数据获取设备560将这些训练数据存入数据库530中,数据库530可以实现对训练数据进行维护。训练设备520可以基于数据库530中的训练数据进行训练,从而获得训练好的逻辑关系预测模型513,将训练好的逻辑关系预测模型513移植至执行设备510上。可选,训练设备520可以独立于执行设备510存在,也可以集成于执行设备510内部。
用户可以通过执行设备510的输入输出I/O接口512输入需要预测的数据,比如,m个集合的数据,其中每个集合中的数据代表一类事件,或者,也可以通过数据获取设备560将m个集合的数据输入至数据库530中,然后执行设备510从数据库530中获取m个集合的数据,逻辑关系预测模型513对输入m个集合进行逻辑关系预测,确定出各个集合之间的逻辑关系,并将各个集合之间的逻辑关系通过输入输出I/O接口512输出。
需要说明的是，在实际的应用中，所述数据库530中维护的训练数据不一定都来自于数据获取设备560，也有可能是从其他设备获取得到的。另外需要说明的是，训练设备520也不一定完全基于数据库530维护的训练数据进行逻辑关系预测模型513的训练，也有可能从其他设备获取训练数据进行模型训练，上述描述不应该作为对本申请实施例的限定。
逻辑关系预测模型513可以应用在本申请图1或图5所示的方法实施例中。在执行设备510对输入数据进行处理,或者在执行设备510的计算模块511执行计算等相关的处理过程中,执行设备510可以调用数据存储系统550中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统550中。
需要说明的是,训练设备520可以针对不同的目标,基于不同的训练数据生成相应的逻辑关系预测模型513,该相应的逻辑关系预测模型513可以用于实现上述目标,从而为用户提供所需的结果。
这里仅仅以逻辑关系预测模型为例说明,对于本申请中涉及的其他模型也可以通过图6所示的方法训练和预测,本申请不做限定。
参见图7,图7为本申请提供的一种建立事理图谱的装置700结构示意图,所述装置700包括:
获取模块701,用于获取第一数据,第一数据中包括视频、图像、文本、语音中的任意一种或者多种;
切分模块702,用于将第一数据切分为m个集合,m个集合中的每个集合的数据代表一类事件,一类事件包括至少一个事件,m为任意正整数;
确定模块703,用于确定m个集合中的数据代表的m类事件之间的逻辑关系;
事理图谱建立模块704,用于以m类事件中的每类事件分别为节点,m类事件之间的逻辑关系为节点的边,建立事理图谱。
在可能的实现方式中,第一数据中的视频包括影视节目视频、新闻报道视频、广告视频、摄录视频中的任意一种或多种。
在可能的实现方式中,节点中的数据包括视频、图像、文本、语音中的任意一种或者多种。
在可能的实现方式中,逻辑关系包括因果关系、时序关系的一种或多种。
在可能的实现方式中,确定模块703用于:根据训练好的逻辑关系预测模型,确定m个集合中的数据代表的m类事件之间的逻辑关系。
在可能的实现方式中,在第一数据为至少一个视频的情况下,且一类事件包括一个事件的情况下,切分模块702用于:
对于至少一个视频中的每个视频,根据(i)至(iii)中的一者或多者,对至少一个视频进行切分获得m个集合,其中,
(i)每个视频中相邻视频帧之间的相似度确定相邻视频帧描述的是否为一个事件,若是,将相邻视频帧切分至一个集合中,若否,将相邻视频帧切分至不同集合中,
(ii)每个视频中携带的字幕内容或语音内容确定描述为一个事件的视频帧,将描述为一个事件的视频帧切分至一个集合中,
(iii)每个视频中的人物身份,将包含一个指定人物身份的视频帧切分至一个集合中,或者,将包含多个指定人物身份的视频帧切分至一个集合中。
在可能的实现方式中,在第一数据中为至少一个视频的情况下,且一类事件包括至少一个事件的情况下,切分模块702用于,将第一数据切分为n个集合,n个集合中的每个集合的数据代表一个事件,其中,n大于等于m;装置还包括融合模块705,融合模块705用于,对n个集合的数据所代表的n个事件中的相似事件进行融合,获得m个集合。
在可能的实现方式中,切分模块702用于:
对于至少一个视频中的每个视频,根据(iv)至(vi)中的一者或多者,对至少一个视频进行切分获得n个集合,其中,
(iv)每个视频中相邻视频帧之间的相似度确定相邻视频帧描述的是否为一个事件,若是,将相邻视频帧切分至一个集合中,若否,将相邻视频帧切分至不同集合中,
(v)每个视频中携带的字幕内容或语音内容确定描述为一个事件的视频帧,将描述为一个事件的视频帧切分至一个集合中,
(vi)每个视频中的人物身份,将包含一个指定人物身份的视频帧切分至一个集合中,或者,将包含多个指定人物身份的视频帧切分至一个集合中;
融合模块705用于:根据条件(vii)至(x)中的任意一者或多者,对n个事件中的相似事件进行融合,其中,
(vii)根据各个事件的标题相似度或关键词相似度,
(viii)根据各个集合的视频中携带的字幕内容或语音内容的相似度,
(ix)根据各个集合的视频中是否包含相同的视频帧,
(x)根据各个集合的视频中是否存在相同的人物身份。
在可能的实现方式中,装置700还包括:推荐模块706,用于根据用户的操作,基于建立好的事理图谱,向用户推荐相关内容。
其中,获取模块701、切分模块702、确定模块703、事理图谱建立模块704、融合模块705和推荐模块706均可以通过软件实现,或者可以通过硬件实现。示例性的,接下来以切分模块702为例,介绍切分模块702的实现方式。类似的,获取模块701、确定模块703、事理图谱建立模块704、融合模块705和推荐模块706的实现方式可以参考切分模块702的实现方式。
模块作为软件功能单元的一种举例,切分模块702可以包括运行在计算设备上的代码。其中,计算设备可以是云服务中的计算设备,其中计算设备例如可以是裸金属服务器、虚拟机,进一步地,计算设备可以是一台或多台。例如,切分模块702可以包括运行在多个计算设备上的代码。需要说明的是,用于运行该代码的多个计算设备可以分布在相同的区域(region)中,也可以分布在不同的region中。进一步地,用于运行该代码的多个计算设备可以分布在相同的可用区(availability zone,AZ)中,也可以分布在不同的AZ中,每个AZ包括一个数据中心或多个地理位置相近的数据中心。其中,通常一个区域region可以包括多个可用区AZ。
同样,用于运行该代码的多个计算设备可以分布在同一个虚拟私有云(virtual private cloud,VPC)中,也可以分布在多个VPC中。其中,通常一个VPC设置在一个region内,同一region内两个VPC之间,以及不同region的VPC之间跨区通信需在每个VPC内设置通信网关,经通信网关实现VPC之间的互连。
模块作为硬件功能单元的一种举例，切分模块702可以包括至少一个计算设备，如服务器、计算机、手机等。或者，切分模块702也可以是利用专用集成电路(application-specific integrated circuit,ASIC)实现、或可编程逻辑器件(programmable logic device,PLD)实现的设备等。其中，上述PLD可以是复杂程序逻辑器件(complex programmable logical device,CPLD)、现场可编程门阵列(field-programmable gate array,FPGA)、通用阵列逻辑(generic array logic,GAL)或其任意组合实现。
切分模块702包括的多个计算设备可以分布在相同的region中，也可以分布在不同的region中。切分模块702包括的多个计算设备可以分布在相同的AZ中，也可以分布在不同的AZ中。同样，切分模块702包括的多个计算设备可以分布在同一个VPC中，也可以分布在多个VPC中。其中，所述多个计算设备可以是服务器、虚拟机、ASIC、PLD、CPLD、FPGA和GAL等计算设备的任意组合。
需要说明的是,在其他实施例中,切分模块702可以用于执行一种事理图谱建立方法中的任意步骤,获取模块701、确定模块703、事理图谱建立模块704、融合模块705和推荐模块706均可以用于执行一种事理图谱建立方法中的任意步骤,获取模块701、切分模块702、确定模块703、事理图谱建立模块704、融合模块705和推荐模块706负责实现的步骤可根据需要指定,通过获取模块701、切分模块702、确定模块703、事理图谱建立模块704、融合模块705和推荐模块706分别实现一种事理图谱建立方法中不同的步骤,来实现建立事理图谱的装置700的全部功能。
参见图8,图8为本申请提供的一种计算设备800的结构示意图,计算设备800例如裸金属服务器、虚拟机,该计算设备800可以配置为建立事理图谱的设备,建立事理图谱的设备可以为手机、计算机、平板、服务器,计算设备800包括:总线802、处理器804、存储器806和通信接口808。处理器804、存储器806和通信接口808之间通过总线802通信。计算设备800可以是服务器或终端设备。应理解,本申请不限定计算设备800中的处理器、存储器的个数。
总线802可以是外设部件互连标准(peripheral component interconnect,PCI)总线或扩展工业标准结构(extended industry standard architecture,EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图8中仅用一条线表示,但并不表示仅有一根总线或一种类型的总线。总线802可包括在计算设备800各个部件(例如,存储器806、处理器804、通信接口808)之间传送信息的通路。
处理器804可以包括中央处理器(central processing unit,CPU)、图形处理器(graphics processing unit,GPU)、微处理器(micro processor,MP)或者数字信号处理器(digital signal processor,DSP)等处理器中的任意一种或多种。
存储器806可以包括易失性存储器(volatile memory),例如随机存取存储器(random access memory,RAM)。处理器804还可以包括非易失性存储器(non-volatile memory),例如只读存储器(read-only memory,ROM),快闪存储器,机械硬盘(hard disk drive,HDD)或固态硬盘(solid state drive,SSD)。
存储器806中存储有可执行的程序代码,处理器804执行该可执行的程序代码以分别实现前述获取模块701、切分模块702、确定模块703、事理图谱建立模块704、融合模块705和推荐模块706的功能,从而实现一种事理图谱建立方法。也即,存储器806上存有用于执行一种事理图谱建立方法的指令。
通信接口808使用例如但不限于网络接口卡、收发器一类的收发模块,来实现计算设备800与其他设备或通信网络之间的通信。
本申请实施例还提供了一种计算设备集群。该计算设备集群包括至少一台计算设备。该计算设备可以是手机、计算机、笔记本、平板、服务器,服务器例如可以是中心服务器、边缘服务器,或者是本地数据中心中的本地服务器。
如图9所示，图9为本申请提供的一种计算设备集群的结构示意图，所述计算设备集群包括至少一个计算设备800。计算设备集群中的一个或多个计算设备800中的存储器806中可以存有相同的用于执行一种事理图谱建立方法的指令。
在一些可能的实现方式中,该计算设备集群中的一个或多个计算设备800的存储器806中也可以分别存有用于执行一种事理图谱建立方法的部分指令。换言之,一个或多个计算设备800的组合可用于共同执行一种事理图谱建立方法的指令。
需要说明的是,计算设备集群中的不同的计算设备800中的存储器806可以存储不同的指令,分别用于执行事理图谱建立方法的部分功能。也即,不同的计算设备800中的存储器806存储的指令可以实现获取模块701、切分模块702、确定模块703、事理图谱建立模块704、融合模块705和推荐模块706中的一个或多个模块的功能。
在一些可能的实现方式中,计算设备集群中的一个或多个计算设备可以通过网络连接。其中,所述网络可以是广域网或局域网等等。图10示出了一种可能的实现方式。如图10所示,两个计算设备800A和800B之间通过网络进行连接。具体地,通过各个计算设备中的通信接口与所述网络进行连接。在这一类可能的实现方式中,计算设备800A中的存储器806中存有执行获取模块701、切分模块702、融合模块705的功能的指令。同时,计算设备800B中的存储器806中存有执行确定模块703、事理图谱建立模块704和推荐模块706的功能的指令。计算设备800A用于获取第一数据,并对第一数据进行切分处理或融合处理,并将切分或融合处理后的数据通过网络发送至计算设备800B,计算设备800B确定处理后的数据之间的逻辑关系,并基于这些数据及这些数据之间的逻辑关系建立事理图谱,基于建立好的事理图谱向用户推荐相关内容。
应理解,图10中示出的计算设备800A的功能也可以由多个计算设备800完成,或者云服务平台中包括多个与计算设备800A具有相同功能的计算设备。同样,计算设备800B的功能也可以由多个计算设备800完成,或者云服务平台中包括多个与计算设备800B具有相同功能的计算设备。
本申请实施例还提供了另一种计算设备集群。该计算设备集群中各计算设备之间的连接关系可以类似的参考图9和图10所述计算设备集群的连接方式。不同的是,该计算设备集群中的一个或多个计算设备800中的存储器806中可以存有不同的用于执行一种事理图谱建立方法的指令。在一些可能的实现方式中,该计算设备集群中的一个或多个计算设备800的存储器806中也可以分别存有用于执行一种事理图谱建立方法的部分指令。换言之,一个或多个计算设备800的组合可以共同执行用于执行一种事理图谱建立方法的指令。
本申请实施例还提供了一种包含指令的计算机程序产品。所述计算机程序产品可以是包含指令的,能够运行在计算设备上或被储存在任何可用介质中的软件或程序产品。当所述计算机程序产品在至少一个计算设备上运行时,使得至少一个计算设备执行一种事理图谱建立方法。
本申请实施例还提供了一种计算机可读存储介质。所述计算机可读存储介质可以是计算设备能够存储的任何可用介质或者是包含一个或多个可用介质的数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘)等。该计算机可读存储介质包括指令,所述指令指示计算设备或计算设备集群执行一种事理图谱建立方法。
以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的保护范围。

Claims (20)

  1. 一种事理图谱建立方法,其特征在于,包括:
    获取第一数据,所述第一数据中包括视频、图像、文本、语音中的任意一种或者多种;
    将所述第一数据切分为m个集合,所述m个集合中的每个集合的数据代表一类事件,所述一类事件包括至少一个事件,m为任意正整数;
    确定所述m个集合中的数据代表的m类事件之间的逻辑关系;
    以所述m类事件中的每类事件分别为节点,所述m类事件之间的逻辑关系为所述节点的边,建立事理图谱。
  2. 根据权利要求1所述的方法,其特征在于,所述第一数据中的视频包括影视节目视频、新闻报道视频、广告视频、摄录视频中的任意一种或多种。
  3. 根据权利要求1或2所述的方法,其特征在于,所述节点中的数据包括视频、图像、文本、语音中的任意一种或者多种。
  4. 根据权利要求1-3任一项所述的方法,其特征在于,所述逻辑关系包括因果关系、时序关系的一种或多种。
  5. 根据权利要求1-4任一项所述的方法,其特征在于,所述确定所述m个集合中的数据代表的m类事件之间的逻辑关系,包括:
    根据训练好的逻辑关系预测模型,确定所述m个集合中的数据代表的m类事件之间的逻辑关系。
  6. 根据权利要求1-5任一项所述的方法,其特征在于,在所述第一数据为至少一个视频的情况下,且所述一类事件包括一个事件的情况下,
    所述将所述第一数据切分为m个集合,包括:
    对于所述至少一个视频中的每个视频,根据(i)至(iii)中的一者或多者,对所述至少一个视频进行切分获得所述m个集合,其中,
    (i)所述每个视频中相邻视频帧之间的相似度确定所述相邻视频帧描述的是否为一个事件,若是,将所述相邻视频帧切分至一个集合中,若否,将所述相邻视频帧切分至不同集合中,
    (ii)所述每个视频中携带的字幕内容或语音内容确定描述为一个事件的视频帧,将所述描述为一个事件的视频帧切分至一个集合中,
    (iii)所述每个视频中的人物身份,将包含一个指定人物身份的视频帧切分至一个集合中,或者,将包含多个指定人物身份的视频帧切分至一个集合中。
  7. 根据权利要求1-5任一项所述的方法,其特征在于,在所述第一数据中为至少一个视频的情况下,且所述一类事件包括至少一个事件的情况下,
    所述将所述第一数据切分为m个集合,包括:
    将所述第一数据切分为n个集合,所述n个集合中的每个集合的数据代表一个事件,其中,n大于等于m;
    对所述n个集合的数据所代表的n个事件中的相似事件进行融合,获得所述m个集合。
  8. 根据权利要求7所述的方法,其特征在于,所述将所述第一数据切分为n个集合,包括:
    对于所述至少一个视频中的每个视频,根据(iv)至(vi)中的一者或多者,对所述至少一个视频进行切分获得所述n个集合,其中,
    (iv)所述每个视频中相邻视频帧之间的相似度确定所述相邻视频帧描述的是否为一个事件,若是,将所述相邻视频帧切分至一个集合中,若否,将所述相邻视频帧切分至不同集合中,
    (v)所述每个视频中携带的字幕内容或语音内容确定描述为一个事件的视频帧,将所述描述为一个事件的视频帧切分至一个集合中,
    (vi)所述每个视频中的人物身份,将包含一个指定人物身份的视频帧切分至一个集合中,或者,将包含多个指定人物身份的视频帧切分至一个集合中;
    所述对所述n个集合的数据所代表的n个事件中的相似事件进行融合,包括:
    根据条件(vii)至(x)中的任意一者或多者,对所述n个事件中的相似事件进行融合,其中,
    (vii)根据各个事件的标题相似度或关键词相似度,
    (viii)根据各个集合的视频中携带的字幕内容或语音内容的相似度,
    (ix)根据各个集合的视频中是否包含相同的视频帧,
    (x)根据各个集合的视频中是否存在相同的人物身份。
  9. 根据权利要求1-8任一项所述的方法,其特征在于,所述方法还包括:
    根据用户的操作,基于建立好的事理图谱,向用户推荐相关内容。
  10. 一种建立事理图谱的装置,其特征在于,包括:
    获取模块,用于获取第一数据,所述第一数据中包括视频、图像、文本、语音中的任意一种或者多种;
    切分模块,用于将所述第一数据切分为m个集合,所述m个集合中的每个集合的数据代表一类事件,所述一类事件包括至少一个事件,m为任意正整数;
    确定模块,用于确定所述m个集合中的数据代表的m类事件之间的逻辑关系;
    事理图谱建立模块,用于以所述m类事件中的每类事件分别为节点,所述m类事件之间的逻辑关系为所述节点的边,建立事理图谱。
  11. 根据权利要求10所述的装置,其特征在于,所述第一数据中的视频包括影视节目视频、新闻报道视频、广告视频、摄录视频中的任意一种或多种。
  12. 根据权利要求10或11所述的装置,其特征在于,所述节点中的数据包括视频、图像、文本、语音中的任意一种或者多种。
  13. 根据权利要求10-12任一项所述的装置,其特征在于,所述逻辑关系包括因果关系、时序关系的一种或多种。
  14. 根据权利要求10-13任一项所述的装置,其特征在于,所述确定模块用于:
    根据训练好的逻辑关系预测模型,确定所述m个集合中的数据代表的m类事件之间的逻辑关系。
  15. 根据权利要求10-14任一项所述的装置,其特征在于,在所述第一数据为至少一个视频的情况下,且所述一类事件包括一个事件的情况下,
    所述切分模块用于:
    对于所述至少一个视频中的每个视频,根据(i)至(iii)中的一者或多者,对所述至少一个视频进行切分获得所述m个集合,其中,
    (i)所述每个视频中相邻视频帧之间的相似度确定所述相邻视频帧描述的是否为一个事件,若是,将所述相邻视频帧切分至一个集合中,若否,将所述相邻视频帧切分至不同集合中,
    (ii)所述每个视频中携带的字幕内容或语音内容确定描述为一个事件的视频帧,将所述描述为一个事件的视频帧切分至一个集合中,
    (iii)所述每个视频中的人物身份,将包含一个指定人物身份的视频帧切分至一个集合中,或者,将包含多个指定人物身份的视频帧切分至一个集合中。
  16. 根据权利要求10-14任一项所述的装置,其特征在于,
    在所述第一数据中为至少一个视频的情况下,且所述一类事件包括至少一个事件的情况下,所述切分模块用于,将所述第一数据切分为n个集合,所述n个集合中的每个集合的数据代表一个事件,其中,n大于等于m;
    所述装置还包括融合模块,所述融合模块用于,对所述n个集合的数据所代表的n个事件中的相似事件进行融合,获得所述m个集合。
  17. 根据权利要求16所述的装置,其特征在于,所述切分模块用于:
    对于所述至少一个视频中的每个视频,根据(iv)至(vi)中的一者或多者,对所述至少一个视频进行切分获得所述n个集合,其中,
    (iv)所述每个视频中相邻视频帧之间的相似度确定所述相邻视频帧描述的是否为一个事件,若是,将所述相邻视频帧切分至一个集合中,若否,将所述相邻视频帧切分至不同集合中,
    (v)所述每个视频中携带的字幕内容或语音内容确定描述为一个事件的视频帧,将所述描述为一个事件的视频帧切分至一个集合中,
    (vi)所述每个视频中的人物身份,将包含一个指定人物身份的视频帧切分至一个集合中,或者,将包含多个指定人物身份的视频帧切分至一个集合中;
    所述融合模块用于:
    根据条件(vii)至(x)中的任意一者或多者,对所述n个事件中的相似事件进行融合,其中,
    (vii)根据各个事件的标题相似度或关键词相似度,
    (viii)根据各个集合的视频中携带的字幕内容或语音内容的相似度,
    (ix)根据各个集合的视频中是否包含相同的视频帧,
    (x)根据各个集合的视频中是否存在相同的人物身份。
  18. 根据权利要求10-17任一项所述的装置,其特征在于,所述装置还包括:
    推荐模块,用于根据用户的操作,基于建立好的事理图谱,向用户推荐相关内容。
  19. 一种计算设备集群,其特征在于,包括至少一个计算设备,所述至少一个计算设备中的每个计算设备包括存储器和处理器,所述存储器用于存储指令,所述处理器用于运行所述至少一个计算设备的存储器中存储的指令,以使所述计算设备集群执行如权利要求1至9任一项所述的方法。
  20. 一种计算机可读存储介质，其特征在于，包括计算机程序指令，当所述计算机程序指令在计算设备集群上运行时，使得所述计算设备集群执行如权利要求1至9任一项所述的方法。
PCT/CN2023/075917 2022-03-11 2023-02-14 一种事理图谱建立方法及相关装置 WO2023169159A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210239563 2022-03-11
CN202210239563.0 2022-03-11
CN202210726908.5 2022-06-24
CN202210726908.5A CN116775892A (zh) 2022-03-11 2022-06-24 一种事理图谱建立方法及相关装置

Publications (1)

Publication Number Publication Date
WO2023169159A1 true WO2023169159A1 (zh) 2023-09-14

Family

ID=87937187

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/075917 WO2023169159A1 (zh) 2022-03-11 2023-02-14 一种事理图谱建立方法及相关装置

Country Status (1)

Country Link
WO (1) WO2023169159A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114003799A (zh) * 2020-07-27 2022-02-01 阿里巴巴集团控股有限公司 事件推荐方法、装置和设备
CN114020936A (zh) * 2022-01-06 2022-02-08 北京融信数联科技有限公司 多模态事理图谱的构建方法、系统和可读存储介质
CN114090794A (zh) * 2021-11-29 2022-02-25 中国平安人寿保险股份有限公司 基于人工智能的事理图谱构建方法及相关设备


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23765737

Country of ref document: EP

Kind code of ref document: A1