US12481631B2 - Information segmentation methods, apparatuses, and electronic devices - Google Patents
- Publication number
- US12481631B2 (application US18/572,606)
- Authority
- US
- United States
- Prior art keywords
- demarcation point
- information
- probability value
- node
- point probability
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
Definitions
- The present disclosure relates to the technical field of the Internet, and in particular to information segmentation methods, apparatuses, and electronic devices.
- The goal of an information segmentation algorithm is to cut raw data into data fragments of different granularities, which form the basis of various subsequent applications such as editing, understanding, and recognition.
- An embodiment of the present disclosure provides an information segmentation method, which includes: for a target information node in an information sequence and based on first node information of the target information node, determining a first demarcation point probability value, wherein the first demarcation point probability value indicates a probability that the target information node is a first type of demarcation point; determining, based on the first demarcation point probability value and second node information of the target information node, a second demarcation point probability value, wherein the second demarcation point probability value indicates a probability that the target information node is a second type of demarcation point; and determining, based on the first demarcation point probability value and the second demarcation point probability value, at least two segmentation modes for the information sequence, wherein different segmentation modes have different segmentation granularities.
- An embodiment further provides an electronic device comprising: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the information segmentation method described in the first aspect.
- An embodiment further provides a computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, performs the steps of the information segmentation method described in the first aspect.
- FIG. 1 is a flow chart of one embodiment of an information segmentation method according to the present disclosure.
- FIG. 2 is a scenario diagram of an information segmentation method according to the present disclosure.
- FIG. 3 is a scenario diagram of an information segmentation method according to the present disclosure.
- FIG. 4 is a scenario diagram of an information segmentation method according to the present disclosure.
- FIG. 5 is a schematic diagram of an implementation of an information segmentation method according to the present disclosure.
- FIG. 6 is a scenario diagram of an information segmentation method according to the present disclosure.
- FIG. 8 is a scenario diagram of an information segmentation method according to the present disclosure.
- FIG. 11 is a schematic diagram of a basic structure of an electronic device provided in accordance with the embodiments of the present disclosure.
- Step 101: For a target information node in an information sequence, a first demarcation point probability value is determined based on first node information of the target information node.
- The performing entity of the information segmentation method (for example, a server) may determine, for a target information node in an information sequence and based on first node information of the target information node, a first demarcation point probability value.
- Here, the information sequence may be a sequence composed of information.
- The types of the information may include, but are not limited to, at least one of image, text, and audio.
- The type of the information sequence is not limited here.
- The information sequence may include, but is not limited to, at least one of a video sequence, a text sequence, etc.
- Here, the target information node can be any information unit or any position in the information sequence.
- The word "target" in "target information node" is added for ease of explanation and does not limit the information unit.
- For example, the information sequence can be a video frame sequence.
- Time points at the joints between adjacent video frames can be used as information nodes.
- Alternatively, one video frame can be used as an information node, or a preset number (for example, 3) of video frames can be used as an information node.
- The video frame sequence is taken as an example for illustration below.
- The target information node can be any video frame or time point in the video frame sequence.
- Here, the first node information of the target information node can be node feature information of the target information node.
- The first node information is used to determine the first demarcation point probability value.
- Here, the first demarcation point probability value indicates a probability that the target information node is a first type of demarcation point.
- Corresponding features can be extracted from a plurality of frames before and after the target video frame, respectively.
- For a target video frame at time t, the features of a total of 2K frames within the time range $[t-K, t+K)$ are extracted; that is, the K consecutive frames before the target video frame, the target video frame itself, and the $K-1$ consecutive frames after it.
- The first node information can be determined based on the extracted features of the 2K frames.
- The features of the 2K frames can be extracted using Channel-Separated Convolutional Networks (CSNs).
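A minimal sketch of the sampling window described above, assuming frame indices start at 0. The text does not say how frames near the start or end of a video are handled, so clamping out-of-range indices to the first or last frame is an assumption made here; `window_indices` is a hypothetical helper name.

```python
from typing import List


def window_indices(t: int, k: int, num_frames: int) -> List[int]:
    """Return the 2K frame indices in [t - K, t + K) around target frame t.

    Indices outside the video are clamped to the nearest valid frame
    (an assumption; the source does not specify edge handling).
    """
    if num_frames <= 0 or k <= 0:
        raise ValueError("num_frames and k must be positive")
    # K frames before t, the target frame t itself, and K - 1 frames after t.
    return [min(max(i, 0), num_frames - 1) for i in range(t - k, t + k)]


if __name__ == "__main__":
    print(window_indices(t=10, k=3, num_frames=100))  # [7, 8, 9, 10, 11, 12]
    print(window_indices(t=1, k=3, num_frames=100))   # [0, 0, 0, 1, 2, 3]
```

Clamping keeps every window exactly 2K frames long, so a downstream feature network always receives a fixed-size input; padding with zeros would be an equally plausible alternative.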
- Here, the second demarcation point probability value can be used to characterize a probability that the target information node is a second type of demarcation point.
- The first type of demarcation point appears in the information sequence with a higher probability than the second type of demarcation point.
- The first type and the second type of demarcation point are associated with each other.
- That is, if an information node is the second type of demarcation point, it is usually also the first type of demarcation point.
- For example, whether a video frame is an event demarcation point directly affects whether it is a shot demarcation point. If the video frame is an event demarcation point, it may also be a shot demarcation point; if it is not, the probability that it is a shot demarcation point is low.
- Step 103: Based on the first demarcation point probability value and the second demarcation point probability value, at least two segmentation modes for the information sequence are determined.
- Here, different segmentation modes have different segmentation granularities, that is, they can produce fragments of the information sequence with different lengths. In general, the number of second-type demarcation points is not greater than the number of first-type demarcation points.
- Accordingly, the fragments obtained by segmenting the information sequence at the first type of demarcation points may be shorter than those obtained by segmenting it at the second type of demarcation points.
- Here, step 103 may include determining, using the first demarcation point probability value of the target information node, whether the target information node is the first type of demarcation point, and determining, using its second demarcation point probability value, whether it is the second type of demarcation point.
- The method shown in FIG. 1 is applied to each information node in the information sequence to obtain the first and second demarcation point probability values of every node. Then, by taking the first and second demarcation point probability values of each information node into account, the first type and second type of demarcation points in the information sequence can be determined.
- Segmenting at the first type of demarcation points can be understood as a first segmentation mode.
- Segmenting at the second type of demarcation points can be understood as a second segmentation mode.
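The two segmentation modes can be sketched as follows, under stated assumptions: each information node is the joint between two adjacent frames, a node counts as a demarcation point when its probability reaches a hypothetical threshold of 0.5, and the probability lists in the usage block are made-up illustrations rather than model outputs. `segment` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Fragment = Tuple[int, int]  # half-open frame range [start, end)


def segment(joint_probs: Sequence[float], threshold: float = 0.5) -> List[Fragment]:
    """Cut a sequence of len(joint_probs) + 1 frames at confident demarcation points.

    joint_probs[i] is the probability that the joint between frame i and
    frame i + 1 is a demarcation point of the type being segmented.
    """
    num_frames = len(joint_probs) + 1
    # A cut at joint i starts a new fragment at frame i + 1.
    cuts = [i + 1 for i, p in enumerate(joint_probs) if p >= threshold]
    bounds = [0, *cuts, num_frames]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    event_probs = [0.1, 0.8, 0.2, 0.9, 0.1]  # first type (event), illustrative values
    shot_probs = [0.0, 0.3, 0.1, 0.7, 0.0]   # second type (shot), illustrative values
    print("first mode: ", segment(event_probs))  # [(0, 2), (2, 4), (4, 6)]
    print("second mode:", segment(shot_probs))   # [(0, 4), (4, 6)]
```

Running the same function on each type's probabilities yields one segmentation per mode; because shot cuts here are a subset of event cuts, the second mode's fragments are unions of first-mode fragments, matching the fine-to-coarse hierarchy.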
- The steps in this embodiment can be implemented using a trained neural network.
- In summary, the information segmentation method of this embodiment determines, for a target information node in an information sequence and based on first node information of the target information node, a first demarcation point probability value, and then determines, based on the first demarcation point probability value and second node information of the target information node, a second demarcation point probability value.
- The first demarcation point probability value can indicate a probability that the target information node is a first type of demarcation point.
- The second demarcation point probability value can indicate a probability that the target information node is a second type of demarcation point.
- Based on these values, the target information node can be determined to be the first type and/or the second type of demarcation point in the information sequence, and thereby it can be determined whether the information sequence is to be segmented at the target information node.
- Because the second demarcation point probability value is determined with reference to the first demarcation point probability value, its accuracy can be improved; in turn, the demarcation points can be determined more accurately, achieving accurate data segmentation.
- The segmentation modes corresponding to the at least two types of demarcation point can be obtained at the same time, realizing hierarchical segmentation of the information sequence at different granularities.
- For example, events, shots, and scenes can be divided from fine to coarse based on segmentation granularity.
- The classification layer of each next-level network receives the output of the previous-level network together with the next-level network's own features.
- An event demarcation point is first predicted, and then a shot demarcation point is predicted based on the event demarcation point. Finally, a scene demarcation point is predicted based on the event demarcation point and the shot demarcation point.
- An event can indicate the smallest video unit into which a video is cut.
- For example, fragments such as a complete action or a continuous scene are video fragments with locally continuous information.
- A shot can indicate content captured by a camera without interruption, which has visual consistency; that is, shot transitions in the video are detected as demarcations.
- A scene can indicate a semantic unit, which includes a series of semantically related shots or event fragments.
- For example, the first type of demarcation point can be the event demarcation point and the second type of demarcation point can be the shot demarcation point.
- Alternatively, the first type of demarcation point can be the shot demarcation point and the second type of demarcation point can be the scene demarcation point.
- Alternatively, the first type of demarcation point can be the event demarcation point and the second type of demarcation point can be the scene demarcation point.
- The demarcation point between video fragment 201 and video fragment 202 can be the first type of demarcation point.
- The first type of demarcation point can be the event demarcation point, and video fragments 201 and 202 can indicate relatively independent events.
- Video fragment 210 can include video fragments 201 and 202.
- The demarcation point between video fragment 210 and video fragment 220 can be the second type of demarcation point.
- The second type of demarcation point can be the shot demarcation point.
- Video fragments 210 and 220 can indicate separate shooting processes (before and after a shot transition).
- Video fragment 21 can include video fragments 210 and 220.
- The demarcation point between video fragment 21 and video fragment 22 can be the third type of demarcation point.
- The third type of demarcation point can be the scene demarcation point.
- Video fragments 21 and 22 can indicate relatively independent scenes.
- Video fragment 201 may show person A entering a house.
- Video fragment 202 may show person A drinking water in the living room after entering the house.
- Video fragment 203 may show person A entering the kitchen and talking to person B.
- Video fragment 210 can be shot in the living room.
- Video fragment 220 can be shot in the kitchen, and the demarcation point between the fragment shot in the living room and the fragment shot in the kitchen can be the shot demarcation point.
- Video fragment 204 can show person A and person B waiting for an elevator.
- Video fragment 205 can show person A and person B chatting in the elevator.
- Video fragment 206 can show person A and person B fighting in the elevator.
- Video fragment 230 can be shot in the corridor, and video fragment 240 can be shot in the elevator.
- The demarcation point between video fragment 230 and video fragment 240 can be the shot demarcation point.
- Video fragment 21 can show the scene inside the house.
- Video fragment 22 can show the scene outside the house.
- The method further includes: obtaining third node information of the target information node; and determining a third demarcation point probability value based on the third node information and at least one of the first demarcation point probability value and the second demarcation point probability value.
- Here, the third demarcation point probability value can be used to indicate a probability that the target information node is a third type of demarcation point.
- Step 103 above may also include determining a segmentation mode corresponding to the third type of demarcation point.
- The first type of demarcation point may be the event demarcation point.
- The second type of demarcation point may be the shot demarcation point.
- The third type of demarcation point may be the scene demarcation point.
- The event demarcation point and the shot demarcation point are used as reference information to determine the scene demarcation point.
- In this way, the internal relations among event demarcation points, shot demarcation points, and scene demarcation points can be fully exploited by technical means to determine more accurate scene demarcation points.
- The first and/or second demarcation point probability values are judgment results for demarcation points of smaller granularity than the third type. Used as prior information, they help judge whether the target information node is the third type of demarcation point, which can improve the accuracy of that judgment.
- More generally, the method may support a plurality of segmentation modes, each corresponding to one type of demarcation point, where the determination of each demarcation point probability value is related to at least one other demarcation point probability value.
- For example, the method may involve 4 types of demarcation points.
- Each type of demarcation point may correspond to a segmentation mode.
- The determination of each demarcation point probability value may be related to the probability values of the other types of demarcation points.
- Hierarchical segmentation can be achieved regardless of the number of types of demarcation point.
- In some embodiments, the information sequence may include a video frame sequence.
- The first type of demarcation point may be an event demarcation point.
- The second type of demarcation point may be a shot demarcation point.
- The third type of demarcation point may be a scene demarcation point.
- Step 101 described above may comprise: importing, into a first cascaded classifier, information of event advanced features of the target information node, wherein the first cascaded classifier comprises at least two first classifiers; and generating the first demarcation point probability value based on the confidences output by the respective first classifiers in the first cascaded classifier.
- Here, the advanced features comprise timing features and/or attention features.
- Here, a cascaded classifier can include a plurality of classifiers that are cascaded during training. In general, the levels of a cascaded classifier judge with different strictness. The classifiers can process the first node information independently of each other, and each classifier in the cascaded classifier can output a confidence.
- Each classifier in the first cascaded classifier can be referred to as a first classifier.
- A first classifier can be used to judge whether the target information node is the first type of demarcation point.
- The first node information can be input into each first classifier of the first cascaded classifier 301.
- Each first classifier can output a confidence. For ease of distinction, the first classifiers in the first cascaded classifier can be designated first classifier No. 1 and first classifier No. 2.
- Here, the first cascaded classifier is illustrated with two first classifiers, and the same applies to the other cascaded classifiers. The first demarcation point probability value can be obtained based on the confidences output by the classifiers in the first cascaded classifier.
- The first demarcation point probability value may be generated in a variety of ways, including but not limited to at least one of averaging, weighted averaging, and taking the median.
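The three fusion options named above can be sketched with Python's standard `statistics` module. How weights for the weighted average would be chosen is not specified in the text, so `weights` is a caller-supplied assumption, and `fuse_confidences` is a hypothetical name.

```python
import statistics
from typing import Optional, Sequence


def fuse_confidences(
    confidences: Sequence[float],
    mode: str = "mean",
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine per-classifier confidences into one demarcation point probability."""
    if not confidences:
        raise ValueError("need at least one confidence")
    if mode == "mean":
        return statistics.fmean(confidences)
    if mode == "weighted":
        # Weights are a hypothetical input; the source does not say how they are set.
        if weights is None or len(weights) != len(confidences):
            raise ValueError("weights must match confidences")
        total = sum(weights)
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        return sum(c * w for c, w in zip(confidences, weights)) / total
    if mode == "median":
        return statistics.median(confidences)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    confs = [0.9, 0.6]  # illustrative outputs of first classifiers No. 1 and No. 2
    print(fuse_confidences(confs))
    print(fuse_confidences(confs, "weighted", weights=[3.0, 1.0]))
    print(fuse_confidences([0.9, 0.6, 0.7], "median"))
```

The median is less sensitive than the mean to a single classifier that disagrees sharply with the others, which may matter when the cascade levels are trained with very different strictness.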
- The second cascaded classifier can include a plurality of second classifiers, as shown in FIG. 3.
- The second classifiers in the second cascaded classifier 302 (second classifier No. 1 and second classifier No. 2) can share the information of shot advanced features, and each second classifier can output a confidence. A shot demarcation point probability value can be generated from the output confidences.
- In some embodiments, the first classifiers in the first cascaded classifier correspond one-to-one to event advanced feature extraction networks.
- In this case, determining the first demarcation point probability value comprises: obtaining basic feature information of the target information node; importing the basic feature information into each of the event advanced feature extraction networks to obtain the respective information of event advanced features; inputting each piece of information of event advanced features into the corresponding first classifier to obtain the confidences output by the first classifiers; and determining the first demarcation point probability value based on the confidences output by the respective first classifiers.
- Here, the basic feature information comprises visual feature information and/or audio feature information.
- Advanced features herein comprise timing features and/or attention features.
- The first cascaded classifier 401 can include a plurality of first classifiers, each with its own advanced feature extraction network. For example, event advanced feature extraction network No. 1 can output information No. 1 of event advanced features to first classifier No. 1, and event advanced feature extraction network No. 2 can output information No. 2 of event advanced features to first classifier No. 2.
- The second cascaded classifier 402 can include a plurality of second classifiers.
- Each second classifier can have its own advanced feature extraction network.
- For example, shot advanced feature extraction network No. 1 can output information No. 1 of shot advanced features to second classifier No. 1.
- Shot advanced feature extraction network No. 2 can output information No. 2 of shot advanced features to second classifier No. 2.
- The extraction network can be shared among the first classifiers in the first cascaded classifier (see FIG. 3), or each classifier can have its own extraction network (see FIG. 4).
- When each classifier uses its corresponding extraction network to extract the information of advanced features, the information of advanced features input into each classifier can be adapted to that classifier.
- Different inputs to the respective classifiers can increase the differences among the classifiers' outputs and improve the accuracy of the information output by each classifier.
- In some embodiments, the second cascaded classifier includes at least two second classifiers, and the second classifiers in the second cascaded classifier correspond one-to-one to shot advanced feature extraction networks.
- Determining, based on the first demarcation point probability value and second node information of the target information node, the second demarcation point probability value comprises: importing the basic feature information into each of the shot advanced feature extraction networks to obtain the respective information of shot advanced features; inputting each piece of information of shot advanced features, together with the first demarcation point probability value, into the corresponding second classifier to obtain the confidences output by the second classifiers; and determining the second demarcation point probability value based on the confidences output by the respective second classifiers.
- For example, the second cascaded classifier 402 can include second classifier No. 1 and second classifier No. 2.
- Second classifier No. 1 can correspond to shot advanced feature extraction network No. 1, which can be used to extract features of the target information node to obtain information No. 1 of shot advanced features.
- The first demarcation point probability value and the information No. 1 of shot advanced features output by shot advanced feature extraction network No. 1 can be used as input to second classifier No. 1, which can output a confidence.
- Second classifier No. 2 can correspond to shot advanced feature extraction network No. 2, which can be used to extract features of the target information node to obtain information No. 2 of shot advanced features.
- The first demarcation point probability value and the information No. 2 of shot advanced features output by shot advanced feature extraction network No. 2 can be used as input to second classifier No. 2, which can output a confidence.
- Each second classifier can process the first demarcation point probability value in its own way, so that the value plays different roles in different second classifiers. This accommodates the differentiated processing of the same information node by the plurality of second classifiers and, together with the second cascaded classifier, improves the accuracy of the second demarcation point probability value.
- Similarly, a third cascaded classifier includes at least two third classifiers, and the third classifiers in the third cascaded classifier correspond one-to-one to scene advanced feature extraction networks.
- The method further includes: importing the basic feature information into each of the scene advanced feature extraction networks to obtain the respective information of scene advanced features; inputting, into the corresponding third classifier, each piece of information of scene advanced features and at least one of the first demarcation point probability value and the second demarcation point probability value, to obtain the confidences output by the third classifiers; and determining, based on the confidences output by the respective third classifiers, the third demarcation point probability value, wherein the third demarcation point probability value indicates a probability that the target information node is the scene demarcation point.
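The per-classifier arrangement and the passing of finer-level probabilities into coarser levels (event, then shot, then scene) can be sketched with NumPy. This is an illustrative sketch only: each "advanced feature extraction network" is a single random, untrained `tanh` layer standing in for the timing or attention networks, each classifier is a logistic head, confidences are fused by plain averaging (one of the options listed earlier), and `CascadeLevel` is a hypothetical name.

```python
import numpy as np


class CascadeLevel:
    """One cascaded classifier with a dedicated extractor per classifier.

    Prior probabilities from finer levels (if any) are appended to every
    classifier's advanced features before its logistic head.
    """

    def __init__(self, in_dim, hidden_dim, num_classifiers, num_priors, seed=0):
        rng = np.random.default_rng(seed)
        # Untrained stand-ins for the advanced feature extraction networks.
        self.extractors = [rng.normal(scale=0.1, size=(hidden_dim, in_dim))
                           for _ in range(num_classifiers)]
        self.heads = [rng.normal(scale=0.1, size=hidden_dim + num_priors)
                      for _ in range(num_classifiers)]

    def confidences(self, basic_features, priors=()):
        prior_vec = np.asarray(priors, dtype=float)
        out = []
        for extractor, head in zip(self.extractors, self.heads):
            advanced = np.tanh(extractor @ basic_features)  # classifier-specific features
            logit = head @ np.concatenate([advanced, prior_vec])
            out.append(float(1.0 / (1.0 + np.exp(-logit))))
        return out

    def probability(self, basic_features, priors=()):
        # Fuse the classifiers' confidences by averaging.
        return float(np.mean(self.confidences(basic_features, priors)))


if __name__ == "__main__":
    basic = np.random.default_rng(42).normal(size=32)  # stand-in basic feature information
    event = CascadeLevel(32, 8, num_classifiers=2, num_priors=0, seed=1)
    shot = CascadeLevel(32, 8, num_classifiers=2, num_priors=1, seed=2)
    scene = CascadeLevel(32, 8, num_classifiers=2, num_priors=2, seed=3)
    p_event = event.probability(basic)
    p_shot = shot.probability(basic, priors=[p_event])
    p_scene = scene.probability(basic, priors=[p_event, p_shot])
    print(p_event, p_shot, p_scene)
```

Because each head has its own weight on the appended prior, the same first demarcation point probability value can influence each second classifier differently. Replacing the per-classifier `extractors` list with a single shared matrix would correspond to the shared-network variant.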
- In training, sample feature information of training samples is obtained, wherein the training samples are information nodes in a sample information sequence. The sample feature information is imported into a target classifier in the cascaded classifier, wherein the classifiers in the cascaded classifier correspond one-to-one to error labels, and the error indicated by the error label of a classifier at a higher level than the target classifier is smaller than the error indicated by the error label of the target classifier. If the output result of the target classifier is true, a classifier at a higher level than the target classifier is used to process the training sample; if the output result of the target classifier is false, the training sample is masked.
- The identification of demarcation points in an information sequence is not necessarily black or white.
- For example, the demarcation point between events in a video may be difficult to represent with a single image frame.
- Error labels can therefore be set in a way that specifically hedges against the ambiguity of demarcation points; that is, a distance error is set.
- A positive sample label is assigned to a certain frame in the training samples (for example, the video frame in the middle of the suspected demarcation-point video frames), and frames within the distance error threshold of that positive sample are also labeled as positive samples.
- If the distance error is set relatively large, the positive samples in the training samples will include more ambiguous temporal boundaries, and a classifier trained on them will judge demarcation points more coarsely, resulting in more false positives (that is, information nodes that are not demarcation points are judged to be demarcation points). Conversely, if the distance error is set relatively small, there will be fewer positive samples in the training samples. In this case, the network including the classifier is exposed to a smaller proportion of positive samples and cannot repeatedly learn the features of the positive samples that are demarcation points in the sequence.
- The cascaded classification structure can be realized by setting up a set of classifiers.
- The set of classifiers can be denoted as $\{H_1, H_2, \ldots, H_N\}$ and their outputs as $\{S_1, S_2, \ldots, S_N\}$, where $N$ is the number of cascaded classifiers.
- For example, the 100th image frame can be set as a positive sample.
- A series of error labels can be set, denoted as $\{u_1, u_2, \ldots, u_N\}$ with $u_1 > u_2 > \cdots > u_N$.
- A training sample (e.g., the 105th frame in the sample sequence) then has a series of sample labels, denoted as $\{L_1, L_2, \ldots, L_N\}$. The training sample is 5 frames from the positive frame, so if $u_1 = 10$, it is a positive sample for $H_1$; if $u_2 = 6$, it is also a positive sample for $H_2$; but if $u_2 = 4$, it is a negative sample for $H_2$.
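The distance-error labeling in the example above can be sketched as follows. Treating "within the distance error" as an inclusive bound (distance at most $u_n$) is an assumption, since the example is consistent with either an inclusive or a strict bound; `cascade_labels` is a hypothetical helper.

```python
from typing import List, Sequence


def cascade_labels(frame: int, positive_frame: int, error_labels: Sequence[int]) -> List[int]:
    """Per-level sample labels L_1..L_N (1 = positive, 0 = negative) for one frame.

    error_labels holds u_1 > u_2 > ... > u_N. The frame is positive for
    classifier H_n when its distance to the annotated demarcation frame is
    at most u_n (inclusive bound assumed).
    """
    if any(a <= b for a, b in zip(error_labels, error_labels[1:])):
        raise ValueError("error labels must be strictly decreasing")
    distance = abs(frame - positive_frame)
    return [1 if distance <= u else 0 for u in error_labels]


if __name__ == "__main__":
    # Frame 105 against the positive frame 100, as in the example above.
    print(cascade_labels(105, 100, [10, 6]))  # [1, 1]
    print(cascade_labels(105, 100, [10, 4]))  # [1, 0]
```

Each deeper level sees a tighter tolerance, so the early classifiers learn from many near-boundary positives while later ones are held to the precise boundary.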
- $H_1$ processes the training sample. If the resulting confidence $S_1$ is not less than the confidence threshold corresponding to $H_1$, the training sample is passed to $H_2$ for processing; if $S_1$ is less than that threshold, the training sample is discarded.
- The confidence threshold of a classifier is positively correlated with its level; in other words, a higher-level classifier has a higher confidence threshold.
- For example, suppose $L_1$ is a positive sample label and $S_1$ is 90. If the confidence threshold corresponding to $H_1$ is 60, $H_2$ then processes the training sample.
- If the resulting $S_2$ is 75 and the confidence threshold corresponding to $H_2$ is 70, $H_3$ can process the training sample; if the confidence threshold corresponding to $H_2$ is 80, the training sample can be masked.
- If $L_2$ is a negative sample label but $S_2$ is 75 and the confidence threshold corresponding to $H_2$ is 70 (that is, the sample is identified as positive), this identification error can be corrected by adjusting the network parameters, so that the identification result of $H_2$ for the training sample becomes consistent with the label $L_2$ corresponding to $H_2$.
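The routing rule for a training sample can be sketched as follows. It simulates the cascade with precomputed scores, using the 0 to 100 confidence scale of the example; the third-level score and threshold in the usage block are hypothetical, and `levels_that_train` is a hypothetical helper.

```python
from typing import List, Sequence


def levels_that_train(scores: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Return the 1-based levels H_n that process one training sample.

    scores[n - 1] is the confidence S_n of H_n and thresholds[n - 1] is the
    confidence threshold of H_n. H_1 always processes the sample; H_{n+1}
    processes it only if S_n >= threshold of H_n, otherwise the sample is
    masked for every higher level.
    """
    if len(scores) != len(thresholds):
        raise ValueError("one score and one threshold per classifier")
    visited = []
    for level, (score, threshold) in enumerate(zip(scores, thresholds), start=1):
        visited.append(level)
        if score < threshold:
            break  # masked: no higher-level classifier sees this sample
    return visited


if __name__ == "__main__":
    # S_1 = 90 and S_2 = 75 come from the example; S_3 = 50 and threshold 85 are hypothetical.
    print(levels_that_train([90, 75, 50], [60, 70, 85]))  # [1, 2, 3]
    print(levels_that_train([90, 75, 50], [60, 80, 85]))  # [1, 2]
```

Only the levels returned would contribute a training loss for this sample, which is one way to read "masking" in code.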
- the method may further include obtaining, from a video frame sequence, a preset number of consecutive video frames before the target information node and a preset number of consecutive video frames after the target information node, to obtain a video frame sub-sequence; and importing, into a pre-trained basic feature extraction network, the video frame sub-sequence to obtain the basic feature information of the target information node.
- for time t in the video, the features of a total of 2K frames within the time range [t-K, t+K) can be extracted; that is, the K consecutive frames before the target video frame and the K consecutive frames after the target video frame can be extracted.
- from these frames, the basic feature information can be determined; the 2K frames in the video frame sequence can be used as the video frame sub-sequence.
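
The window [t-K, t+K) can be extracted as below. This is a sketch; clamping out-of-range indices to the first or last frame is an assumption, since the source does not say how frames near the start or end of the video are handled, and `frame_window` is a hypothetical name.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def frame_window(frames: Sequence[T], t: int, k: int) -> List[T]:
    """Return the 2K frames with indices in [t - K, t + K) around target frame t.

    Assumption: indices outside the video are clamped to the first or last
    frame (edge padding), so the window always holds exactly 2K frames.
    """
    if not frames:
        raise ValueError("empty frame sequence")
    last = len(frames) - 1
    return [frames[min(max(i, 0), last)] for i in range(t - k, t + k)]
```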
- the basic feature extraction network includes at least one of a visual feature extraction network and an audio feature extraction network.
- the steps of importing the video frame sub-sequence into the basic feature extraction network to obtain the basic feature information of the target information node are shown in FIG. 5.
- Step 501: an audio sequence corresponding to the video frame sub-sequence is obtained.
- Step 502: the video frame sub-sequence is imported into the visual feature extraction network to obtain visual feature information.
- Step 503: the audio sequence is imported into the audio feature extraction network to obtain audio feature information.
- Step 504: the visual feature information and the audio feature information are concatenated to obtain the basic feature information.
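
Steps 501 to 504 can be sketched as a small pipeline. The helper `audio_for_frames`, its parameters (`sample_rate`, `fps`) and the callables `visual_net` and `audio_net` are hypothetical stand-ins for the pre-trained networks; the sketch only shows the data flow and the concatenation in step 504.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def audio_for_frames(samples: Sequence[float], sample_rate: int, fps: float,
                     first_frame: int, num_frames: int) -> List[float]:
    """Step 501 (hypothetical helper): cut the audio played during the frames."""
    start = int(round(first_frame * sample_rate / fps))
    end = int(round((first_frame + num_frames) * sample_rate / fps))
    return list(samples[max(start, 0):max(end, 0)])


def basic_features(frames: Sequence[object], audio: Sequence[float],
                   visual_net: Callable[[Sequence[object]], Vector],
                   audio_net: Callable[[Sequence[float]], Vector]) -> Vector:
    visual = visual_net(frames)          # step 502: visual feature information
    sound = audio_net(audio)             # step 503: audio feature information
    return list(visual) + list(sound)    # step 504: concatenate along the feature axis
```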
- the visual feature extraction network herein can be used to extract feature information of the video frame sub-sequence.
- the feature information input into the cascaded classifier can be extracted by feature extraction networks of various structures.
- the advanced feature extraction networks may include a timing network and/or an attention network.
- the visual feature extraction network can include Channel-Separated Convolution Networks (CSN).
- the audio feature extraction network can use the short-time Fourier transform to extract frequency domain information of the audio sequence, and then use N one-dimensional convolution layers to extract timing features of the audio.
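
The audio branch (short-time Fourier transform followed by N one-dimensional convolution layers) can be sketched with NumPy. The FFT size, hop length, Hann window, ReLU activation, valid padding and the randomly initialised weights used for illustration are all assumptions; the source does not give layer sizes.

```python
import numpy as np


def stft_magnitude(signal: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude of the short-time Fourier transform, shape (frames, n_fft // 2 + 1)."""
    if len(signal) < n_fft:
        raise ValueError("signal shorter than one analysis window")
    window = np.hanning(n_fft)
    num_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(num_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def conv1d_relu(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Valid 1D convolution over time followed by ReLU.

    x: (time, in_channels); weight: (kernel, in_channels, out_channels).
    """
    k = weight.shape[0]
    out = np.stack([np.einsum("ki,kio->o", x[t:t + k], weight)
                    for t in range(x.shape[0] - k + 1)])
    return np.maximum(out, 0.0)


def audio_timing_features(signal: np.ndarray, weights) -> np.ndarray:
    """Frequency-domain features from the STFT, then N 1D conv layers over time."""
    x = stft_magnitude(signal)
    for w in weights:
        x = conv1d_relu(x, w)
    return x
```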
- the timing network can be used to extract the timing features.
- the specific structure of the timing network can be built based on the actual application scenario and is not limited herein.
- the extraction networks corresponding to the classifiers include the visual feature extraction network and the audio feature extraction network.
- the output of the visual feature extraction network and the output of the audio feature extraction network are concatenated to obtain the basic feature information.
- concatenating the feature information of the video frame sequence and the audio sequence exploits the temporal correspondence between the audio in the audio information and the video frames in the video frame sequence, so that information of the other modality at the corresponding time point can be referenced to help determine whether the target information node is a demarcation point. The target information node can therefore be judged on the basis of multi-modal data, which improves the accuracy of the judgment.
- the feature extraction network includes the visual feature extraction network and the advanced feature extraction network.
- the event advanced feature extraction network may include a timing network and/or an attention network.
- in case that the advanced feature extraction networks comprise a timing network and an attention network, the corresponding information of event advanced features is obtained by importing the basic feature information into the timing network and into the attention network, and concatenating the output of the timing network and the output of the attention network.
- the information of shot advanced features can be obtained by importing the basic feature information into the shot advanced feature extraction network.
- the information of scene advanced features can be obtained by importing the information of scene features into the scene advanced feature extraction network.
- using the advanced feature extraction networks to extract timing and/or attention features provides a buffer layer that differentiates the features before each classifier.
- sharing the visual feature extraction network and the audio feature extraction network reduces the amount of computation while preserving the judgment accuracy for each type of demarcation point.
- the timing network includes dilated convolutions.
- time context information is important for detecting video event demarcations.
- detecting different event demarcations requires different amounts of context information, and some event demarcations can only be detected successfully with a long span of context information.
- a multi-layer one-dimensional convolutional network with dilation can enlarge the receptive field of the network over the time range.
- the dilation rate of each layer of the network is twice that of the previous layer.
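
The doubling dilation schedule can be made concrete with a small pure-Python sketch. It assumes zero-padded, same-length convolutions that share one kernel size across layers; the function names are hypothetical.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Non-causal dilated convolution with zero padding; output length equals input length."""
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def dilation_schedule(num_layers: int) -> List[int]:
    """Each layer's dilation rate is twice that of the previous layer: 1, 2, 4, ..."""
    return [2 ** i for i in range(num_layers)]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input time steps seen by one output of the stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With a kernel size of 3, four layers with dilation rates 1, 2, 4 and 8 cover 1 + 2 × (1 + 2 + 4 + 8) = 31 time steps, compared with 1 + 2 × 4 = 9 for four undilated layers.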
- a bidirectional long short-term memory network can be used to capture the timing features of the entire sequence. After that, the maximum activation response at each time point can be extracted with a max-pooling layer.
- the attention network can be constructed based on multi-head attention mechanisms.
- each video frame in a video sequence can be processed by N self-attention layers, and the final output passes through the same max-pooling layer as the timing network to obtain the maximum activation value at each time position.
- the attention network constructed with the multi-head attention mechanism can mine discriminative features and attend to the differences between demarcation point image frames and non-demarcation point image frames, which helps ensure segmentation accuracy.
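
The attention network can be sketched in NumPy as N stacked multi-head self-attention layers followed by max pooling. Assumptions not stated in the source: residual connections without layer normalization, no positional encoding, and max pooling taken over the feature axis so that each time position yields one maximum activation value.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, wq, wk, wv, wo, num_heads):
    """x: (T, D) frame features; wq, wk, wv, wo: (D, D). Returns (T, D)."""
    t, d = x.shape
    if d % num_heads:
        raise ValueError("feature size must be divisible by num_heads")
    dh = d // num_heads

    def split(m):  # (T, D) -> (heads, T, dh)
        return m.reshape(t, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, T, T)
    heads = softmax(scores) @ v                        # (heads, T, dh)
    return heads.transpose(1, 0, 2).reshape(t, d) @ wo


def attention_network(x, layers, num_heads):
    """N self-attention layers, then max pooling over features at each time position."""
    for wq, wk, wv, wo in layers:
        x = x + multi_head_self_attention(x, wq, wk, wv, wo, num_heads)  # residual
    return x.max(axis=1)   # (T,) maximum activation per time position
```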
- the output of the visual feature extraction network herein is taken as input to the advanced feature extraction network, and the output of the advanced feature extraction network is taken as the information of advanced features.
- after the basic features of the sub-sequence are extracted with the visual feature extraction network, the advanced feature extraction network extracts at least one of timing features and attention features, and these deeper features are used to judge whether the target information node is a demarcation point. Therefore, combining the visual feature extraction network with the advanced feature extraction network helps extract the features that matter most for judging demarcation points and improves the accuracy of that judgment.
- in one implementation, the output of the visual feature extraction network and the output of the audio feature extraction network are concatenated to obtain a first concatenated result.
- the first concatenated result is used as input to the advanced feature extraction network, and the output of the advanced feature extraction network is used as the node information.
- in another implementation, the output of the visual feature extraction network is used as input to the advanced feature extraction network, and the output of the advanced feature extraction network and the output of the audio feature extraction network are concatenated to obtain a second concatenated result, which is used as the node information.
- here the secondary information can be the audio features, which are concatenated with the information output by the advanced feature extraction network. Because the advanced feature extraction network does not process the audio features of the secondary sequence, its information processing workload is reduced.
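
The two fusion orders above differ only in where the concatenation happens. In this sketch, features are plain lists and `advanced` is a hypothetical stand-in for the advanced feature extraction network.

```python
from typing import Callable, List

Vector = List[float]


def fuse_before_advanced(visual: Vector, audio: Vector,
                         advanced: Callable[[Vector], Vector]) -> Vector:
    """First variant: concatenate, then run the advanced network on the result."""
    return advanced(visual + audio)


def fuse_after_advanced(visual: Vector, audio: Vector,
                        advanced: Callable[[Vector], Vector]) -> Vector:
    """Second variant: only the visual features pass through the advanced network;
    the secondary (audio) features are concatenated afterwards."""
    return advanced(visual) + audio
```

In the second variant the advanced network receives only the visual part of the input, which is where the workload reduction described above comes from.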
- the advanced feature extraction network includes a timing network and an attention network.
- the output of the timing network and the output of the attention network are concatenated as the output of the advanced feature extraction network.
- by concatenating timing features and attention features as the output of the advanced feature extraction network, two types of deeper features can be combined to determine whether a target information node is a demarcation point.
- FIG. 8 shows that the visual feature extraction network and the audio feature extraction network are shared not only by the first classifiers in the first cascaded classifier, but also among the first cascaded classifier, the second cascaded classifier and the third cascaded classifier.
- a classification module 801 can include an event advanced feature extraction network and a first classifier in the first cascaded classifier; the "×M" at the lower right of classification module 801 indicates that there are M classification modules 801.
- a classification module 802 may include a shot advanced feature extraction network and a second classifier in the second cascaded classifier; the "×M" at the lower right of classification module 802 indicates that there are M classification modules 802.
- a classification module 803 may include a scene advanced feature extraction network and a third classifier in the third cascaded classifier; the "×M" at the lower right of classification module 803 indicates that there are M classification modules 803.
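
The sharing shown in FIG. 8 amounts to computing the basic features once and reusing them in every classification module. In this sketch each module is a hypothetical (advanced network, classifier) pair and each cascade holds M such modules; the names and signatures are assumptions, and the dependency between cascades is omitted here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Module = Tuple[Callable[[Vector], Vector], Callable[[Vector], float]]


def shared_backbone_confidences(frames: Sequence[object], audio: Sequence[float],
                                visual_net: Callable[[Sequence[object]], Vector],
                                audio_net: Callable[[Sequence[float]], Vector],
                                cascades: Dict[str, Sequence[Module]]) -> Dict[str, List[float]]:
    """Run the shared visual and audio networks once, then reuse the basic
    features in every module of every cascade (e.g. event, shot, scene)."""
    basic = list(visual_net(frames)) + list(audio_net(audio))
    return {name: [clf(net(basic)) for net, clf in modules]
            for name, modules in cascades.items()}
```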
- the present disclosure provides an embodiment of an information processing apparatus corresponding to method embodiments shown in FIG. 1 .
- the apparatus can be specifically applied to various electronic devices.
- the information processing apparatus of the present embodiment comprises a first determination unit 901 , a second determination unit 902 , and a third determination unit 903 .
- the first determination unit is used to determine, for a target information node in an information sequence and based on first node information of the target information node, a first demarcation point probability value, wherein the first demarcation point probability value indicates a probability that the target information node is a first type of demarcation point.
- the second determination unit is used to determine, based on the first demarcation point probability value and second node information of the target information node, a second demarcation point probability value, wherein the second demarcation point probability value indicates a probability that the target information node is a second type of demarcation point.
- the third determination unit is used to determine, based on the first demarcation point probability value and the second demarcation point probability value, at least two segmentation modes for the information sequence, wherein segmentation granularities of different segmentation modes are different.
- for the specific processing of the first determination unit 901, the second determination unit 902 and the third determination unit 903 of the information processing apparatus and the technical effects they bring about, reference can be made to the relevant descriptions of steps 101, 102 and 103 in the corresponding embodiment of FIG. 1 respectively, which will not be repeated herein.
- the apparatus is further used to: obtain third node information of the target information node; and determine a third demarcation point probability value based on the third node information and at least one of the first demarcation point probability value and the second demarcation point probability value, wherein the third demarcation point probability value is used to indicate a probability that the target information node is a third type of demarcation point; and the determining, based on the first demarcation point probability value and the second demarcation point probability value, at least two segmentation modes for the information sequence further comprises: determining a segmentation mode corresponding to the third type of demarcation point.
- the apparatus further comprises a plurality of different segmentation modes and a plurality of corresponding demarcation point probability values, and a determination of each demarcation point probability value is related to at least one other demarcation point probability value.
- the information sequence comprises a video frame sequence, with the first type of demarcation point being an event demarcation point, the second type of demarcation point being a shot demarcation point, and the third type of demarcation point being a scene demarcation point.
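
Turning per-node probability values into segmentation modes of different granularity can be sketched as thresholding each type of probability separately. The 0.5 threshold and the convention that a demarcation node starts a new segment are assumptions; the source does not specify how the probability values are converted into segments, and the probabilities in the check below are illustrative, not model outputs.

```python
from typing import Dict, List, Sequence, Tuple


def segments_from_probabilities(probs: Sequence[float],
                                threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split nodes 0..len(probs)-1 into half-open [start, end) segments.

    A node whose probability reaches the threshold is treated as a
    demarcation point that starts a new segment.
    """
    if not probs:
        return []
    cuts = [i for i, p in enumerate(probs) if p >= threshold and i > 0]
    bounds = [0] + cuts + [len(probs)]
    return list(zip(bounds, bounds[1:]))


def segmentation_modes(prob_by_type: Dict[str, Sequence[float]],
                       threshold: float = 0.5) -> Dict[str, List[Tuple[int, int]]]:
    """One segmentation mode per demarcation point type (e.g. event, shot, scene)."""
    return {name: segments_from_probabilities(p, threshold)
            for name, p in prob_by_type.items()}
```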
- the determining a first demarcation point probability value comprises: importing, into a first cascaded classifier, information of event advanced features for the target information node, wherein the first cascaded classifier comprises at least two first classifiers, wherein advanced features comprise timing features and/or attention features; and generating the first demarcation point probability value based on confidences output by respective first classifiers in the first cascaded classifier.
- the first classifiers in the first cascaded classifier correspond to event advanced feature extraction networks one by one; and for a target information node in an information sequence and based on first node information of the target information node, the determining a first demarcation point probability value comprises: obtaining basic feature information of the target information node, wherein the basic feature information comprises visual feature information and/or audio feature information; importing the basic feature information into each of the event advanced feature extraction networks, to obtain the information of event advanced features, wherein advanced features comprise timing features and/or attention features; inputting respective ones of the information of event advanced features into the corresponding first classifiers, to obtain confidences output by the first classifiers; and determining, based on confidences output by the respective first classifiers, the first demarcation point probability value.
- a second cascaded classifier comprises at least two second classifiers, with the second classifiers in the second cascaded classifier corresponding to shot advanced feature extraction networks one by one; and the determining, based on the first demarcation point probability value and second node information of the target information node, a second demarcation point probability value comprises: importing the basic feature information into each of the shot advanced feature extraction networks, to obtain respective ones of information of shot advanced features; inputting the respective ones of information of shot advanced features and the first demarcation point probability value into the corresponding second classifier, to obtain confidences output by the second classifiers; and determining, based on confidences output by the respective second classifiers, the second demarcation point probability value.
- a third cascaded classifier comprises at least two third classifiers, with the third classifiers in the third cascaded classifier corresponding to scene advanced feature extraction networks one by one; and the apparatus is further used to: import the basic feature information into each of the scene advanced feature extraction networks, to obtain respective ones of information of scene advanced features; input, into the corresponding third classifier, the respective ones of the information of scene advanced features and at least one of the first demarcation point probability value and the second demarcation point probability value, to obtain confidences output by the third classifiers; and determine, based on the confidences output by the respective third classifiers, the third demarcation point probability value, wherein the third demarcation point probability value indicates a probability that the target information node is the scene demarcation point.
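
The dependency between the three cascaded classifiers can be sketched as follows. How confidences are combined into a probability value is not specified in the source, so the mean used in `cascade_probability` is an assumption, as is appending earlier probability values to the advanced feature vector; the scene cascade here uses both earlier values, one of the options the text allows. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Module = Tuple[Callable[[Vector], Vector], Callable[[Vector], float]]


def cascade_probability(confidences: Sequence[float]) -> float:
    """Assumption: a cascade's probability is the mean of its classifiers' confidences."""
    if not confidences:
        raise ValueError("a cascaded classifier needs at least one classifier")
    return sum(confidences) / len(confidences)


def hierarchical_probabilities(basic: Vector,
                               event_modules: Sequence[Module],
                               shot_modules: Sequence[Module],
                               scene_modules: Sequence[Module]) -> Tuple[float, float, float]:
    # First cascade: event advanced features only.
    p_event = cascade_probability([clf(net(basic)) for net, clf in event_modules])
    # Second cascade: shot advanced features plus the first probability value.
    p_shot = cascade_probability([clf(net(basic) + [p_event]) for net, clf in shot_modules])
    # Third cascade: scene advanced features plus both earlier probability values.
    p_scene = cascade_probability(
        [clf(net(basic) + [p_event, p_shot]) for net, clf in scene_modules])
    return p_event, p_shot, p_scene
```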
- training steps of a cascaded classifier comprise: obtaining sample feature information of training samples, wherein the training samples are information nodes in a sample information sequence; importing the sample feature information into a target classifier in the cascaded classifier, wherein respective classifiers in the cascaded classifier correspond to error labels one by one, and an error indicated by an error label of a classifier at a higher level than that of the target classifier is smaller than an error indicated by an error label of the target classifier; in accordance with a determination that an output result of the target classifier is true, using a classifier with a higher level than the target classifier to process the sample feature information; and in accordance with a determination that the output result of the target classifier is false, masking the training sample.
- the apparatus is further used to obtain, from a video frame sequence, a preset number of consecutive video frames before the target information node and a preset number of consecutive video frames after the target information node, to obtain a video frame sub-sequence; and import, into a pre-trained basic feature extraction network, the video frame sub-sequence to obtain the basic feature information of the target information node.
- the basic feature extraction network comprises at least one of a visual feature extraction network and an audio feature extraction network; and the importing, into a pre-trained basic feature extraction network, the video frame sub-sequence to obtain the basic feature information of the target information node comprises: obtaining an audio sequence corresponding to the video frame sub-sequence; importing the video frame sub-sequence into the video feature extraction network, to obtain the visual feature information; importing the audio sequence into the audio feature extraction network, to obtain the audio feature information; and concatenating the visual feature information and the audio feature information to obtain the basic feature information.
- the event advanced feature extraction networks comprise a timing network and/or an attention network; and information of advanced features is obtained by: in case that the advanced feature extraction networks comprise a timing network and an attention network, importing the basic feature information into the timing network and into the attention network; and concatenating an output of the timing network and an output of the attention network as the corresponding information of advanced features.
- the timing network comprises dilated convolutions.
- the attention network is constructed based on a multi-head attention mechanism.
- the information sequence is a text information sequence.
- FIG. 10 shows an exemplary system architecture to which an information processing method of one embodiment of the present disclosure can be applied.
- the system architecture may include terminal devices 1001 , 1002 , 1003 , network 1004 , and server 1005 .
- the network 1004 is used to provide a medium of a communication link between the terminal devices 1001 , 1002 , 1003 , and the server 1005 .
- the network 1004 can include various types of connection, such as wired, wireless communication links, or fiber optic cables.
- the terminal devices 1001 , 1002 , 1003 may interact with the server 1005 through the network 1004 to receive or send messages, etc.
- Various client applications can be installed on the terminal devices 1001 , 1002 , 1003 , such as web browser applications, search applications, news and information applications.
- the client applications in the terminal devices 1001 , 1002 , 1003 can receive user commands and perform corresponding functions according to user commands, such as adding corresponding information to information according to the user commands.
- the terminal devices 1001 , 1002 , 1003 can be hardware or software.
- when the terminal devices 1001, 1002, 1003 are hardware, they can be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, etc.
- when the terminal devices 1001, 1002, 1003 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (such as software or software modules used to provide distributed services), or as a single piece of software or software module. No specific restriction is made herein.
- the server 1005 can be a server that provides various services, such as receiving an information acquisition request sent by the terminal devices 1001, 1002, 1003, obtaining the display information corresponding to the request through various methods, and sending data related to the display information to the terminal devices 1001, 1002, 1003.
- the information processing method provided by the embodiment of the present disclosure may be executed by the terminal devices, and correspondingly, an information processing apparatus may be set in the terminal devices 1001 , 1002 , 1003 .
- the information processing method provided by the embodiment of the present disclosure may further be performed by the server 1005 , and correspondingly, the information processing apparatus may be set in the server 1005 .
- terminal devices, networks, and servers shown in FIG. 10 are only illustrative. Depending on the implementation, there can be any number of terminal devices, networks, and servers.
- FIG. 11 shows a schematic diagram of the construction of an electronic device (e.g., a terminal device or server in FIG. 10 ) suitable for implementing the embodiment of the present disclosure.
- the terminal devices in the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDA (personal digital assistant), PAD (tablet computer), PMP (portable multimedia player), in-vehicle terminals (e.g. in-vehicle navigation terminals), etc., and fixed terminals such as digital TV, desktop computers, etc.
- the electronic device shown in FIG. 11 is only an example and should not impose any limitations on the functionality and scope of use of the embodiment of the present disclosure.
- electronic devices may include a processing apparatus (e.g. a central processing unit, a graphics processor, etc.) 1101 that can perform various appropriate actions and processes based on programs stored in read-only memory (ROM) 1102 or loaded from storage device 1108 into random access memory (RAM) 1103 .
- in the RAM 1103, various programs and data required for the operation of the electronic device 1100 are also stored.
- the processing apparatus 1101 , ROM 1102 , and RAM 1103 are connected to each other via bus 1104 .
- the input/output (I/O) interface 1105 is also connected to bus 1104 .
- the following apparatuses can be connected to the I/O interface 1105: an input apparatus 1106, such as a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output apparatus 1107, such as a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage apparatus 1108, such as a magnetic tape, a hard disk, etc.; and communication apparatuses 1109.
- the communication apparatuses 1109 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data.
- although FIG. 11 shows the electronic device with a variety of apparatuses, it should be understood that it is not required to implement or have all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
- the processes described above with reference to the flowcharts may be implemented as a computer software program.
- the embodiments of the present disclosure include a computer program product that includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts.
- the computer program may be downloaded and installed from the network via the communication apparatus 1109 , or installed from the storage apparatus 1108 , or installed from the ROM 1102 .
- the computer program, when executed by the processing apparatus 1101, performs the above functions defined in the methods of the embodiments of this disclosure.
- the computer readable media mentioned in this disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two.
- a computer readable storage medium may be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium that contains or stores programs that may be used by or in combination with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier, which carries computer-readable program code. Such propagated data signal may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
- a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that may send, propagate or transmit programs intended for use by or in combination with an instruction executing system, apparatus or device.
- the program code contained on the computer readable medium may be transmitted in any appropriate medium, including but not limited to electrical wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
- a client or a server may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network).
- examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and end-to-end networks (e.g., ad hoc end-to-end networks), as well as any currently known or future developed networks.
- the above-mentioned computer readable medium may be included in the above-mentioned electronic device; or it may stand alone and not be incorporated into that electronic device.
- the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: for a target information node in an information sequence and based on first node information of the target information node, determine a first demarcation point probability value, wherein the first demarcation point probability value indicates a probability that the target information node is a first type of demarcation point; determine, based on the first demarcation point probability value and second node information of the target information node, a second demarcation point probability value, wherein the second demarcation point probability value indicates a probability that the target information node is a second type of demarcation point; and determine, based on the first demarcation point probability value and the second demarcation point probability value, at least two segmentation modes for the information sequence, wherein segmentation granularities of different segmentation modes are different.
- the electronic device is further used to: obtain third node information of the target information node; and determine a third demarcation point probability value based on the third node information and at least one of the first demarcation point probability value and the second demarcation point probability value, wherein the third demarcation point probability value is used to indicate a probability that the target information node is a third type of demarcation point; and the determining, based on the first demarcation point probability value and the second demarcation point probability value, at least two segmentation modes for the information sequence further comprises: determining a segmentation mode corresponding to the third type of demarcation point.
- a plurality of different segmentation modes and a plurality of corresponding demarcation point probability values are further comprised, and a determination of each demarcation point probability value is related to at least one other demarcation point probability value.
- the information sequence comprises a video frame sequence, with the first type of demarcation point being an event demarcation point, the second type of demarcation point being a shot demarcation point, and the third type of demarcation point being a scene demarcation point.
- the determining a first demarcation point probability value comprises: importing, into a first cascaded classifier, information of event advanced features for the target information node, wherein the first cascaded classifier comprises at least two first classifiers, wherein advanced features comprise timing features and/or attention features; and generating the first demarcation point probability value based on confidences output by respective first classifiers in the first cascaded classifier.
- the first classifiers in the first cascaded classifier correspond to event advanced feature extraction networks one by one; and for a target information node in an information sequence and based on first node information of the target information node, the determining a first demarcation point probability value comprises: obtaining basic feature information of the target information node, wherein the basic feature information comprises visual feature information and/or audio feature information; importing the basic feature information into each of the event advanced feature extraction networks, to obtain the information of event advanced features, wherein advanced features comprise timing features and/or attention features; inputting respective ones of the information of event advanced features into the corresponding first classifiers, to obtain confidences output by the first classifiers; and determining, based on confidences output by the respective first classifiers, the first demarcation point probability value.
- a second cascaded classifier comprises at least two second classifiers, with the second classifiers in the second cascaded classifier corresponding to shot advanced feature extraction networks one by one; and the determining, based on the first demarcation point probability value and second node information of the target information node, a second demarcation point probability value comprises: importing the basic feature information into each of the shot advanced feature extraction networks, to obtain respective ones of information of shot advanced features; inputting the respective ones of information of shot advanced features and the first demarcation point probability value into the corresponding second classifier, to obtain confidences output by the second classifiers; and determining, based on confidences output by the respective second classifiers, the second demarcation point probability value.
- a third cascaded classifier comprises at least two third classifiers, with the third classifiers in the third cascaded classifier corresponding to scene advanced feature extraction networks one by one; and the electronic device is further used to: import the basic feature information into each of the scene advanced feature extraction networks, to obtain respective ones of information of scene advanced features; input, into the corresponding third classifiers, the respective ones of the information of scene advanced features and at least one of the first demarcation point probability value and the second demarcation point probability value, to obtain confidences output by the third classifiers; and determine, based on the confidences output by the respective third classifiers, the third demarcation point probability value, wherein the third demarcation point probability value indicates a probability that the target information node is the event demarcation point.
- the training steps of a cascaded classifier comprise: obtaining sample feature information of training samples, wherein the training samples are information nodes in a sample information sequence; importing the sample feature information into a target classifier in the cascaded classifier, wherein respective classifiers in the cascaded classifier correspond to error labels one by one, and an error indicated by the error label of a classifier at a higher level than the target classifier is smaller than an error indicated by the error label of the target classifier; in accordance with a determination that an output result of the target classifier is true, using a classifier at a higher level than the target classifier to process the sample feature information; and in accordance with a determination that the output result of the target classifier is false, masking the training sample.
- the electronic device is further used to obtain, from a video frame sequence, a preset number of consecutive video frames before the target information node and a preset number of consecutive video frames after the target information node, to obtain a video frame sub-sequence; and to import, into a pre-trained basic feature extraction network, the video frame sub-sequence to obtain the basic feature information of the target information node.
- the basic feature extraction network comprises at least one of a visual feature extraction network and an audio feature extraction network; and the importing, into a pre-trained basic feature extraction network, the video frame sub-sequence to obtain the basic feature information of the target information node comprises: obtaining an audio sequence corresponding to the video frame sub-sequence; importing the video frame sub-sequence into the visual feature extraction network, to obtain the visual feature information; importing the audio sequence into the audio feature extraction network, to obtain the audio feature information; and concatenating the visual feature information and the audio feature information to obtain the basic feature information.
- the advanced feature extraction networks comprise a timing network and/or an attention network; and corresponding information of advanced features is obtained by: in case that the advanced feature extraction networks comprise both a timing network and an attention network, importing the basic feature information into the timing network and into the attention network; and concatenating an output of the timing network and an output of the attention network as the corresponding information of advanced features.
- the timing network comprises dilated convolutions.
- the attention network is constructed based on a multi-head attention mechanism.
- the information sequence is a text information sequence.
- Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the “C” language or similar programming languages.
- the program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
- the remote computer may be connected to the user's computer via any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (e.g., using an Internet service provider to connect via the Internet).
- each box in the flow charts or block diagrams may represent a module, program segment, or part of code that contains one or more executable instructions for implementing a specified logical function.
- the functions indicated in the boxes may also occur in an order different from that shown in the attached drawings. For example, two boxes shown in succession may actually be executed substantially in parallel, or sometimes in reverse order, depending on the functions involved.
- each of the boxes in the block diagrams and/or the flow charts, and the combination of the boxes in the block diagrams and/or the flow charts can be implemented with a dedicated hardware-based system that performs the specified function or operation, or with a combination of dedicated hardware and computer instructions.
- the involved units described in the embodiments of the present disclosure may be implemented either by means of software or by means of hardware.
- the names of the units herein do not, in some cases, limit the units themselves; for example, the first determination unit may also be described as “the unit that determines the first demarcation point probability value”.
- exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-chip (SOCs), complex programmable logic devices (CPLDs), and so on.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
- the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- the machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the above.
- more specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
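The basic feature extraction in the bullets above (a preset number of frames on each side of the target node, then visual and audio features concatenated) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the node is assumed to map to a frame index, the node's own frame is assumed to be part of the window, indices past either end of the sequence are clamped, the audio is assumed to be pre-split into one chunk per video frame, and `mean_pool` is a hypothetical stand-in for the pre-trained visual and audio feature extraction networks.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Extractor = Callable[[Sequence[Vector]], Vector]


def frame_window(frames: Sequence[Vector], target: int, k: int) -> List[Vector]:
    """Return k frames before `target`, the target frame, and k frames after.

    Assumption: out-of-range indices are clamped to the first/last frame,
    so every window holds exactly 2k + 1 frames.
    """
    if not 0 <= target < len(frames):
        raise IndexError("target is outside the frame sequence")
    last = len(frames) - 1
    return [frames[min(max(i, 0), last)] for i in range(target - k, target + k + 1)]


def mean_pool(window: Sequence[Vector]) -> Vector:
    """Hypothetical stand-in for a pre-trained extractor: per-dimension mean."""
    return [sum(column) / len(window) for column in zip(*window)]


def basic_features(frames: Sequence[Vector], audio: Sequence[Vector],
                   target: int, k: int,
                   visual_net: Extractor = mean_pool,
                   audio_net: Extractor = mean_pool) -> Vector:
    """Concatenate visual and audio features of the window around `target`."""
    visual = visual_net(frame_window(frames, target, k))
    # Assumption: audio[i] is the audio chunk aligned with frames[i].
    sound = audio_net(frame_window(audio, target, k))
    return visual + sound  # list concatenation = feature concatenation


if __name__ == "__main__":
    frames = [[float(i), 0.0] for i in range(5)]
    audio = [[1.0] for _ in range(5)]
    print(basic_features(frames, audio, target=0, k=1))
```

Replacing `mean_pool` with real networks only requires passing different `visual_net` and `audio_net` callables; the windowing and concatenation stay the same.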
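The bullet stating that the timing network comprises dilated convolutions can be illustrated with a pure-Python sketch over a single feature channel. Only the use of dilated convolutions comes from the text; the zero padding, the same-length output, the ReLU between layers, the kernel shared across layers, and the dilation schedule (1, 2, 4) are assumptions chosen for illustration. Each layer places its kernel taps `dilation` steps apart, so stacking layers with growing dilation widens the temporal context without adding taps.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1-D dilated convolution with zero padding.

    Implemented as cross-correlation, as deep-learning frameworks do; an
    odd-length kernel is centred on each output position.
    """
    half = (len(kernel) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - half) * dilation  # taps are `dilation` steps apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def timing_network(x: Sequence[float], kernel: Sequence[float],
                   dilations: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Stack of dilated convolutions with an (assumed) ReLU after each layer.

    The same kernel is reused for every layer here for brevity; a trained
    network would learn one kernel per layer. With an odd kernel of length k,
    the receptive field is 1 + (k - 1) * sum(dilations) time steps.
    """
    h = list(x)
    for d in dilations:
        h = [max(0.0, v) for v in dilated_conv1d(h, kernel, d)]
    return h


if __name__ == "__main__":
    impulse = [0.0] * 31
    impulse[15] = 1.0
    response = timing_network(impulse, [1.0, 1.0, 1.0])
    print([i for i, v in enumerate(response) if v > 0])  # positions 8 through 22
```

With a kernel of length 3 and dilations (1, 2, 4), a single impulse spreads over 1 + 2 × 7 = 15 time steps, whereas three undilated layers would cover only 7.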
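The attention network is described as constructed on a multi-head attention mechanism. The standard scaled dot-product multi-head self-attention that term usually denotes can be sketched as below; the projection matrices, the head count, and the absence of masking and bias terms are assumptions, and in a real model the weights would be learned rather than supplied by hand.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def multi_head_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                         wo: Matrix, heads: int) -> Matrix:
    """Multi-head self-attention over a sequence x of T rows with d columns."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    if d % heads:
        raise ValueError("projection width must be divisible by the head count")
    dh = d // heads
    per_head = []
    for h in range(heads):
        cols = slice(h * dh, (h + 1) * dh)  # this head's slice of the projections
        qh = [row[cols] for row in q]
        kh = [row[cols] for row in k]
        vh = [row[cols] for row in v]
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dh) for kj in kh]
                  for qi in qh]
        weights = [softmax(r) for r in scores]  # each row sums to 1
        per_head.append(matmul(weights, vh))
    # Concatenate the heads' outputs for each time step, then project.
    concat = [[val for h in range(heads) for val in per_head[h][t]]
              for t in range(len(x))]
    return matmul(concat, wo)


if __name__ == "__main__":
    eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    seq = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    for row in multi_head_attention(seq, eye, eye, eye, eye, heads=2):
        print([round(v, 3) for v in row])
```

Splitting the projection width across heads lets each head attend over a different subspace of the basic features at the same total cost as one wide head.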
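The three cascaded classifiers (event, shot and scene) in the bullets above can be wired together as in the following sketch. The one-to-one pairing of advanced feature networks with classifiers, and the feeding of earlier demarcation point probability values into later levels, follow the text. The logistic classifiers, the averaging of confidences into a probability, and the choice to pass both earlier probabilities to the third level are assumptions; `LogisticClassifier`, `CascadeLevel` and `demarcation_probabilities` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class LogisticClassifier:
    """Hypothetical per-branch classifier: sigmoid(w . x + b)."""
    weights: Vector
    bias: float = 0.0

    def confidence(self, x: Sequence[float]) -> float:
        z = sum(w * v for w, v in zip(self.weights, x)) + self.bias
        if z >= 0:  # numerically stable sigmoid
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)


@dataclass
class CascadeLevel:
    """One cascaded classifier: feature networks paired one to one with classifiers."""
    feature_nets: List[Callable[[Vector], Vector]]
    classifiers: List[LogisticClassifier]

    def probability(self, basic: Vector, prior: Sequence[float] = ()) -> float:
        if len(self.feature_nets) != len(self.classifiers):
            raise ValueError("networks and classifiers must correspond one to one")
        # Each branch sees its own advanced features plus earlier probabilities.
        confidences = [clf.confidence(net(basic) + list(prior))
                       for net, clf in zip(self.feature_nets, self.classifiers)]
        return sum(confidences) / len(confidences)  # assumption: mean confidence


def demarcation_probabilities(basic: Vector, event: CascadeLevel, shot: CascadeLevel,
                              scene: CascadeLevel) -> Tuple[float, float, float]:
    """First, second and third demarcation point probability values for one node."""
    p1 = event.probability(basic)
    p2 = shot.probability(basic, [p1])
    p3 = scene.probability(basic, [p1, p2])
    return p1, p2, p3


if __name__ == "__main__":
    def identity(v: Vector) -> Vector:
        return list(v)

    event = CascadeLevel([identity, identity],
                         [LogisticClassifier([0.5, -0.5]), LogisticClassifier([1.0, 0.0])])
    shot = CascadeLevel([identity], [LogisticClassifier([0.0, 0.0, 4.0], bias=-2.0)])
    scene = CascadeLevel([identity], [LogisticClassifier([0.0, 0.0, 2.0, 2.0], bias=-2.0)])
    print(demarcation_probabilities([1.0, 2.0], event, shot, scene))
```

Appending the earlier probabilities to each branch's input is one simple way to let a later level condition on an earlier one; the text does not say how the value is combined with the advanced features.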
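The cascade training steps above (error labels that shrink at higher levels, with samples rejected by a lower classifier masked from the levels above it) can be illustrated with the sketch below. Reading an "error label" as a positional tolerance around ground-truth boundaries is an assumption, as are the names `tolerance_label` and `cascade_training_sets`. The `predictors` stand in for already-trained classifiers, so the sketch shows only how samples are routed and masked, not how each classifier is fitted.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def tolerance_label(position: int, boundaries: Sequence[int], tolerance: int) -> int:
    """Assumed error label: 1 if within `tolerance` nodes of a true boundary."""
    return int(any(abs(position - b) <= tolerance for b in boundaries))


def cascade_training_sets(positions: Iterable[int], boundaries: Sequence[int],
                          tolerances: Sequence[int],
                          predictors: Sequence[Callable[[int], bool]]
                          ) -> List[List[Tuple[int, int]]]:
    """Return the (position, label) pairs each cascade level is trained on.

    Tolerances must strictly decrease, so a higher level is held to a smaller
    error. A sample that a level predicts as False is masked from every level
    above it; a sample predicted True is passed up.
    """
    if len(tolerances) != len(predictors):
        raise ValueError("one tolerance per classifier level is required")
    if any(lower <= higher for lower, higher in zip(tolerances, tolerances[1:])):
        raise ValueError("error labels must shrink at higher levels")
    active = list(positions)
    training_sets = []
    for tolerance, predict in zip(tolerances, predictors):
        training_sets.append([(p, tolerance_label(p, boundaries, tolerance)) for p in active])
        active = [p for p in active if predict(p)]  # mask samples predicted False
    return training_sets


if __name__ == "__main__":
    sets = cascade_training_sets(range(10), [5], [2, 0],
                                 [lambda p: abs(p - 5) <= 2, lambda p: True])
    for level, samples in enumerate(sets, start=1):
        print(level, samples)  # level 2 trains only on positions 3 to 7
```

Masking keeps the stricter higher levels focused on candidates the coarser levels already accept, rather than on the many easy negatives far from any boundary.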
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Image Analysis (AREA)
Claims (19)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110716045.9A CN115527136A (en) | 2021-06-25 | 2021-06-25 | Information segmentation method and device and electronic equipment |
| CN202110716045.9 | 2021-06-25 | | |
| PCT/SG2022/050427 WO2022271100A2 (en) | 2021-06-25 | 2022-06-21 | Information segmentation method and apparatus, and electronic device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20240289312A1 (en) | 2024-08-29 |
| US12481631B2 (en) | 2025-11-25 |
Family
ID=84546005
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/572,606 Active US12481631B2 (en) | 2021-06-25 | 2022-06-21 | Information segmentation methods, apparatuses, and electronic devices |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US12481631B2 (en) |
| CN (1) | CN115527136A (en) |
| WO (1) | WO2022271100A2 (en) |
2021
- 2021-06-25 CN CN202110716045.9A patent/CN115527136A/en active Pending
2022
- 2022-06-21 WO PCT/SG2022/050427 patent/WO2022271100A2/en not_active Ceased
- 2022-06-21 US US18/572,606 patent/US12481631B2/en active Active
Patent Citations (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102547139A (en) | 2010-12-30 | 2012-07-04 | 北京新岸线网络技术有限公司 | Method for splitting news video program, and method and system for cataloging news videos |
| US20160365121A1 (en) | 2015-06-11 | 2016-12-15 | David M. DeCaprio | Game Video Processing Systems and Methods |
| US20190147105A1 (en) * | 2017-11-15 | 2019-05-16 | Google Llc | Partitioning videos |
| US20200401808A1 (en) | 2018-07-18 | 2020-12-24 | Tencent Technology (Shenzhen) Company Ltd | Method and device for identifying key time point of video, computer apparatus and storage medium |
| US20220375225A1 (en) * | 2019-09-30 | 2022-11-24 | Beijing Wodong Tianjun Information Technology Co., Ltd. | Video Segmentation Method and Apparatus, Device, and Medium |
| CN110941594A (en) | 2019-12-16 | 2020-03-31 | 北京奇艺世纪科技有限公司 | Splitting method and device of video file, electronic equipment and storage medium |
| US20220067386A1 (en) * | 2020-08-27 | 2022-03-03 | International Business Machines Corporation | Deterministic learning video scene detection |
| US20220292830A1 (en) * | 2020-09-10 | 2022-09-15 | Adobe Inc. | Hierarchical segmentation based on voice-activity |
| US20220301313A1 (en) * | 2020-09-10 | 2022-09-22 | Adobe Inc. | Hierarchical segmentation based software tool usage in a video |
| US20230043769A1 (en) * | 2020-09-10 | 2023-02-09 | Adobe Inc. | Zoom and scroll bar for a video timeline |
| US11776273B1 (en) * | 2020-11-30 | 2023-10-03 | Amazon Technologies, Inc. | Ensemble of machine learning models for automatic scene change detection |
| US11869240B1 (en) * | 2020-12-11 | 2024-01-09 | Amazon Technologies, Inc. | Semantic video segmentation |
Non-Patent Citations (2)
| Title |
|---|
| ISR for PCT application PCT/SG2022-050427 dated Jan. 3, 2023 (9 pages). |
| ISR for PCT application PCT/SG2022-050427 dated Jan. 3, 2023 (9 pages). |
Also Published As
| Publication number | Publication date |
|---|---|
| US20240289312A1 (en) | 2024-08-29 |
| CN115527136A (en) | 2022-12-27 |
| WO2022271100A3 (en) | 2023-02-16 |
| WO2022271100A2 (en) | 2022-12-29 |
Similar Documents
| Publication | Title |
|---|---|
| US20240233334A1 (en) | Multi-modal data retrieval method and apparatus, medium, and electronic device |
| JP7394809B2 (en) | Methods, devices, electronic devices, media and computer programs for processing video |
| KR20230004391A (en) | Method and apparatus for processing video, method and apparatus for querying video, training method and apparatus for video processing model, electronic device, storage medium, and computer program |
| CN112364829B (en) | Face recognition method, device, equipment and storage medium |
| US11847419B2 (en) | Human emotion detection |
| CN113033682B (en) | Video classification method, device, readable medium, and electronic device |
| CN113610034B (en) | Method, device, storage medium and electronic device for identifying human entities in video |
| WO2023000782A1 (en) | Method and apparatus for acquiring video hotspot, readable medium, and electronic device |
| CN115052188A (en) | Video editing method, device, equipment and medium |
| CN113987258B (en) | Audio recognition method, device, readable medium and electronic device |
| CN116503596A (en) | Image segmentation method, device, medium and electronic equipment |
| CN111128131A (en) | Speech recognition method, apparatus, electronic device, and computer-readable storage medium |
| US12481631B2 (en) | Information segmentation methods, apparatuses, and electronic devices |
| CN111666449B (en) | Video retrieval method, apparatus, electronic device, and computer-readable medium |
| US20240330769A1 (en) | Object processing method, device, readable medium and electronic device |
| CN112650830A (en) | Keyword extraction method and device, electronic equipment and storage medium |
| CN117253334A (en) | Smoke and fire early warning methods, devices and equipment for electric vehicle charging stations |
| CN118967079A (en) | Comprehensive analysis and management method and system for personnel training archives based on big data |
| CN117196333A (en) | Method and device for generating natural disaster impact and loss information based on power data |
| CN117034959A (en) | Data processing method, device, electronic equipment and storage medium |
| CN115577137A (en) | A label determination method, device, equipment and medium |
| CN115273148A (en) | Pedestrian re-recognition model training method and device, electronic equipment and storage medium |
| CN116416981B (en) | Keyword detection method, device, electronic device and storage medium |
| CN114581714B (en) | Image-based subject identification method and device, storage medium and electronic equipment |
| CN114581713B (en) | Image-based subject identification method and device, storage medium and electronic equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | AS | Assignment | Owner name: BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LI, CONGCONG;REEL/FRAME:071612/0700 Effective date: 20250509 Owner name: LEMON INC., CAYMAN ISLANDS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BYTEDANCE INC.;REEL/FRAME:071612/0703 Effective date: 20250609 Owner name: LEMON INC., CAYMAN ISLANDS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SHANGHAI SUIXUNTONG ELECTRONIC TECHNOLOGY CO., LTD.;REEL/FRAME:071612/0707 Effective date: 20250609 Owner name: LEMON INC., CAYMAN ISLANDS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.;REEL/FRAME:071612/0711 Effective date: 20250609 Owner name: SHANGHAI SUIXUNTONG ELECTRONIC TECHNOLOGY CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HONG, DEXIANG;REEL/FRAME:071612/0694 Effective date: 20250509 Owner name: BYTEDANCE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WEN, LONGYIN;WANG, XINYAO;SIGNING DATES FROM 20250509 TO 20250702;REEL/FRAME:071612/0691 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |