CN116193149A - Live broadcast data processing method, device, equipment and computer readable storage medium - Google Patents

Live broadcast data processing method, device, equipment and computer readable storage medium

Info

Publication number
CN116193149A
CN116193149A (application CN202111447765.6A)
Authority
CN
China
Prior art keywords
live
live broadcast
data stream
broadcast data
data streams
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111447765.6A
Other languages
Chinese (zh)
Inventor
陈俊祥
阮明康
张国泽
符帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111447765.6A
Publication of CN116193149A
Legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/23424 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application provides a live broadcast data processing method, apparatus, device, and computer-readable storage medium. The method includes: acquiring a plurality of live data streams uploaded by different terminals, and determining at least two target video frames corresponding to each live data stream; merging the at least two target video frames corresponding to each live data stream to obtain a merged image corresponding to each live data stream; determining local feature information of the merged image corresponding to each live data stream, and determining the similarity between live data streams based on the local feature information; and determining, based on those similarities, at least two live data streams that meet a similarity condition. The method can improve the efficiency of identifying similar live data streams while maintaining accuracy.

Description

Live broadcast data processing method, device, equipment and computer readable storage medium
Technical Field
The present disclosure relates to internet technologies, and in particular, to a live broadcast data processing method, apparatus, device, and computer readable storage medium.
Background
Live streaming is a way for many users to share their lives and exchange knowledge. During service operation, it has been found that multiple "matrix" accounts sometimes rebroadcast the same content (a situation that may be called homogeneous live rooms), which damages the video-account live-streaming ecosystem to some extent and seriously affects the user experience of video-account live streaming. To create a healthy live-streaming environment for users, homogeneous live rooms need to be detected and filtered. The related-art live data processing methods for detecting and filtering homogeneous live rooms either have high computational complexity and are unsuitable for scenes where streams are not time-aligned, or require a neural network model whose design and deployment incur time and economic costs too high to meet real-time detection requirements.
Disclosure of Invention
The embodiments of the present application provide a live broadcast data processing method, apparatus, device, and computer-readable storage medium, which can improve the efficiency of identifying similar live data streams while maintaining accuracy.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a live broadcast data processing method, which comprises the following steps:
Acquiring a plurality of live broadcast data streams uploaded by different terminals, and determining at least two target video frames corresponding to each live broadcast data stream;
combining at least two target video frames corresponding to each live broadcast data stream to obtain combined images corresponding to each live broadcast data stream;
determining local feature information of the combined image corresponding to each live data stream, and determining the similarity between the live data streams based on the local feature information corresponding to each live data stream;
and determining at least two live data streams meeting the similarity condition based on the similarity between the live data streams.
The embodiment of the application provides a live broadcast data processing device, which comprises:
The first determining module is used for acquiring a plurality of live broadcast data streams uploaded by different terminals and determining at least two target video frames corresponding to each live broadcast data stream;
the image merging module is used for merging at least two target video frames corresponding to each live broadcast data stream to obtain merged images corresponding to each live broadcast data stream;
the second determining module is used for determining local characteristic information of the combined image corresponding to each live broadcast data stream and determining the similarity between the live broadcast data streams based on the local characteristic information corresponding to each live broadcast data stream;
And the third determining module is used for determining at least two live data streams meeting the similarity condition based on the similarity between the live data streams.
In some embodiments, the first determination module is further to:
decoding each live broadcast data stream to obtain a plurality of live broadcast video frames corresponding to each live broadcast data stream;
performing frame extraction processing from a plurality of live video frames corresponding to each live data stream according to a preset interval duration to obtain a plurality of extracted live video frames corresponding to each live data stream;
randomly acquiring a preset number of live video frames from a plurality of extracted live video frames corresponding to each live data stream to obtain at least two target video frames corresponding to each live data stream.
In some embodiments, the image merging module is further configured to:
acquiring a preset image target size, and adjusting the sizes of at least two target video frames corresponding to each live broadcast data stream based on the image target size to obtain at least two adjusted target video frames corresponding to each live broadcast data stream;
and combining the at least two adjusted target video frames corresponding to each live broadcast data stream according to the time sequence of the at least two adjusted target video frames corresponding to each live broadcast data stream to obtain combined images corresponding to each live broadcast data stream.
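As an illustration only, the resizing and time-ordered tiling described above can be sketched in plain Python. Frames are modeled as 2-D lists of pixel values; the nearest-neighbour resizing, the 8×8 tile size, and the 3-column layout are assumptions of this sketch, not requirements of the claims:

```python
def resize(img, w, h):
    """Nearest-neighbour resize of a 2-D pixel grid to the preset target size."""
    src_h, src_w = len(img), len(img[0])
    return [[img[r * src_h // h][c * src_w // w] for c in range(w)]
            for r in range(h)]

def merge_grid(frames, tile_w=8, tile_h=8, cols=3):
    """Resize each target frame, then tile them row-major in time order."""
    tiles = [resize(f, tile_w, tile_h) for f in frames]
    rows = (len(tiles) + cols - 1) // cols
    out = [[0] * (tile_w * cols) for _ in range(tile_h * rows)]
    for i, t in enumerate(tiles):
        r0, c0 = (i // cols) * tile_h, (i % cols) * tile_w
        for r in range(tile_h):
            for c in range(tile_w):
                out[r0 + r][c0 + c] = t[r][c]
    return out

# nine 16x16 dummy frames -> one 24x24 composite (nine-grid) image
frames = [[[i] * 16 for _ in range(16)] for i in range(9)]
grid = merge_grid(frames)
print(len(grid), len(grid[0]))   # 24 24
```

With nine frames and three columns this produces exactly the 3×3 composite referred to later in the figures.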
In some embodiments, the second determining module is further configured to:
carrying out feature extraction on each combined image according to a preset local feature extraction algorithm to obtain feature vectors of each combined image;
determining a feature threshold of each combined image based on the feature vector of each combined image;
and carrying out binarization processing on the corresponding feature vectors based on the feature threshold values of the combined images to obtain local feature information of the combined images.
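A minimal sketch of the binarization step follows; using the vector mean as the feature threshold is an assumption of this sketch (the embodiment only requires some threshold derived from the feature vector of each merged image):

```python
def binarize_features(vec):
    """Threshold each component against the vector mean, giving a binary
    fingerprint with one bit per feature component."""
    threshold = sum(vec) / len(vec)   # per-image feature threshold (assumed: mean)
    return [1 if v > threshold else 0 for v in vec]

fingerprint = binarize_features([3.0, 9.0, 1.0, 7.0])
print(fingerprint)   # [0, 1, 0, 1]
```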
In some embodiments, the second determining module is further configured to:
determining the same bit number in the local feature information of different live broadcast data streams based on the local feature information corresponding to each live broadcast data stream;
and determining the similarity between different live broadcast data streams based on the total bit number of the local characteristic information and the same bit number.
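The same-bit-count similarity described above can be sketched as follows (the fingerprints shown are hypothetical):

```python
def similarity(fp_a, fp_b):
    """Fraction of identical bits between two equal-length binary fingerprints:
    same bit count divided by total bit count."""
    same = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return same / len(fp_a)

print(similarity([1, 0, 1, 1], [1, 1, 1, 1]))   # 0.75
```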
In some embodiments, the third determining module is further configured to:
acquiring a preset similarity threshold;
determining at least two live data streams with similarity greater than or equal to the similarity threshold as at least two live data streams meeting a similarity condition;
the apparatus further comprises:
a fourth determining module, configured to determine a target live data stream that needs to be subjected to duplication elimination processing from the at least two live data streams that meet a similarity condition;
The duplication elimination processing module is used for carrying out duplication elimination processing on the target live broadcast data stream to obtain a processed live broadcast data stream;
and the sending module is used for determining live broadcast data to be sent based on the processed live broadcast data stream and sending the live broadcast data to be sent to the audience terminal.
In some embodiments, the fourth determination module is further configured to:
determining the number of viewers in the live broadcasting room corresponding to the at least two live broadcasting data streams meeting the similarity condition;
and determining other live broadcast data streams except for the most number of viewers in the at least two live broadcast data streams as target live broadcast data streams.
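A toy sketch of this selection rule, with hypothetical room IDs and viewer counts:

```python
def pick_dedup_targets(streams):
    """Given (stream_id, viewer_count) pairs from one group of similar streams,
    keep the room with the most viewers and mark the rest as targets for
    de-duplication."""
    keep = max(streams, key=lambda s: s[1])
    return [s[0] for s in streams if s[0] != keep[0]]

targets = pick_dedup_targets([("A1", 120), ("A2", 30), ("A3", 5)])
print(targets)   # ['A2', 'A3']
```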
In some embodiments, the fourth determination module is further configured to:
acquiring reporting times of at least two live broadcast data streams meeting the similarity condition corresponding to a live broadcast room;
and when the difference value between the reporting times of the at least two live broadcast data streams corresponding to the live broadcast room is larger than a difference value threshold, determining the other live broadcast data streams with the least reporting times as target live broadcast data streams.
In some embodiments, the fourth determination module is further configured to:
when the difference value between the reporting times of the at least two live broadcast data streams corresponding to the live broadcast room is smaller than or equal to a difference value threshold value, responding to a live broadcast request sent by the audience terminal, and acquiring historical behavior data corresponding to the audience terminal;
Determining viewing preference information corresponding to the audience terminal based on the historical behavior data;
and determining other live broadcast data streams with highest matching degree with the viewing preference information among the at least two live broadcast data streams meeting the similarity condition as target live broadcast data streams based on the viewing preference information.
In some embodiments, the de-duplication processing module is further configured to:
deleting the target live data stream from the plurality of live data streams to obtain a processed live data stream; or
and carrying out flow weight reduction processing on the target live broadcast data stream to obtain a processed live broadcast data stream.
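The two options (outright deletion vs. traffic down-weighting) might be sketched as below; the weight dictionary and the 0.1 factor are illustrative assumptions, not values from the embodiment:

```python
def deduplicate(streams, targets, mode="delete", factor=0.1):
    """Apply one of the two de-duplication options: remove the target streams
    entirely, or scale down their distribution weight."""
    if mode == "delete":
        return {sid: w for sid, w in streams.items() if sid not in targets}
    return {sid: (w * factor if sid in targets else w)
            for sid, w in streams.items()}

streams = {"A1": 1.0, "A2": 1.0}
print(deduplicate(streams, ["A2"]))                      # {'A1': 1.0}
print(deduplicate(streams, ["A2"], mode="downweight"))   # {'A1': 1.0, 'A2': 0.1}
```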
An embodiment of the present application provides a computer device, including:
a memory for storing executable instructions;
and the processor is used for realizing the live broadcast data processing method provided by the embodiment of the application when executing the executable instructions stored in the memory.
The embodiment of the application provides a computer readable storage medium, which stores executable instructions for implementing the live broadcast data processing method provided by the embodiment of the application when being executed by a processor.
Embodiments of the present application provide a computer program product, including a computer program or instructions, which when executed by a processor implement a live data processing method provided by embodiments of the present application.
The embodiment of the application has the following beneficial effects:
After acquiring a plurality of live data streams uploaded by different terminals, the server determines at least two target video frames corresponding to each live data stream, merges those target video frames to obtain a merged image for each live data stream, determines the local feature information of each merged image, determines the similarity between live data streams based on that local feature information, and finally determines at least two live data streams meeting a similarity condition based on those similarities. Because the target video frames are merged before the local feature information of the merged image is extracted, the same local feature information can be obtained even if the target video frames are arranged in a different order within the merged image, so accurate similarity computation is ensured for videos that are not time-aligned. Moreover, the similarity computation operates on the local feature information of the merged images rather than on semantic features, which reduces computational complexity and improves the efficiency of identifying similar live data streams.
Drawings
Fig. 1 is a network architecture schematic diagram of a live broadcast system 100 according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a server 400 according to an embodiment of the present application;
fig. 3 is a schematic flowchart of an implementation of a live broadcast data processing method according to an embodiment of the present application;
fig. 4 is a schematic implementation flow chart for determining similarity of each live data stream according to an embodiment of the present application;
fig. 5 is a schematic flow chart of another implementation of the live broadcast data processing method provided in the embodiment of the present application;
fig. 6 is a schematic diagram of a live interface provided in an embodiment of the present application;
fig. 7A is a nine-grid (3×3) composite image corresponding to the live broadcast room A1;
fig. 7B is a nine-grid (3×3) composite image corresponding to the live broadcast room A2;
fig. 8A is a nine-grid (3×3) composite image corresponding to the live broadcast room B1;
fig. 8B is a nine-grid (3×3) composite image corresponding to the live broadcast room B2.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings, and the described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making any inventive effort are within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", "third" and the like are merely used to distinguish similar objects and do not represent a specific ordering of the objects, it being understood that the "first", "second", "third" may be interchanged with a specific order or sequence, as permitted, to enable embodiments of the application described herein to be practiced otherwise than as illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
Before the embodiments of the present application are described in further detail, the terms involved in the embodiments are explained as follows.
1) Hash computation: transforming an input of arbitrary length into a fixed-length output through a hash algorithm.
2) Homogeneous live broadcast: live content whose audio and video are the same or similar.
3) Unsupervised learning: learning from test data that has not been labeled, classified, or categorized.
4) Multi-modal model: a model built over multiple modalities that can process and associate information from the multiple modalities.
5) Time-domain plot: the x-axis is time and the y-axis is amplitude. Frequency-domain plot: the x-axis is frequency and the y-axis is amplitude.
6) Perceptual Hash (pHash) algorithm: its basic principle is to reduce a picture to a computable size, filter out the picture's main features through a Discrete Cosine Transform (DCT) to obtain data that reflects the picture's features to a certain extent, and finally output the picture's hash value.
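For illustration, a naive pure-Python rendering of the pHash idea. The 8×8 input, the O(N⁴) DCT, and the 4×4 low-frequency corner are simplifications for the sketch; practical implementations typically reduce to 32×32 and use an optimized DCT:

```python
import math

def dct2(block):
    """Naive 2-D DCT-II of an NxN block (slow, but enough for a sketch)."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            out[u][v] = sum(block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                            for x in range(n) for y in range(n))
    return out

def phash(img8, keep=4):
    """pHash sketch: DCT the reduced image, keep the low-frequency keep x keep
    corner, and threshold each coefficient against the corner's mean."""
    coeffs = dct2(img8)
    low = [coeffs[u][v] for u in range(keep) for v in range(keep)]
    mean = sum(low) / len(low)
    return [1 if c > mean else 0 for c in low]

img = [[(r * 8 + c) % 17 for c in range(8)] for r in range(8)]
bits = phash(img)
print(len(bits))   # 16
```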
To better understand the live data processing method for detecting homogeneous live broadcasts provided by the embodiments of the present application, the related-art methods for detecting homogeneous live broadcasts, and their defects, are described first.
The live data processing methods for detecting homogeneous live broadcasts in the related art mainly include the following two schemes:
In the first scheme, video pairs are extracted from a large number of videos, several short segments are sampled, and frame-by-frame similarity judgments are made based on criteria including image color space, audio spectrum, amplitude, and the like. In a live scene, the video stream is saved as fixed-length videos at fixed intervals, which are then compared frame by frame.
In the second scheme, because most homogeneous content has been lightly processed (e.g., watermarked, or given added openings and endings), the video content is generally not aligned. To reduce computation cost and the impact of content misalignment on precision, this existing scheme obtains a semantic fingerprint vector of a video (usually a floating-point vector of a certain length) through a multi-modal semantic retrieval model applied to video segments. Homogeneous video groups are then found through a vector retrieval engine such as Faiss, and the video content within a group is judged to be homogeneous.
Scheme two can be implemented through the following procedure:
and step S001, after the user opens the program, the live data stream is sent to the server.
And step S002, the server obtains the representation vector of the live broadcast content through the multi-mode model.
In step S003, the server searches and compares the representation vector with all other live vectors.
And S004, if the similarity of the live broadcast vector and other live broadcast vectors is too high, judging that the live broadcast content is homogeneous.
The drawbacks of scheme one and scheme two are described below.
Although scheme one has the highest recognition accuracy, its computational complexity is high, and it requires the video content to be completely aligned with no deviation in the time dimension; otherwise, even a deviation of more than 1 s may cause frames to be mis-paired, so that videos the naked eye considers identical are judged to be different.
Scheme two essentially uses similar-video retrieval technology to solve the problem of discovering homogeneous content. Its main drawbacks are: (1) The accuracy is somewhat poor, and the semantic retrieval model needs to be updated periodically to adapt to changes in content distribution. Training the multi-modal model takes a long time, so the cost of obtaining the model is excessive: the time and economic cost from training data preparation through model design to deployment on a server is high, and the cycle often takes more than half a month. (2) The multi-modal semantic retrieval model and the Faiss vector retrieval engine consume considerable resources and introduce large delays, and computing cosine, L1, or L2 distances between semantic fingerprint vectors costs more than computing Hamming distances between binary hashes.
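The cost argument above can be made concrete: comparing binary fingerprints packed into integers is a single XOR plus a popcount, which is far cheaper than a floating-point cosine distance over long vectors (the 5-bit fingerprints below are hypothetical):

```python
def hamming(a, b):
    """Hamming distance between two integer-packed binary fingerprints:
    one XOR plus a popcount."""
    return bin(a ^ b).count("1")

print(hamming(0b10110, 0b10011))   # 2
```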
On this basis, the embodiments of the present application provide a live data processing method for identifying and detecting homogeneous live broadcasts. In implementation, local features are extracted by sampling rather than by frame-by-frame computation, binary fingerprint information is obtained, and the similarity of live data streams is determined from the binary fingerprint information, thereby identifying homogeneous live broadcasts.
An exemplary application of the computer device provided by the embodiments of the present application is described below, where the computer device provided by the embodiments of the present application may be implemented as a server. In the following, an exemplary application when the device is implemented as a server will be described.
Referring to fig. 1, fig. 1 is a schematic architecture diagram of a live broadcast system 100 provided in an embodiment of the present application. As shown in fig. 1, the live broadcast system 100 includes anchor terminals 200 (anchor terminal 200-1 and anchor terminal 200-2 are shown as examples), a network 300, a server 400, and a viewer terminal 500. The anchor terminal 200-1, the anchor terminal 200-2, and the viewer terminal 500 are connected to the server 400 through the network 300, which may be a wide area network, a local area network, or a combination of the two.
An App (application) capable of watching or listening to a live broadcast may be installed in the viewer terminal 500; the App may be a dedicated live App, or an App with live functions, for example a short-video App. The user may be presented with a live-room entrance interface through the App; when the viewer terminal 500 receives a touch operation on a live-room entrance, it enters the live room to watch or listen to the live content.
A live App may be installed in the anchor terminal 200-1 and the anchor terminal 200-2. After the anchor terminal 200-1 and the anchor terminal 200-2 start broadcasting, their live data streams may be sent to the server 400, and the server 400 receives massive live data streams.
In the embodiment of the present application, the server 400 may be a single server, or may be a server cluster formed by multiple servers, a cloud computing center, or the like, and there are various different deployment manners of the server 400 according to the implementation manner of the live broadcast service in the audience terminal 500.
For example, when the live service is implemented in the form of a dedicated live APP in the viewer terminal 500, the server 400 may be a dedicated server or servers providing live video that communicate directly with the viewer terminal 500 over the network 300 to accomplish the necessary transmission of data and information.
For another example, when the live service is implemented in the viewer terminal 500 as a module or plug-in (e.g., applet) coupled to various existing APPs (e.g., social APPs, shopping APPs), the server 400 may include a business server for implementing basic business functions of the existing APPs, and a live server for providing live video, the live server being in communication with the module or plug-in directly or indirectly through the business server; it will of course be appreciated that the live server and the service server differ mainly in the service logic carried, and thus the live server and the service server may in fact be the same server.
In the following description, for convenience of description, the servers of the above possible manners are collectively referred to as servers, and thus the server 400 should not be simply understood as one or a class of servers, but may be various possible forms of servers deployed for supporting live services in practical applications according to the above examples.
In some embodiments, the server 400 may be a stand-alone physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and basic cloud computing services such as big data and artificial intelligence platforms. The anchor terminal 200 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart car device, etc. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the present application.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a server 400 provided in an embodiment of the present application, and the server 400 shown in fig. 2 includes: at least one processor 410, at least one network interface 420, a bus system 430, and a memory 440. The various components in server 400 are coupled together by bus system 430. It is understood that bus system 430 is used to enable connected communications between these components. The bus system 430 includes a power bus, a control bus, and a status signal bus in addition to a data bus. But for clarity of illustration the various buses are labeled in fig. 2 as bus system 430.
The processor 410 may be an integrated circuit chip having signal processing capabilities, such as a general-purpose processor (e.g., a microprocessor or any conventional processor), a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
Memory 440 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 440 optionally includes one or more storage devices physically remote from processor 410.
Memory 440 includes volatile memory or non-volatile memory, and may include both. The non-volatile memory may be a Read-Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 440 described in embodiments of the present application is intended to comprise any suitable type of memory.
In some embodiments, memory 440 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
an operating system 441, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer;
network communication module 442, for reaching other computing devices via one or more (wired or wireless) network interfaces 420; exemplary network interfaces 420 include: Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), and the like;
in some embodiments, the apparatus provided in the embodiments of the present application may be implemented in software, and fig. 2 shows a live data processing apparatus 443 stored in a memory 440, which may be software in the form of a program, a plug-in, or the like, including the following software modules: the first determination module 4431, the image combining module 4432, the second determination module 4433, and the third determination module 4434 are logical, and thus may be arbitrarily combined or further split according to the implemented functions. The functions of the respective modules will be described hereinafter.
In other embodiments, the apparatus provided by the embodiments of the present application may be implemented in hardware. By way of example, the apparatus may be a processor in the form of a hardware decoding processor that is programmed to perform the live data processing method provided by the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may employ one or more application specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field programmable gate arrays (FPGA, Field-Programmable Gate Array), or other electronic components.
The live broadcast data processing method provided by the embodiment of the present application will be described below with reference to an exemplary application; the method may be implemented by a server. Referring to fig. 3, fig. 3 is a schematic flowchart of an implementation of the live broadcast data processing method provided in the embodiment of the present application, and the steps shown in fig. 3 will be described below.
Step S101, a plurality of live broadcast data streams uploaded by different terminals are obtained, and at least two target video frames corresponding to each live broadcast data stream are determined.
The different terminals here correspond to the anchor terminals in other embodiments. When this step is implemented, the server decodes each obtained live data stream to obtain a plurality of live video frames corresponding to each live data stream, then performs frame extraction at a preset time interval, for example once every 5 seconds, and then randomly extracts a preset number of target video frames from the plurality of live video frames extracted within a preset duration. For example, 24 live video frames may be extracted every two minutes, from which 9 target video frames are then randomly selected; or 12 live video frames may be extracted every minute, from which 4 target video frames are then randomly selected.
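As a minimal sketch of this sampling scheme (the 30 fps frame rate, the function name, and the indexable frame sequence are illustrative assumptions, not fixed by the method):

```python
import random

def pick_target_frames(decoded_frames, fps=30, interval_s=5,
                       window_s=120, target_count=9, rng=None):
    """Sample one frame every interval_s seconds over a window_s window,
    then randomly keep target_count of the sampled frames."""
    rng = rng or random.Random()
    # One frame every interval_s seconds: 120 s / 5 s -> 24 sampled frames.
    sampled = decoded_frames[: fps * window_s : fps * interval_s]
    if len(sampled) <= target_count:
        return sampled
    return rng.sample(sampled, target_count)
```

With a fixed seed this yields a reproducible selection of 9 of the 24 sampled frames per two-minute window.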
Step S102, merging at least two target video frames corresponding to each live broadcast data stream to obtain merged images corresponding to each live broadcast data stream.
When this step is implemented, the at least two target video frames may be merged in time order, or the merging may be performed without regard to time order; either way, the merged image corresponding to each live data stream is obtained. That is, each live data stream corresponds to one merged image per similarity-identification duration.
Step S103, determining local feature information of the combined image corresponding to each live broadcast data stream, and determining the similarity between the live broadcast data streams based on the local feature information corresponding to each live broadcast data stream.
In this embodiment of the present application, feature extraction may be performed on the combined image corresponding to each live broadcast data stream by using a preset local feature extraction algorithm, so as to obtain local feature information. In practice, the local feature information is a binary representation, which in some embodiments may also be referred to as binary fingerprint information.
Because the local feature information is a binary representation, when determining the similarity between live data streams, the two pieces of local feature information can be compared bit by bit (for example, by means of a bitwise exclusive-OR operation) to determine the number of identical bits between them; the number of identical bits is then divided by the total number of bits of the local feature information to obtain the similarity between the two pieces of local feature information, which is a real number between 0 and 1. In the embodiment of the application, the similarity is calculated between every two live data streams.
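A sketch of this similarity computation over two equal-length binary fingerprints (the function name is illustrative; the bit-counting shown is one way to realize the comparison):

```python
def fingerprint_similarity(fp_a, fp_b):
    """Fraction of bit positions at which two binary fingerprints agree.

    fp_a, fp_b: equal-length sequences of 0/1 values (or '0'/'1' chars).
    Returns a real number between 0 and 1.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    same = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return same / len(fp_a)
```

For fingerprints packed into 64-bit integers, the same quantity can be computed as `1 - bin(a ^ b).count("1") / 64`, i.e. via a bitwise exclusive-OR.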
Step S104, determining at least two live data streams meeting the similarity condition based on the similarity between the live data streams.
When the step is implemented, a preset similarity threshold value can be firstly obtained, and at least two live data streams with similarity greater than or equal to the similarity threshold value are determined to be at least two live data streams meeting the similarity condition.
Because the similarity between the live data streams is determined in step S103, consider, for example, live data streams A, B and C. If the similarity between stream A and stream B is greater than or equal to the similarity threshold, and the similarity between stream B and stream C is greater than or equal to the similarity threshold, then A and B form two live data streams satisfying the similarity condition, and B and C likewise form two live data streams satisfying the similarity condition; however, A and C do not necessarily satisfy the similarity condition. Only when the similarity between stream A and stream C is also greater than or equal to the similarity threshold can the three live data streams A, B and C together be determined as satisfying the similarity condition.
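Because similarity is not transitive, grouping streams that pairwise satisfy the condition requires checking every pair. A hypothetical greedy sketch (the names and the pair-similarity dict layout are assumptions for illustration):

```python
def similar_groups(stream_ids, pair_sim, threshold):
    """Greedily group streams so that EVERY pair inside a group meets the
    threshold; A~B and B~C alone do not put A, B, C into one group.

    pair_sim: dict mapping frozenset({x, y}) -> similarity in [0, 1].
    """
    groups = []
    for s in stream_ids:
        for g in groups:
            if all(pair_sim[frozenset((s, t))] >= threshold for t in g):
                g.add(s)
                break
        else:
            groups.append({s})
    # Only groups with at least two streams satisfy the similarity condition.
    return [g for g in groups if len(g) >= 2]
```

In the A/B/C example from the text, A and B group together while C does not join unless its similarity to A also clears the threshold.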
In the live broadcast data processing method provided by the embodiment of the present application, after acquiring a plurality of live data streams uploaded by different terminals, the server determines at least two target video frames corresponding to each live data stream, merges these target video frames to obtain a merged image corresponding to each live data stream, determines local feature information of each merged image, determines the similarity between the live data streams based on the local feature information, and finally determines at least two live data streams satisfying the similarity condition based on those similarities. Because the local feature information is extracted from the merged image obtained from at least two target video frames, the same local feature information is obtained even if the arrangement order of the target video frames within the merged image differs, so accurate similarity calculation is ensured for videos that are not aligned in time. Moreover, because the similarity calculation operates on the local feature information of the different merged images without considering the similarity of semantic features, the calculation complexity is reduced and the recognition efficiency for similar live data streams is improved.
In some embodiments, "determining at least two target video frames corresponding to each live data stream" in step S101 may be implemented by:
and step S1011, decoding each live broadcast data stream to obtain a plurality of live broadcast video frames corresponding to each live broadcast data stream.
In order to improve the transmission efficiency of the live data and reduce the requirement on transmission bandwidth, the anchor terminal sends an encoded and compressed live data stream to the server; the server therefore needs to decode the live data stream uploaded by each anchor terminal before it can obtain the plurality of live video frames corresponding to each live data stream. The plurality of live video frames are typically ordered in time sequence.
Step S1012, performing frame extraction processing from the plurality of live video frames corresponding to each live data stream according to a preset interval duration, to obtain a plurality of extracted live video frames corresponding to each live data stream.
When this step is implemented, frame extraction may be performed once per interval duration, with multiple frame-extraction operations carried out within the preset duration, so as to obtain the plurality of extracted live video frames corresponding to each live data stream. For example, with an interval duration of 12, 6, or 5 seconds and a preset duration of 2 minutes, 10, 20, or 24 live video frames respectively are extracted every two minutes.
Step S1013, randomly acquiring a preset number of live video frames from the plurality of extracted live video frames corresponding to each live data stream, to obtain at least two target video frames corresponding to each live data stream.
In this step, when the number of live video frames extracted in step S1012 is relatively large, a preset number of live video frames may also be randomly obtained from the extracted multiple live video frames, so as to obtain at least two target video frames corresponding to each live data stream.
In some embodiments, if the interval duration employed in the frame extraction process at step S1012 is relatively long, for example, 20 seconds, then 6 live video frames are extracted every two minutes, at which time the 6 live video frames may be determined to be target video frames.
In some embodiments, the step S102 "merging the at least two target video frames corresponding to each live data stream to obtain a merged image corresponding to each live data stream" may be implemented by the following steps:
step S1021, obtaining a preset image target size, and performing size adjustment on at least two target video frames corresponding to each live broadcast data stream based on the image target size, so as to obtain at least two adjusted target video frames corresponding to each live broadcast data stream.
The image target size may be the target size of a single live video frame, for example 360*240. The at least two target video frames corresponding to each live data stream are resized based on the image target size, that is, each target video frame is reduced or enlarged so that the size information of each adjusted target video frame equals the image target size.
Step S1022, merging the at least two adjusted target video frames corresponding to each live data stream according to the time sequence of the at least two adjusted target video frames corresponding to each live data stream, so as to obtain merged images corresponding to each live data stream.
In this embodiment of the present application, the at least two adjusted target video frames corresponding to each live data stream are merged in time order, for example by tiling them into a single image, so as to obtain the merged image corresponding to each live data stream.
In the embodiment of steps S1021 to S1022, since the target video frames are resized based on the image target size, merged images of the same size are guaranteed even if the live video frames decoded from the live data streams uploaded by different anchor terminals differ in size.
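The resize-then-tile procedure of steps S1021 to S1022 can be sketched as follows (grayscale frames as NumPy arrays, nearest-neighbor scaling, and a 3*3 layout are all illustrative assumptions; a production system would use a real scaler such as OpenCV):

```python
import numpy as np

def resize_nearest(frame, target_h, target_w):
    """Nearest-neighbor resize of a 2-D grayscale frame."""
    src_h, src_w = frame.shape[:2]
    rows = np.arange(target_h) * src_h // target_h
    cols = np.arange(target_w) * src_w // target_w
    return frame[rows][:, cols]

def merge_frames(frames, target_h=240, target_w=360, grid=3):
    """Resize each frame to the image target size, then tile the first
    grid*grid frames (assumed already in time order) into one image."""
    resized = [resize_nearest(f, target_h, target_w)
               for f in frames[:grid * grid]]
    rows = [np.hstack(resized[i * grid:(i + 1) * grid])
            for i in range(grid)]
    return np.vstack(rows)
```

Frames of different source resolutions all land in a merged image of the same fixed size, as the text requires.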
In some embodiments, as shown in fig. 4, the step S103 "of determining local feature information of the combined image corresponding to each live data stream, and determining the similarity between each live data stream based on the local feature information corresponding to each live data stream" may be implemented through steps S1031 to S1035, and each step is described below in connection with fig. 4.
Step S1031, extracting features of each combined image according to a preset local feature extraction algorithm to obtain feature vectors of each combined image.
The preset local feature extraction algorithm may be a perceptual hash algorithm, a scale-invariant feature transform (SIFT) algorithm, a histogram of oriented gradients (HOG, Histogram of Oriented Gradient) algorithm, or the like.
In the embodiment of the present application, the implementation of this step is described taking the perceptual hash algorithm as the local feature extraction algorithm. First, the merged image is reduced in size, for example to 8*8, i.e., 64 pixels in total; this removes the details of the image, retains only basic information such as structure and brightness, and discards image differences caused by different sizes and proportions. The reduced image is then converted to 64-level grayscale, that is, all pixels take one of only 64 gray levels. In the embodiment of the present application, the gray values of the 64 pixels may be determined as the feature values, and the gray average value over all 64 pixels is also calculated.
Step S1032, determining a feature threshold of each combined image based on the feature vector of each combined image.
When this step is implemented, the arithmetic mean of the feature values in each feature vector may be calculated to obtain the average value corresponding to each feature vector, and that average value is determined as the feature threshold of the corresponding merged image. Alternatively, the feature values in the feature vector may be sorted to obtain a sorting result, a feature median determined based on the sorting result, and the feature median determined as the feature threshold.
Step S1033, performing binarization processing on the corresponding feature vectors based on the feature threshold of each combined image, to obtain local feature information of each combined image.
In implementation, the feature value larger than the feature threshold in the feature vector is set as 1, and the feature value smaller than or equal to the feature threshold is set as 0, so that binarization processing of the feature vector is realized, and local feature information of each combined image is obtained.
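Steps S1032 and S1033 together amount to thresholding the feature vector; a minimal sketch (mean and median are the two thresholds the text mentions, the function names are illustrative):

```python
from statistics import mean, median

def feature_threshold(features, use_median=False):
    """Feature threshold of a merged image: mean (default) or median."""
    return median(features) if use_median else mean(features)

def binarize(features, threshold):
    """1 where the feature value exceeds the threshold, 0 otherwise."""
    return [1 if v > threshold else 0 for v in features]
```

The binarized list is the local feature information (binary fingerprint) of the merged image.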
Step S1034, determining the same number of bits in the local feature information of different live data streams based on the local feature information corresponding to each live data stream.
When the step is realized, the same bit number in the local characteristic information of different live broadcast data streams can be determined through bitwise exclusive OR operation.
Step S1035, determining the similarity between the different live data streams based on the total number of bits of the local feature information and the same number of bits.
Here, the same number of bits may be divided by the total number of bits to obtain the similarity between different live data streams. I.e. the more the same number of bits between different feature vectors, the higher the similarity between live data streams.
In the embodiment of steps S1031 to S1035, the local feature information of the merged image is extracted, so that even if the sequence of each live video frame in the merged image is different, the local feature information generated finally is not affected, and thus the problem of time alignment can be solved.
Based on the foregoing embodiments, the embodiments of the present application further provide a live broadcast data processing method, which is applied to the network architecture shown in fig. 1, and fig. 5 is a schematic flow chart of another implementation of the live broadcast data processing method provided in the embodiments of the present application, as shown in fig. 5, where the flow includes:
In step S501, the anchor terminal responds to the operation instruction of starting the live App, presents the live window of the live service, and receives the setting of the anchor user for the live service to be initialized.
In the embodiment of the present application, the live window presented before the live service is initialized is used to receive information such as the name and remarks of the live service to be created in the anchor user's live room, which makes it convenient for the anchor user to search for the live room later.
Step S502, the anchor terminal sends live broadcast service initialization data to the server.
Here, the anchor terminal submits the identifier of the live room to be established, the identifier of the anchor user, and the like to the server for initializing the live service.
In step S503, the anchor terminal responds to the start operation for the anchor terminal to start live broadcast, presents a live broadcast interface, and acquires media data to be uploaded.
Here, when live video is started, the media data includes image data and audio data. The media data to be uploaded in this step may be acquired in real time by the image acquisition device of the anchor terminal, or may be transmitted to the anchor terminal from another device having a communication connection with the anchor terminal. For example, news feeds, television series rebroadcasts, etc.
Step S504, the host broadcasting terminal encodes the media data to be uploaded to obtain a live broadcasting data stream, and sends the live broadcasting data stream to the server.
In step S505, the server acquires a plurality of live data streams uploaded by different anchor terminals, and determines at least two target video frames corresponding to each live data stream.
Step S506, the server performs merging processing on at least two target video frames corresponding to each live broadcast data stream to obtain merged images corresponding to each live broadcast data stream.
Step S507, the server determines local feature information of the merged image corresponding to each live data stream, and determines a similarity between each live data stream based on the local feature information corresponding to each live data stream.
In step S508, the server determines at least two live data streams satisfying the similarity condition based on the similarity between the live data streams.
Note that the implementation procedures of the steps S505 to S508 are similar to the implementation procedures of the steps S101 to S104, and reference may be made to the implementation procedures of the steps S101 to S104 when implemented.
In step S509, the server determines a target live data stream that needs to be subjected to duplication elimination processing from the at least two live data streams that satisfy the similarity condition.
When this step is implemented, duplication-elimination filtering criteria such as the number of viewers, the number of reports, and users' historical behavior data may be used. For example, live data streams with a small number of viewers, a large number of reports, or a mismatch with user preference information may be determined as target live data streams.
And step S510, the server performs duplication elimination processing on the target live broadcast data stream to obtain a processed live broadcast data stream.
When this step is implemented, the target live data stream may be deleted from the plurality of live data streams to obtain the processed live data stream; alternatively, traffic down-weighting may be applied to the target live data stream to obtain the processed live data stream. In some embodiments, a demotion penalty may also be applied to the target live data stream.
In step S511, the viewer terminal initializes the client based on the operation instruction to start the live client, and initializes the player parameters.
Here, the player program in the viewer terminal runs as a single instance; that is, as long as the client is running, the player program keeps running and is not stopped. The player parameters are initialized when the client is initialized.
In step S512, the viewer terminal transmits a live data acquisition request to the server.
Here, the live data acquisition request is at least used for requesting to acquire interface data of the live broadcasting room, that is, interface data of a home page of the live broadcasting room.
In step S513, the server determines, based on the processed live data stream, live data to be sent.
When this step is implemented, after acquiring the data acquisition request, the server obtains the user's historical viewing information, followed-anchor information, and the like based on the user identifier corresponding to the viewer terminal; it then determines, from the processed live data streams, the streams of anchors the user follows and the streams similar to the user's historical viewing, and determines the home-page interface data of the filtered live data streams as the live data to be sent.
Step S514, the server sends the live broadcast data to be sent to the audience terminal.
Step S515, the viewer terminal presents the home interface of each living room after receiving the live data to be transmitted.
Because the live data to be transmitted to the audience terminal is determined based on the live data stream obtained after the duplication elimination processing, the live data to be transmitted does not comprise a live broadcasting room home page interface for playing the repeated live data stream.
In step S516, the viewer terminal sends a live data stream acquisition request to the server in response to the touch operation for the home interface of the target live room.
The live broadcast data stream acquisition request carries an identifier of the target live broadcast room and is used for requesting to acquire the live broadcast data stream of the target live broadcast room.
Step S517, the server acquires the live broadcast data stream corresponding to the target live broadcast room and sends the live broadcast data stream to the audience terminal.
In step S518, the viewer terminal performs live broadcast based on the live broadcast data stream.
In the live broadcast data processing method provided by the embodiment of the present application, after completing the setting of the live service and starting live broadcast, the anchor terminal sends a live data stream to the server. After acquiring the plurality of live data streams uploaded by different anchor terminals, the server determines at least two target video frames corresponding to each live data stream, merges them to obtain a merged image corresponding to each live data stream, determines local feature information of each merged image, determines the similarity between the live data streams based on that local feature information, and finally determines at least two live data streams satisfying the similarity condition. Because the local feature information is extracted from the merged image obtained from the at least two target video frames, the same local feature information is obtained even if the arrangement order of the target video frames within the merged image differs, so accurate similarity calculation is ensured for videos that are not aligned in time; and because the similarity calculation operates on the local feature information of the different merged images without considering semantic features, the calculation complexity is reduced and the recognition efficiency for similar live data streams is improved. In addition, after receiving the live data acquisition request sent by the viewer terminal, the server performs duplication elimination on the at least two live data streams satisfying the similarity condition, which ensures that the live data sent to the viewer terminal does not lead into live rooms playing the same live content and improves the viewing experience at the viewer terminal.
In some embodiments, the step S509 "determining the target live data stream that needs to be subjected to the duplication elimination processing from the at least two live data streams that meet the similarity condition" may be implemented based on the number of viewers, the number of reports, the historical behavior data of the user, and the like as the duplication elimination filtering criteria, which are described below in several implementation manners.
First, when implemented based on the number of viewers, it can be implemented by the following steps:
step S5091A, determining that the at least two live data streams satisfying the similarity condition correspond to the number of viewers in the live broadcasting room.
In the embodiment of the application, the number of viewers may be real-time viewers or cumulative viewers.
And step S5092A, determining other live broadcast data streams except for the most number of viewers in the at least two live broadcast data streams as target live broadcast data streams.
Since the content played in the at least two live data streams is similar, i.e. homogeneous, only one of them needs to be retained. When implemented, only the most-watched live data stream may be retained; that is, the live data streams other than the one with the most viewers are determined as the target live data streams.
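A sketch of this viewer-count rule for one group of similar streams (the function name and the dict layout are assumptions for illustration):

```python
def dedup_targets_by_viewers(viewer_counts):
    """viewer_counts: dict stream_id -> viewer count for one group of
    similar live data streams. Keeps the most-watched stream and marks
    every other stream as a target for duplication elimination."""
    keep = max(viewer_counts, key=viewer_counts.get)
    return [s for s in viewer_counts if s != keep]
```

The same shape of selection applies to the report-count and preference-matching criteria described next, with `max` over a different key.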
The second, when realizing based on reporting times and historical behavior data, can be realized by the following steps:
step S5091B, obtaining the reporting times of the at least two live broadcast data streams corresponding to the live broadcast room that satisfy the similarity condition.
The number of reports may be the cumulative number of reports since each of the at least two live data streams started, or the number of reports within a preset history period counted back from the current time, for example the number of reports within the past week or the past month.
Step S5092B determines whether the difference between the reporting times of the at least two live data streams corresponding to the live rooms is greater than a difference threshold.
When the difference value between the reporting times of the at least two live broadcast data streams corresponding to the live broadcast room is greater than a difference value threshold, step S5093B is entered; and when the difference value between the reporting times of the at least two live broadcast data streams corresponding to the live broadcast room is smaller than or equal to a difference value threshold, step S5094B is performed.
Step S5093B, determining the live data streams other than the one with the fewest reports as the target live data streams.
In the embodiment of the present application, when the difference between the numbers of reports of the live rooms corresponding to the at least two live data streams is greater than the difference threshold, it is determined that the number of reports can be used as the duplication-elimination filtering criterion; at this time only the live data stream with the fewest reports needs to be retained, that is, the live data streams other than the one with the fewest reports are determined as the target live data streams.
Step S5094B, in response to the live broadcast data acquisition request sent by the viewer terminal, acquires historical behavior data corresponding to the viewer terminal.
The live broadcast data acquisition request carries a user identifier, and when the step is realized, the server acquires historical behavior data corresponding to the audience terminal based on the user identifier. The historical behavior data may include identification of live rooms where live is historically viewed, viewing duration, interaction information, identification of live rooms of interest, and the like.
When the difference between the numbers of reports of the live rooms corresponding to the at least two live data streams is less than or equal to the difference threshold, the numbers of reports of the at least two live rooms do not differ much; in this case, the number of reports is not used as the duplication-elimination criterion, and further judgment is made based on the historical behavior data corresponding to the viewer terminal.
Step S5095B, determining viewing preference information corresponding to the viewer terminal based on the historical behavior data.
The viewing preference information may be used to characterize the type of live broadcast that the viewer terminal prefers to correspond to the user.
Step S5096B, based on the viewing preference information, determines, as a target live data stream, other live data streams except for the live data stream having the highest matching degree with the viewing preference information from among the at least two live data streams satisfying the similarity condition.
When this step is implemented, the matching degree between the live rooms corresponding to the at least two live data streams and the viewing preference information is first determined, and then the live data stream with the highest matching degree is retained; that is, the live data streams other than the one with the highest matching degree are determined as the target live data streams.
It should be noted that, in some embodiments, factors other than the number of viewers, the number of reports, and the viewing preference information may be used as criteria for the duplication elimination processing, for example, one or more of a score of a main cast, a heat of a live room, and a praise may be used as criteria for the duplication elimination processing.
In the following, an exemplary application of the embodiments of the present application in a practical application scenario will be described.
The live broadcast data processing method provided by the embodiment of the present application can be used in scenarios such as filtering homogeneous live content, traffic down-weighting, and demotion penalties. As shown in fig. 6, under the live news category, the live content played by live room 601 and live room 602 is identical within a certain time period, so one of them needs to be filtered out and not displayed. Alternatively, the filtering may be applied per user session: for example, user A can only see the live content of live room R1 and user B can only see the live content of live room R2, thereby avoiding a single user seeing multiple identical live contents.
The method for processing live broadcast data provided by the embodiment of the application is described below, and can be implemented through two steps during actual implementation, namely hash calculation and homogeneous content filtration of live broadcast streaming video.
In the live broadcast process, each live room independently samples and stores its own video frames; the video frames of different live rooms need not undergo any time-alignment operation. In the embodiment of the present application, for all live rooms, one frame is sampled approximately every 5 seconds, and 2 minutes of video are accumulated as one prediction interval, so each prediction interval contains 24 frames. Since there is time misalignment, the hash algorithm should satisfy the following conditions: (1) It can tolerate a certain degree of frame misalignment; for example, the hash of the frame sequence 4,5,6,7,8,9 should be similar to that of the sequence 5,6,7,8,9,10, without an excessive distance arising from a one-to-one comparison of misaligned frame pairs. (2) Its time cost is small, and extraction of the hash value should not depend on deep semantic features; the goal is to find video pairs with similar visual content rather than semantic similarity, so extracting overly abstract deep semantic information is unnecessary.
In view of these two design goals, in the embodiment of the present application, the hash calculation of the live video is performed through the following steps:
In step S601, 9 frames are randomly sampled from the 24 frames, and the 9 frames are combined in time order into a 3×3 nine-grid composite image.
Fig. 7A is the nine-grid composite image corresponding to live broadcast room A1, fig. 7B is the nine-grid composite image corresponding to live broadcast room A2, fig. 8A is the nine-grid composite image corresponding to live broadcast room B1, and fig. 8B is the nine-grid composite image corresponding to live broadcast room B2.
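A minimal sketch of step S601, assuming the sampled frames arrive as equally sized uint8 arrays (resizing to a common size is assumed to happen beforehand; the function name and `seed` parameter are illustrative):

```python
import random
import numpy as np

def make_nine_grid(frames, seed=None):
    """Randomly pick 9 of the sampled frames and tile them 3x3 in time order.

    `frames` is a time-ordered list of H x W x 3 uint8 arrays, all the same
    size. Returns a 3H x 3W x 3 composite image.
    """
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(frames)), 9))  # keep temporal order
    chosen = [frames[i] for i in idx]
    # build each of the 3 rows, then stack the rows vertically
    rows = [np.concatenate(chosen[r * 3:(r + 1) * 3], axis=1) for r in range(3)]
    return np.concatenate(rows, axis=0)
```

Sampling 9 of the 24 frames, rather than using all of them, keeps the composite image small while still covering the whole prediction interval.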
In step S602, feature extraction is performed on the obtained nine-grid composite image by using a feature extraction algorithm based on local features, so as to obtain a feature vector.
The feature extraction algorithm based on local features can be, but is not limited to, the pHash algorithm, the SIFT algorithm, or the HOG algorithm. Each feature value in the feature vector obtained at this point is a floating-point representation of the feature. For example, the first ten feature values of the feature vector corresponding to a certain nine-grid composite image are 0.00776728, 0.0436932, -0.0070386, 0.0783124, -0.0413507, 0.0427038, -0.0814955, 0.130502, -0.0673937, 0.0512087.
In step S603, a processing threshold is determined based on each feature vector; feature values larger than the processing threshold are set to 1 and feature values not larger than the processing threshold are set to 0, so as to obtain the hash feature vector corresponding to the nine-grid composite image.
For example, assuming that the processing threshold is 0, the first ten bits of the hash feature vector corresponding to the feature vector above are 1, 1, 0, 1, 0, 1, 0, 1, 0, 1.
In the embodiment of the application, after the hash feature vector of each nine-grid composite image is obtained, each hash feature vector is stored in a database. Because locally sensitive features are extracted when extracting the image features of the nine-grid composite image, randomly exchanging the relative order of the images within the 3×3 grid does not affect the finally generated hash value, so the problem of time alignment is solved.
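Step S603 can be sketched as follows (a minimal illustration; the threshold of 0 matches the example above, and the function name is assumed):

```python
import numpy as np

def binarize_features(features, threshold=0.0):
    """Map a floating-point feature vector to a 0/1 hash feature vector:
    values above the threshold become 1, the rest become 0."""
    return (np.asarray(features, dtype=np.float64) > threshold).astype(np.uint8)
```

Applied to the ten example feature values listed earlier, this yields the bits 1, 1, 0, 1, 0, 1, 0, 1, 0, 1.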
Based on the hash feature representation obtained in step S603 above, homogeneous filtering can be achieved through the following steps:
In step S701, the similarity between the live data streams is determined based on the hash feature representation of each live data stream.
When this step is implemented, after the hash feature representations of all live data streams are obtained in real time, the similarity between the live data streams can be calculated: for hash values f1 and f2, the number of identical bits is determined through a bitwise comparison, and this number is divided by the total number of bits of the hash feature representation to obtain the similarity.
In step S702, if it is determined that the similarity of at least two live data streams is higher than a preset threshold, it indicates that the at least two live data streams are homogeneous.
In step S703, a reserved live data stream is determined from the at least two live data streams, and other live data streams are filtered out.
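Steps S701 to S703 can be sketched as follows. The 0.9 similarity threshold and the rule of retaining the first stream of each homogeneous group are illustrative assumptions; the description later discusses richer retention rules (viewer counts, report counts, viewing preference).

```python
import numpy as np

def similarity(h1, h2):
    """Fraction of identical bits between two hash feature vectors (step S701)."""
    return float(np.count_nonzero(h1 == h2)) / h1.size

def filter_homogeneous(hashes, threshold=0.9):
    """Return the set of room ids whose streams are kept (steps S702/S703).

    `hashes` maps a room id to its hash feature vector. A stream is kept only
    if it is not too similar to any already-kept stream; the rest of each
    homogeneous group is filtered out.
    """
    kept = []
    for room, h in hashes.items():
        if all(similarity(h, hashes[k]) < threshold for k in kept):
            kept.append(room)
    return set(kept)
```
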
Table 1 is a performance comparison between the live broadcast data processing method provided by the embodiment of the present application and scheme one and scheme two in the related art. As can be seen from table 1, compared with scheme one and scheme two, the live broadcast data processing method provided by the embodiment of the present application achieves a substantial improvement in resource utilization and delay while ensuring that accuracy is not reduced.
[Table 1 is presented as an image in the original publication.]
In the live broadcast data processing method provided by the embodiment of the application, by deploying the fast audio and video hash algorithm in live streaming scenarios, homogeneous live broadcast content can be filtered, so that the homogeneity of live broadcast content is effectively reduced and users can watch a greater variety of live broadcast content. Meanwhile, the method can serve as an underlying capability for cracking down on illegal accounts, such as accounts that rebroadcast (transport) content. Owing to the low delay, live broadcast rooms rebroadcasting the same content with a time offset can be accurately identified. In addition, the algorithm does not need to occupy GPU machines, so the corresponding deployment and maintenance costs are lower.
Continuing with the description of an exemplary structure in which the live data processing apparatus 443 provided by the embodiments of the present application is implemented as software modules, in some embodiments, as shown in fig. 2, the software modules of the live data processing apparatus 443 stored in the memory 440 may include:
The first determining module is used for acquiring a plurality of live broadcast data streams uploaded by different terminals and determining at least two target video frames corresponding to each live broadcast data stream;
the image merging module is used for merging at least two target video frames corresponding to each live broadcast data stream to obtain merged images corresponding to each live broadcast data stream;
the second determining module is used for determining local characteristic information of the combined image corresponding to each live broadcast data stream and determining the similarity between the live broadcast data streams based on the local characteristic information corresponding to each live broadcast data stream;
and the third determining module is used for determining at least two live data streams meeting the similarity condition based on the similarity between the live data streams.
In some embodiments, the first determining module 4431 is further to:
decoding each live broadcast data stream to obtain a plurality of live broadcast video frames corresponding to each live broadcast data stream;
performing frame extraction processing from a plurality of live video frames corresponding to each live data stream according to a preset interval duration to obtain a plurality of extracted live video frames corresponding to each live data stream;
Randomly acquiring a preset number of live video frames from a plurality of extracted live video frames corresponding to each live data stream to obtain at least two target video frames corresponding to each live data stream.
In some embodiments, the image merging module 4432 is further configured to:
acquiring a preset image target size, and adjusting the sizes of at least two target video frames corresponding to each live broadcast data stream based on the image target size to obtain at least two adjusted target video frames corresponding to each live broadcast data stream;
and combining the at least two adjusted target video frames corresponding to each live broadcast data stream according to the time sequence of the at least two adjusted target video frames corresponding to each live broadcast data stream to obtain combined images corresponding to each live broadcast data stream.
In some embodiments, the second determining module 4433 is further configured to:
carrying out feature extraction on each combined image according to a preset local feature extraction algorithm to obtain feature vectors of each combined image;
determining a feature threshold of each combined image based on the feature vector of each combined image;
And carrying out binarization processing on the corresponding feature vectors based on the feature threshold values of the combined images to obtain local feature information of the combined images.
In some embodiments, the second determining module 4433 is further configured to:
determining the same bit number in the local feature information of different live broadcast data streams based on the local feature information corresponding to each live broadcast data stream;
and determining the similarity between different live broadcast data streams based on the total bit number of the local characteristic information and the same bit number.
In some embodiments, the third determining module 4434 is further configured to:
acquiring a preset similarity threshold;
determining at least two live data streams with similarity greater than or equal to the similarity threshold as at least two live data streams meeting a similarity condition;
the apparatus further comprises:
a fourth determining module, configured to determine a target live data stream that needs to be subjected to duplication elimination processing from the at least two live data streams that meet a similarity condition;
the duplication elimination processing module is used for carrying out duplication elimination processing on the target live broadcast data stream to obtain a processed live broadcast data stream;
and the sending module is used for determining live broadcast data to be sent based on the processed live broadcast data stream and sending the live broadcast data to be sent to the audience terminal.
In some embodiments, the fourth determination module is further configured to:
determining the number of viewers in the live broadcasting room corresponding to the at least two live broadcasting data streams meeting the similarity condition;
and determining, among the at least two live broadcast data streams, the live broadcast data streams other than the one with the largest number of viewers as target live broadcast data streams.
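The viewer-count selection rule above can be sketched as follows (a hypothetical helper; ties between equal viewer counts are broken arbitrarily):

```python
def pick_target_streams(viewer_counts):
    """Given a mapping from live room id to viewer count for a group of
    homogeneous streams, retain the room with the most viewers and return
    the other rooms as target streams for de-duplication."""
    keep = max(viewer_counts, key=viewer_counts.get)
    return [room for room in viewer_counts if room != keep]
```
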
In some embodiments, the fourth determination module is further configured to:
acquiring reporting times of at least two live broadcast data streams meeting the similarity condition corresponding to a live broadcast room;
and when the difference value between the reporting times of the at least two live broadcast data streams corresponding to the live broadcast room is larger than a difference value threshold, determining the other live broadcast data streams with the least reporting times as target live broadcast data streams.
In some embodiments, the fourth determination module is further configured to:
when the difference value between the reporting times of the at least two live broadcast data streams corresponding to the live broadcast room is smaller than or equal to a difference value threshold value, responding to a live broadcast request sent by the audience terminal, and acquiring historical behavior data corresponding to the audience terminal;
determining viewing preference information corresponding to the audience terminal based on the historical behavior data;
and determining other live broadcast data streams with highest matching degree with the viewing preference information among the at least two live broadcast data streams meeting the similarity condition as target live broadcast data streams based on the viewing preference information.
In some embodiments, the de-duplication processing module is further configured to:
deleting the target live data stream from the plurality of live data streams to obtain a processed live data stream; or
and carrying out flow weight reduction processing on the target live broadcast data stream to obtain a processed live broadcast data stream.
It should be noted that the description of the live broadcast data processing apparatus in the embodiment of the present application is similar to the description of the method embodiments above and has similar beneficial effects. For technical details not disclosed in the apparatus embodiments, please refer to the description of the method embodiments of the present application.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the live data processing method according to the embodiment of the application.
The present embodiments provide a computer readable storage medium storing executable instructions, which when executed by a processor, cause the processor to perform a live data processing method provided by the embodiments of the present application, for example, the live data processing method as shown in fig. 3, 4, and 5.
In some embodiments, the computer readable storage medium may be an FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be any of various devices including one of the above memories or any combination thereof.
In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, the executable instructions may, but need not, correspond to files in a file system, may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a hypertext markup language (HTML, hyper Text Markup Language) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or, alternatively, distributed across multiple sites and interconnected by a communication network.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and scope of the present application are intended to be included within the scope of the present application.

Claims (14)

1. A live data processing method, the method comprising:
acquiring a plurality of live broadcast data streams uploaded by different terminals, and determining at least two target video frames corresponding to each live broadcast data stream;
combining at least two target video frames corresponding to each live broadcast data stream to obtain combined images corresponding to each live broadcast data stream;
determining local feature information of the combined image corresponding to each live data stream, and determining the similarity between the live data streams based on the local feature information corresponding to each live data stream;
and determining at least two live data streams meeting the similarity condition based on the similarity between the live data streams.
2. The method of claim 1, wherein determining at least two target video frames for each live data stream comprises:
decoding each live broadcast data stream to obtain a plurality of live broadcast video frames corresponding to each live broadcast data stream;
performing frame extraction processing from a plurality of live video frames corresponding to each live data stream according to a preset interval duration to obtain a plurality of extracted live video frames corresponding to each live data stream;
randomly acquiring a preset number of live video frames from a plurality of extracted live video frames corresponding to each live data stream to obtain at least two target video frames corresponding to each live data stream.
3. The method according to claim 1, wherein the merging the at least two target video frames corresponding to the respective live data streams to obtain a merged image corresponding to the respective live data streams includes:
acquiring a preset image target size, and adjusting the sizes of at least two target video frames corresponding to each live broadcast data stream based on the image target size to obtain at least two adjusted target video frames corresponding to each live broadcast data stream;
And combining the at least two adjusted target video frames corresponding to each live broadcast data stream according to the time sequence of the at least two adjusted target video frames corresponding to each live broadcast data stream to obtain combined images corresponding to each live broadcast data stream.
4. The method according to claim 1, wherein determining local feature information of the combined image corresponding to each live data stream comprises:
carrying out feature extraction on each combined image according to a preset local feature extraction algorithm to obtain feature vectors of each combined image;
determining a feature threshold of each combined image based on the feature vector of each combined image;
and carrying out binarization processing on the corresponding feature vectors based on the feature threshold values of the combined images to obtain local feature information of the combined images.
5. The method of claim 4, wherein determining the similarity between the respective live data streams based on the local feature information corresponding to the respective live data streams comprises:
determining the same bit number in the local feature information of different live broadcast data streams based on the local feature information corresponding to each live broadcast data stream;
And determining the similarity between different live broadcast data streams based on the total bit number of the local characteristic information and the same bit number.
6. The method according to any one of claims 1 to 5, wherein said determining at least two live data streams satisfying a similarity condition based on a similarity between the respective live data streams comprises:
acquiring a preset similarity threshold;
determining at least two live data streams with similarity greater than or equal to the similarity threshold as at least two live data streams meeting a similarity condition;
the method further comprises the steps of:
determining a target live data stream needing to be subjected to duplication elimination processing from the at least two live data streams meeting the similarity condition;
performing duplication elimination processing on the target live broadcast data stream to obtain a processed live broadcast data stream;
and determining live broadcast data to be sent based on the processed live broadcast data stream, and sending the live broadcast data to be sent to a spectator terminal.
7. The method according to claim 6, wherein determining a target live data stream that needs to be subjected to a deduplication process from the at least two live data streams that satisfy a similarity condition comprises:
Determining the number of viewers in the live broadcasting room corresponding to the at least two live broadcasting data streams meeting the similarity condition;
and determining other live broadcast data streams except for the most number of viewers in the at least two live broadcast data streams as target live broadcast data streams.
8. The method according to claim 6, wherein determining a target live data stream that needs to be subjected to a duplication elimination process from the at least two live data streams that satisfy the similarity condition includes:
acquiring reporting times of at least two live broadcast data streams meeting the similarity condition corresponding to a live broadcast room;
and when the difference value between the reporting times of the at least two live broadcast data streams corresponding to the live broadcast room is larger than a difference value threshold, determining the other live broadcast data streams with the least reporting times as target live broadcast data streams.
9. The method according to claim 8, wherein the determining a target live data stream that needs to be subjected to a deduplication process from the at least two live data streams that satisfy a similarity condition includes:
when the difference value between the reporting times of the at least two live broadcast data streams corresponding to the live broadcast room is smaller than or equal to a difference value threshold value, responding to a live broadcast request sent by the audience terminal, and acquiring historical behavior data corresponding to the audience terminal;
Determining viewing preference information corresponding to the audience terminal based on the historical behavior data;
and determining other live broadcast data streams with highest matching degree with the viewing preference information among the at least two live broadcast data streams meeting the similarity condition as target live broadcast data streams based on the viewing preference information.
10. The method of claim 8, wherein performing de-duplication processing on the target live data stream to obtain a processed live data stream comprises:
deleting the target live data stream from the plurality of live data streams to obtain a processed live data stream; or
and carrying out flow weight reduction processing on the target live broadcast data stream to obtain a processed live broadcast data stream.
11. A live data processing apparatus, the apparatus comprising:
the first determining module is used for acquiring a plurality of live broadcast data streams uploaded by different terminals and determining at least two target video frames corresponding to each live broadcast data stream;
the image merging module is used for merging at least two target video frames corresponding to each live broadcast data stream to obtain merged images corresponding to each live broadcast data stream;
The second determining module is used for determining local characteristic information of the combined image corresponding to each live broadcast data stream and determining the similarity between the live broadcast data streams based on the local characteristic information corresponding to each live broadcast data stream;
and the third determining module is used for determining at least two live data streams meeting the similarity condition based on the similarity between the live data streams.
12. A computer device, the computer device comprising:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 10 when executing executable instructions stored in said memory.
13. A computer readable storage medium storing executable instructions which when executed by a processor implement the method of any one of claims 1 to 10.
14. A computer program product comprising a computer program or instructions which, when executed by a processor, implements the method of any one of claims 1 to 10.
CN202111447765.6A 2021-11-29 2021-11-29 Live broadcast data processing method, device, equipment and computer readable storage medium Pending CN116193149A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111447765.6A CN116193149A (en) 2021-11-29 2021-11-29 Live broadcast data processing method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111447765.6A CN116193149A (en) 2021-11-29 2021-11-29 Live broadcast data processing method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116193149A true CN116193149A (en) 2023-05-30

Family

ID=86431257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111447765.6A Pending CN116193149A (en) 2021-11-29 2021-11-29 Live broadcast data processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116193149A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117255212A (en) * 2023-11-20 2023-12-19 北京泰伯科技有限公司 Remote emergency live broadcast control method and related equipment
CN117255212B (en) * 2023-11-20 2024-01-26 北京泰伯科技有限公司 Remote emergency live broadcast control method and related equipment

Similar Documents

Publication Publication Date Title
US9110988B1 (en) Methods, systems, and media for aggregating and presenting multiple videos of an event
CN111368141B (en) Video tag expansion method, device, computer equipment and storage medium
US11025964B2 (en) Method, apparatus, server, and storage medium for generating live broadcast video of highlight collection
CN110633669B (en) Mobile terminal face attribute identification method based on deep learning in home environment
CN110149529B (en) Media information processing method, server and storage medium
CN110996131B (en) Video encoding method, video encoding device, computer equipment and storage medium
CN104185040A (en) Application synchronization method, application server and terminal
Jayanthiladevi et al. AI in video analysis, production and streaming delivery
US20220417540A1 (en) Encoding Device and Method for Utility-Driven Video Compression
CN113704506A (en) Media content duplication eliminating method and related device
US10740618B1 (en) Tracking objects in live 360 video
WO2024051480A1 (en) Image processing method and apparatus, computer device, and storage medium
CN116665083A (en) Video classification method and device, electronic equipment and storage medium
US9300853B2 (en) Network camera data management system and managing method thereof
KR20200077176A (en) Apparatus and method for recommending keyword related to video
CN116193149A (en) Live broadcast data processing method, device, equipment and computer readable storage medium
CN113762040B (en) Video identification method, device, storage medium and computer equipment
CN117119255B (en) Monitoring method, system, equipment and storage medium for illegal video playing
CN106937127B (en) Display method and system for intelligent search preparation
CN117177012B (en) Video broadcasting monitoring method, system, equipment and storage medium
CN106412492B (en) Video data processing method and device
CN115983499A (en) Box office prediction method and device, electronic equipment and storage medium
CN117009577A (en) Video data processing method, device, equipment and readable storage medium
CN111797273B (en) Method and device for adjusting parameters
CN113938707A (en) Video processing method, recording and playing box and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40086442

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination