WO2022247849A1 - Multimedia data processing method and apparatus, and device and storage medium - Google Patents

Publication number
WO2022247849A1
WO2022247849A1 PCT/CN2022/094878 CN2022094878W WO2022247849A1 WO 2022247849 A1 WO2022247849 A1 WO 2022247849A1 CN 2022094878 W CN2022094878 W CN 2022094878W WO 2022247849 A1 WO2022247849 A1 WO 2022247849A1
Authority
WO
WIPO (PCT)
Prior art keywords
multimedia data
deleted
video
processed
video frame
Prior art date
Application number
PCT/CN2022/094878
Other languages
French (fr)
Chinese (zh)
Inventor
郐洪楠
刘伟科
韩卫召
沈俊杰
Original Assignee
北京沃东天骏信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司 filed Critical 北京沃东天骏信息技术有限公司
Publication of WO2022247849A1 publication Critical patent/WO2022247849A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162 Delete operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/483 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0641 Shopping interfaces
    • G06Q30/0643 Graphical representation of items or shoppers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed

Definitions

  • The present application relates to the field of computer technology, and relates to, but is not limited to, a multimedia data processing method, apparatus, device, and storage medium.
  • Existing solutions for handling historical live videos with content security issues are operated manually.
  • Based on the live broadcast information registered by the merchant, if a live video resource containing a product that needs to be removed from the shelves is found, the entire live video resource needs to be deleted.
  • this application provides a multimedia data processing method, device, equipment, and storage medium, which can meet the actual needs of users and automatically delete multimedia data segments that need to be deleted in multimedia data.
  • an embodiment of the present application provides a multimedia data processing method, the method comprising:
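  • determining multimedia data to be processed based on a label to be deleted, where the label of the multimedia data to be processed includes the label to be deleted;
  • identifying the content of the multimedia data to be processed, and determining the segment to be deleted in the multimedia data to be processed, where the content of the segment to be deleted includes the object corresponding to the label to be deleted;
  • cutting the segment to be deleted out of the multimedia data to be processed to obtain target multimedia data.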
  • an embodiment of the present application provides a multimedia data processing device, including:
  • the determining unit is configured to determine the multimedia data to be processed based on the label to be deleted; the label of the multimedia data to be processed includes the label to be deleted;
  • the identification unit is configured to identify the content of the multimedia data to be processed, and determine the segment to be deleted in the multimedia data to be processed, and the content of the segment to be deleted includes the object corresponding to the label to be deleted;
  • the clipping unit is configured to clip the segment to be deleted from the multimedia data to be processed to obtain target multimedia data.
  • the embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and operable on the processor, where the steps in the above multimedia data processing method are implemented when the processor runs the computer program.
  • the embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is run by a processor, the steps in the above multimedia data processing method are implemented.
  • The embodiments of the present application provide a multimedia data processing method, apparatus, device, and storage medium, including: determining the multimedia data to be processed based on the tag to be deleted, where the tag of the multimedia data to be processed includes the tag to be deleted; identifying the content of the multimedia data to be processed and determining the segment to be deleted in the multimedia data to be processed, where the content of the segment to be deleted includes the object corresponding to the tag to be deleted; and cutting the segment to be deleted out of the multimedia data to be processed to obtain the target multimedia data. In this way, from multimedia data whose tag includes the tag to be deleted, only the segments whose content includes the object corresponding to the tag to be deleted are removed, and the segments that do not involve the tag to be deleted are retained. Therefore, when a product is removed from the online store, there is no need to delete the entire video that includes the product, which avoids affecting the normal display of other products.
  • FIG. 1 is a schematic diagram of an optional architecture of a multimedia data processing system provided in an embodiment of the present application
  • FIG. 2 is a schematic diagram of an optional architecture of a multimedia data processing system provided in an embodiment of the present application
  • FIG. 3 is an optional schematic flowchart of a multimedia data processing method provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an optional effect of multimedia data clipping provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an optional multimedia data processing system provided by an embodiment of the present application.
  • FIG. 6 is an optional schematic flowchart of a method for processing multimedia data at a merchant end provided by an embodiment of the present application
  • FIG. 7 is a schematic diagram of optional reception of live file information provided by the embodiment of the present application.
  • FIG. 8 is an optional flow diagram of the merchant-end live broadcast process provided by the embodiment of the present application.
  • FIG. 9 is an optional schematic flowchart of a method for processing multimedia data at a merchant end provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of optional segmentation of multimedia data provided by the embodiment of the present application.
  • FIG. 11 is a schematic diagram of an optional clipping effect of multimedia data provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an optional multimedia data processing device provided in an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an optional electronic device provided by an embodiment of the present application.
  • Embodiments of the present application may provide a multimedia data processing method and system, and a storage medium.
  • The multimedia data processing method can be realized by a multimedia data processing system, and each functional entity in the multimedia data processing system can be implemented cooperatively by the hardware resources of an electronic device (such as a terminal device or a server), such as computing resources (for example, a processor) and communication resources (for example, resources supporting communication over optical cable, cellular, and other means).
  • The multimedia data processing method of the embodiment of the present application can be applied to the multimedia data processing system shown in FIG. 1, which includes a client 10 and a server 20, where the client interacts with the user through an input device and receives the label to be deleted that the user inputs.
  • the client 10 and the server 20 are respectively located on different physical entities, and at this time, the server 20 can communicate with the client 10 through the network 30 .
  • the multimedia data processing system also includes: a multimedia data acquisition terminal 40, the multimedia data acquisition terminal 40 can collect multimedia data based on the data acquisition device, and send the collected multimedia data to the server 20 .
  • Data collection equipment includes: cameras, microphones and other equipment capable of data collection.
  • The client 10 receives the label to be deleted that the user enters through the input device and sends the label to be deleted to the server 20. The server determines the multimedia data to be processed based on the label to be deleted, where the label of the multimedia data to be processed includes the label to be deleted; identifies the content of the multimedia data to be processed and determines the segment to be deleted in the multimedia data to be processed, where the content of the segment to be deleted includes the object corresponding to the label to be deleted; and cuts the segment to be deleted out of the multimedia data to be processed to obtain the target multimedia data.
  • The client 10 can be an operation terminal that operates and manages videos stored in the server; the multimedia data collection terminal can be a live broadcast terminal running a live broadcast application with which users broadcast live; and the server is the server that provides services for the live broadcast application.
  • this embodiment proposes a multimedia data processing method that can meet the actual needs of users and automatically delete multimedia data segments that need to be deleted in the multimedia data.
  • An embodiment of the present application provides a multimedia data processing method.
  • the functions realized by the method can be realized by calling the program codes by the processor in the electronic device, and of course the program codes can be stored in the computer storage medium.
  • the electronic device at least includes a processor and a storage medium.
  • Fig. 3 is a schematic diagram of the implementation flow of a multimedia data processing method in the embodiment of the present application. As shown in Fig. 3, the method may include the following steps:
  • the server determines multimedia data to be processed based on the tag to be deleted; the tag of the multimedia data to be processed includes the tag to be deleted.
  • When the user needs to delete the multimedia data related to a certain object, the user can input the label to be deleted on the client and perform a masking operation that triggers the video deletion function on the client.
  • A masking instruction is then generated.
  • The masking instruction carries the tag to be deleted.
  • After receiving the masking instruction, the server parses the masking instruction to obtain the tag to be deleted.
  • After the server determines the tag to be deleted, it determines the multimedia data to be processed.
  • the multimedia data to be processed may be the multimedia data designated by the user through the client, or may be selected from at least one piece of multimedia data.
  • the file type of the multimedia data may be video, audio and other data that lasts for a period of time.
  • The server judges whether the label of the specified multimedia data includes the label to be deleted; if it does, the specified multimedia data is determined to be the multimedia data to be processed.
  • For example, the designated multimedia data is multimedia data A, whose tags include tag 1, tag 2, tag 3, and tag 4. When the tag to be deleted is tag 1, multimedia data A is multimedia data to be processed; when the tag to be deleted is tag 5, multimedia data A is not multimedia data to be processed.
  • the multimedia data whose tag includes the tag to be deleted in the at least two multimedia data is determined as the multimedia data to be processed.
  • For example, the at least two multimedia data include multimedia data A, multimedia data B, and multimedia data C, and the label to be deleted is label 2. The label of multimedia data A includes label 2, so multimedia data A is multimedia data to be processed; the label of multimedia data B does not include label 2, so multimedia data B is not multimedia data to be processed; the label of multimedia data C includes label 2, so multimedia data C is multimedia data to be processed. The multimedia data to be processed therefore includes multimedia data A and multimedia data C.
  • A label list may be established in the server. The label list records the association between the identifier of each piece of multimedia data and its labels. Based on the label list, the server can determine the multimedia data to be processed, that is, the multimedia data whose labels include the label to be deleted. The tags in the tag list may be input by the user.
  • the multimedia data to be processed is the live video of the live broadcast.
  • Before the live broadcast, the live broadcast terminal can receive the live file information input by the user, and the live file information can include the label of this live broadcast. The live file information is sent to the server, and after the live video is generated from the user's live broadcast, the server establishes an association between the live file information and the video ID of the generated live video.
  • the tags in the multimedia data to be processed represent the commodities involved in the content of the multimedia data to be processed.
  • The multimedia data to be processed can be associated with multiple commodity links. The content pointed to by each commodity link is a commodity purchase page that includes product information; that is, the product information of each product can be obtained based on its product link.
  • The product information obtained through the product link is used as the label of the multimedia data to be processed.
  • the product links associated with the multimedia data to be processed include: product links of product 1, product links of product 2 and product links of product 3, and the tags of the multimedia data to be processed include: product 1, product 2 and product 3 .
  • When the multimedia data to be processed is a video, it can be a video downloaded from the network side, such as a video of a TV series, or a live video formed from a video stream uploaded by a user, such as User A's live video.
  • the server identifies the content of the multimedia data to be processed, and determines a segment to be deleted in the multimedia data to be processed, where the content of the segment to be deleted includes an object corresponding to the tag to be deleted.
  • After the server determines the multimedia data to be processed, it retrieves the multimedia data to be processed and identifies, in the content of the multimedia data to be processed, the object corresponding to the label to be deleted, that is, the object to be deleted.
  • the tag to be deleted may represent any object that can appear in the multimedia data, such as a person, a commodity, or a building.
  • the tag to be deleted may be the name of the product, the stock keeping unit (Stock Keeping Unit, SKU) and other product information.
  • The object to be deleted takes different forms in multimedia data of different file types. When the multimedia data is a video, the object to be deleted is image content in the video; when the multimedia data is audio, the object to be deleted is audio content in the audio.
  • the segment to be deleted is a segment whose content includes the object to be deleted in the multimedia data.
  • For example, the object to be deleted is character A and the multimedia data to be processed is video A, whose duration is T1. If the video content in the time period [t1, t2] includes character A, the segment to be deleted is the video of the time period [t1, t2], where t1 is greater than or equal to 0 and t2 is less than or equal to T1.
  • As another example, the object to be deleted is character A and the multimedia data to be processed is audio B, whose duration is T2. If the audio content in the time period [t3, t4] includes character A, the segment to be deleted is the audio segment of the time period [t3, t4], where t3 is greater than or equal to 0 and t4 is less than or equal to T2.
  • one or more segments to be deleted may be included in one piece of multimedia data to be processed.
  • the multimedia data to be processed may be stored in the server, or may be stored in a storage terminal corresponding to the server. At this time, the server retrieves the multimedia data to be processed from the storage terminal.
  • the server cuts the segment to be deleted from the multimedia data to be processed to which the segment to be deleted belongs, to obtain target multimedia data.
  • After the server determines the segment to be deleted, it cuts the segment to be deleted out of the video to be processed.
  • For example, the multimedia data to be processed is video A, the duration of video A is T1, and the segment to be deleted is the video of the time period [t1, t2].
  • When t1 is 0 and t2 is less than T1, the video of the time period [t1, t2] is cut from video A to obtain the target video A, whose video content is the content of video A during [t2, T1].
  • When t1 is greater than 0 and t2 is equal to T1, the video of the time period [t1, t2] is cut from video A to obtain the target video A, whose video content is the content of video A during [0, t1].
  • When t1 is greater than 0 and t2 is less than T1, the video of the time period [t1, t2] is cut from video A to obtain the target video A, whose video content is the content of video A during the two periods [0, t1] and [t2, T1].
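  • The three cases above can be illustrated with a short sketch that computes which time periods of a video remain after one or more segments are cut out. The function name `retained_intervals` and the numeric values are hypothetical and chosen only for illustration; the application does not specify an implementation.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def retained_intervals(duration: float, cuts: List[Interval]) -> List[Interval]:
    """Return the parts of [0, duration] that remain after removing every cut."""
    kept: List[Interval] = []
    cursor = 0.0  # start of the next part that may be retained
    for start, end in sorted(cuts):
        start = max(start, 0.0)
        end = min(end, duration)
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept


# Hypothetical values with T1 = 100, one call per case in the text.
print(retained_intervals(100.0, [(0.0, 30.0)]))    # t1 = 0, t2 < T1  -> [(30.0, 100.0)]
print(retained_intervals(100.0, [(40.0, 100.0)]))  # t1 > 0, t2 = T1  -> [(0.0, 40.0)]
print(retained_intervals(100.0, [(40.0, 60.0)]))   # t1 > 0, t2 < T1  -> [(0.0, 40.0), (60.0, 100.0)]
```

  • Because the cuts are passed as a list, the same sketch also covers multimedia data that contains more than one segment to be deleted.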
  • For example, a live video is a live video of user A selling commodities, as shown in Figure 4(a). The commodity sold in the video segment 401, within the time period from T0 to T1, is commodity A; the commodity sold in the video segment 402, within the time period from T1 to T2, is commodity B; the commodity sold in the video segment 403, within the time period from T2 to T3, is commodity C; and the commodity sold in the video segment 404 is commodity D. When the commodity corresponding to the tag to be deleted is commodity B, the video segment 402 is cut from this video as the segment to be deleted, and the resulting target video is shown in Figure 4(b).
  • As another example, audio 1 is the interview audio of user A, user B, user C, and user D. If user B does not meet the interview conditions and user B's interview content needs to be deleted from the audio, the segment corresponding to user B's interview is removed from this audio, and only the interviews with user A, user C, and user D remain.
  • In the multimedia data processing method provided by the embodiment of the present application, the multimedia data to be processed is determined based on the label to be deleted, where the label of the multimedia data to be processed includes the label to be deleted; the content of the multimedia data to be processed is identified to determine the segment to be deleted, where the content of the segment to be deleted includes the object corresponding to the label to be deleted; and the segment to be deleted is cut out of the multimedia data to be processed to obtain the target multimedia data. In this way, from multimedia data whose label includes the label to be deleted, only the segments whose content includes the object corresponding to the label to be deleted are removed, and the segments that do not involve the label to be deleted are retained. As a result, when a product is taken off the shelves, there is no need to delete the entire video that includes the product, which avoids affecting the normal display of other products.
  • the implementation of S301 determining the multimedia data to be processed based on the tag to be deleted includes:
  • the masking instruction sent by the client may be a one-key masking instruction, and at this time, the server retrieves all multimedia data to be processed from at least two original multimedia data.
  • When retrieving the multimedia data to be processed from at least two original multimedia data, the server acquires the tags of each original multimedia data, where one original multimedia data includes one or more tags.
  • the tag of the original multimedia data may be input by the user, or may be obtained by the server from the product link of the original multimedia data.
  • When the label of the original multimedia data is input by the user, the label is input at the multimedia data collection terminal, so that the label and the original multimedia data are sent to the server together.
  • the data collection terminal is the user's live broadcast terminal.
  • the user inputs the live broadcast file information of this live broadcast, wherein the live broadcast file information may include: live broadcast title, live broadcast time, home page picture, shopping cart product list, etc.
  • After the server obtains the tags of each original multimedia data, it performs the following processing for each original multimedia data: it matches the tags of the original multimedia data against the tag to be deleted, and determines the original multimedia data whose tags include the tag to be deleted as multimedia data to be processed.
  • For example, the at least two original multimedia data include multimedia data A, multimedia data B, and multimedia data C, and the label to be deleted is label 2. The labels of multimedia data A include label 1, label 2, and label 3; the labels of multimedia data B include label 4 and label 5; and the labels of multimedia data C include label 2, label 5, and label 6. The multimedia data to be processed therefore includes multimedia data A and multimedia data C.
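  • The label matching in this example can be sketched as a simple membership check. The function name `select_to_process` and the dictionary layout are assumptions made for illustration, not part of the application.

```python
from typing import Dict, List, Set


def select_to_process(tags_by_id: Dict[str, Set[str]], tag_to_delete: str) -> List[str]:
    """Return the IDs of the original multimedia data whose tags include the tag to be deleted."""
    return [media_id for media_id, tags in tags_by_id.items() if tag_to_delete in tags]


# The example from the text: label 2 is the label to be deleted.
originals = {
    "A": {"label 1", "label 2", "label 3"},
    "B": {"label 4", "label 5"},
    "C": {"label 2", "label 5", "label 6"},
}
print(select_to_process(originals, "label 2"))  # ['A', 'C']
```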
  • S301 determines the multimedia data to be processed based on the tag to be deleted, the following steps are also implemented:
  • the commodity information targeted by each commodity link in the at least one commodity link is determined as a tag of the original video data.
  • the label of the original multimedia data can be determined from the product link under the account to which the original multimedia data belongs.
  • setting conditions may include at least one of the following:
  • the interval between the generation time of the original multimedia data and the current time is less than the set time
  • the original multimedia data is the latest multimedia data under the account to which the original multimedia data belongs.
  • the set time may be 24 hours.
  • the server determines the account to which the multimedia data to be processed belongs.
  • the account may be the account used when uploading the multimedia data to be processed.
  • The server determines the product link under the account, and uses the product information of the product targeted by the product link as the tag of the multimedia data to be processed.
  • When the file type of the multimedia data to be processed includes video, the implementation of S302, identifying the content of the multimedia data to be processed and determining the segment to be deleted in the multimedia data to be processed, includes:
  • the reference image may be the main product image of the object corresponding to the tag to be deleted.
  • the product main image is the main image displayed to the user on the product detail page, which can directly display the product.
  • After the server acquires the reference image, it matches the video frames in the video frame sequence with the reference image. The matching result includes first video frames and second video frames: a first video frame is a video frame whose content includes the object corresponding to the tag to be deleted, and a second video frame is a video frame whose content does not include the object corresponding to the tag to be deleted.
  • The image similarity between a video frame and the reference image can be calculated. A video frame whose image similarity with the reference image is greater than the similarity threshold is determined to be a first video frame, and a video frame whose image similarity with the reference image is less than or equal to the similarity threshold is determined to be a second video frame.
  • the image similarity can be represented by a Hamming distance between images, and in the embodiment of the present application, no limitation is imposed on the representation of the image similarity.
  • A plurality of consecutive first video frames constitute the segment to be deleted, where the earliest video frame in the segment to be deleted is the clipping start video frame, and the last video frame in the segment to be deleted is the clipping end video frame.
  • the video frame preceding the clipping start video frame is the second video frame
  • the video frame following the clipping end video frame is the second video frame.
  • The server can determine the image similarity between each video frame and the reference image. When the image similarity between a video frame and the reference image is greater than the similarity threshold, the video frame is a first video frame; when the image similarity between a video frame and the reference image is less than or equal to the similarity threshold, the video frame is a second video frame.
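  • A minimal sketch of this classification, assuming each video frame has already been given a similarity score in which a higher value means a closer match (if a Hamming distance is used directly, the comparison would be reversed). The function name `segments_to_delete` and the sample scores are hypothetical. Consecutive first video frames are grouped into (clipping start frame, clipping end frame) index pairs.

```python
from typing import List, Optional, Tuple


def segments_to_delete(similarities: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Group consecutive frames whose similarity exceeds the threshold into segments."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index of the clipping start frame of the open segment
    for index, similarity in enumerate(similarities):
        is_first_frame = similarity > threshold  # less than or equal means second video frame
        if is_first_frame and start is None:
            start = index
        elif not is_first_frame and start is not None:
            segments.append((start, index - 1))
            start = None
    if start is not None:  # the segment runs to the end of the video
        segments.append((start, len(similarities) - 1))
    return segments


scores = [0.1, 0.2, 0.9, 0.95, 0.8, 0.3, 0.85, 0.9]
print(segments_to_delete(scores, threshold=0.8))  # [(2, 3), (6, 7)]
```

  • Frame 4 scores exactly 0.8 and is therefore treated as a second video frame, matching the "less than or equal to" rule in the text.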
  • The server can also estimate how long a product is shown during the live broadcast and probe the video at intervals to find the video segment of the product, thereby reducing the workload of image recognition. For example: if the video frame at T1 is detected as a first video frame, the server then detects whether the video frame at T1+t1 is a first video frame. If the video frame at T1+t1 is not a first video frame, the server goes back and detects whether the video frame at T1+t1-t2 (t2 is less than t1) is a first video frame. If the video frame at T1+t1 is a first video frame, the server detects whether the next video frame after the video frame at T1+t1 is a first video frame: if it is not, the video frame at T1+t1 is determined to be the last video frame of the segment to be deleted; if it is, the server continues to detect whether the video frame at T1+2*t1 is a first video frame.
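  • The interval-based search can be sketched as follows. This is a simplified variant, not the exact procedure in the text: it strides forward by `stride` frames (playing the role of t1) while frames still match, steps back by `backoff` frames (playing the role of t2) after a miss, and then scans frame by frame to the exact boundary. It assumes the segment is one contiguous run of first video frames; `is_first`, `stride`, and `backoff` are hypothetical names.

```python
from typing import Callable, List


def find_segment_end(is_first: Callable[[int], bool], start: int, n_frames: int,
                     stride: int, backoff: int) -> int:
    """Return the index of the last first video frame in the run that contains `start`."""
    last_hit = start
    probe = start + stride
    # Coarse phase: jump ahead while the probed frame still shows the object.
    while probe < n_frames and is_first(probe):
        last_hit = probe
        probe += stride
    # Back-off phase: the probe missed (or ran past the end), so step back toward last_hit.
    candidate = min(probe, n_frames) - backoff
    while candidate > last_hit:
        if is_first(candidate):
            last_hit = candidate
            break
        candidate -= backoff
    # Fine phase: advance one frame at a time to the exact boundary.
    while last_hit + 1 < n_frames and is_first(last_hit + 1):
        last_hit += 1
    return last_hit


# Hypothetical video: frames 10..34 show the product.
frames: List[bool] = [False] * 10 + [True] * 25 + [False] * 15
print(find_segment_end(lambda i: frames[i], start=10, n_frames=len(frames), stride=8, backoff=3))  # 34
```

  • In this example only a handful of frames are inspected instead of all 25 frames of the segment, which is the saving the text refers to.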
  • The implementation of S3023, matching the reference image with the video frames in the sequence of video frames to obtain the matching result, includes: for each video frame in the sequence of video frames, determining the reference area of the content object included in the video frame, and cutting out the reference area from the video frame to obtain the image to be matched; determining the content similarity between the image to be matched and the reference image; determining the video frame to which an image to be matched whose similarity is greater than the set similarity threshold belongs as a first video frame, where the content of the first video frame includes the object corresponding to the label to be deleted; and determining the video frame to which an image to be matched whose similarity is less than or equal to the similarity threshold belongs as a second video frame, where the content of the second video frame does not include the object corresponding to the tag to be deleted.
  • a target detection model is set in the server, and the server performs the following processing on the video frame that needs to be judged to be the first video frame:
  • The video frame is used as the input of the target detection model to obtain the position, output by the target detection model, of the content object included in the video frame. Based on that position, the area where the content object is located, that is, the reference area, is cut out from the video frame to obtain the image to be matched of the video frame. The server then calculates the image similarity between the image to be matched and the reference image to obtain the image similarity between the video frame and the reference image.
  • The algorithm adopted by the target detection model in the server may include target detection algorithms such as the Faster region-based convolutional neural network (Faster R-CNN) and the Single Shot MultiBox Detector (SSD). In the embodiment of the present application, the target detection algorithm adopted by the target detection model is not limited in any way.
  • the server can also set an image segmentation model, and determine the reference area from the video frame based on the image segmentation model.
  • the image segmentation algorithm adopted by the image segmentation model may include image segmentation algorithms such as region growing, mean value iterative segmentation, and maximum entropy segmentation.
  • the image segmentation algorithm adopted by the image segmentation model is not limited in any way.
  • When calculating the image similarity between the image to be matched and the reference image, the image to be matched and the reference image can each be taken as the target image and processed as follows: the target image is reduced to a set size, the reduced target image is converted to grayscale, and the hash value of the grayscale target image is calculated. The similarity between the hash value of the image to be matched and the hash value of the reference image is then calculated to obtain the image similarity between the image to be matched and the reference image.
  • the set size is 9*8.
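  • The hashing steps can be sketched as below. The application does not name the hash; a 9*8 set size is consistent with a difference hash, in which each row of 9 pixels yields 8 comparison bits for 64 bits in total, so that is what this sketch assumes. The nearest-neighbour downscaling, the grayscale weights, and all function names are likewise assumptions for illustration. Images are plain nested lists of (R, G, B) tuples so that the example needs no external library.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of (R, G, B) pixels


def shrink(image: Image, width: int = 9, height: int = 8) -> Image:
    """Reduce the image to the set size by nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [[image[y * src_h // height][x * src_w // width] for x in range(width)]
            for y in range(height)]


def to_gray(pixel: Pixel) -> int:
    """Integer approximation of the common luma weights 0.299, 0.587, 0.114."""
    r, g, b = pixel
    return (r * 299 + g * 587 + b * 114) // 1000


def difference_hash(image: Image) -> int:
    """64-bit hash: one bit per horizontally adjacent pixel pair in the 9x8 grayscale image."""
    bits = 0
    for row in shrink(image):
        gray = [to_gray(p) for p in row]
        for left, right in zip(gray, gray[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def image_similarity(a: Image, b: Image) -> float:
    """1.0 for identical hashes, 0.0 when all 64 bits differ."""
    return 1.0 - hamming_distance(difference_hash(a), difference_hash(b)) / 64.0


# Hypothetical 18x16 test images: a left-to-right gradient and its mirror image.
gradient = [[(x * 10, x * 10, x * 10) for x in range(18)] for _ in range(16)]
mirrored = [list(reversed(row)) for row in gradient]
print(image_similarity(gradient, gradient))  # 1.0
print(image_similarity(gradient, mirrored))  # 0.0
```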
  • the implementation of S3024 determining the clipping start point and the clipping end point corresponding to the clipping start point according to the matching result includes:
  • a video frame that is itself a first video frame and whose adjacent previous frame is a second video frame is determined as the clipping start frame; the content of the first video frame includes the object corresponding to the label to be deleted; the content of the second video frame does not include the object corresponding to the label to be deleted;
  • S303 cuts the segment to be deleted from the multimedia data to be processed to which the deleted segment belongs, and the implementation of obtaining the target multimedia data includes:
  • the segment before the segment to be deleted and the segment after the segment to be deleted are merged to obtain the target multimedia data.
  • the second video frames in the sequence of video frames are spliced together based on continuity, that is, consecutive second video frames are spliced into a segment to be retained, wherein the last video frame of the segment to be retained before the segment to be deleted is the frame immediately preceding the clipping start frame of the segment to be deleted, and the first video frame of the segment to be retained after the segment to be deleted is the frame immediately following the clipping end frame of the segment to be deleted.
  • the video frame preceding the clipping start frame of the segment to be deleted is spliced with the video frame following the clipping end frame, so that the segment to be retained before the segment to be deleted is merged with the segment to be retained after it.
  • the video frame before the clipping start frame and the next video frame after the clipping end frame of all the clips to be deleted in the multimedia data to be processed are spliced to obtain the target video.
  • the segment to be deleted may be a start position or an end position of the multimedia data to be processed, or may be located in a middle position of the multimedia data to be processed.
  • splicing may be performed only for segments to be deleted that are located in the middle of the multimedia data to be processed, by joining the video frame preceding the clipping start frame to the video frame following the clipping end frame.
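  • The clipping and splicing described above can be sketched on a decoded frame sequence. This is a simplified illustration under stated assumptions: `clip_segments` and its arguments are hypothetical names, segments are given as inclusive frame-index pairs, and the re-encoding that a real compressed video would need is not shown.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def clip_segments(frames: Sequence[Frame], to_delete: List[Tuple[int, int]]) -> List[Frame]:
    """Drop every inclusive (start, end) index range and splice what remains in order."""
    kept: List[Frame] = []
    cursor = 0  # index of the first frame not yet examined
    for start, end in sorted(to_delete):
        # Retained segment that ends at the frame before the clipping start frame.
        kept.extend(frames[cursor:start])
        # Resume at the frame after the clipping end frame.
        cursor = max(cursor, end + 1)
    kept.extend(frames[cursor:])  # retained segment after the last deletion
    return kept
```

  • A segment at the start or end of the sequence needs no joining: the slice before or after it is simply empty, so only middle segments produce an actual splice point.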
  • the server replaces the stored multimedia data to be processed with the target multimedia data obtained after processing the multimedia data to be processed.
  • the data volume of the target multimedia data can be checked, and when the data volume of the target multimedia data is greater than the set data volume, the target multimedia data can be divided into multiple data blocks, and the multiple data blocks are uploaded to the storage end. The storage end then splices the multiple data blocks to obtain the target multimedia data, and replaces the original multimedia data to be processed with the target multimedia data.
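  • The size check and block upload can be sketched as below. The names `prepare_upload`, `split_into_blocks`, `reassemble`, `max_size`, and `block_size` are assumptions made for illustration; the actual transfer to the storage end is not specified in the text and is omitted.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def prepare_upload(data: bytes, max_size: int, block_size: int) -> List[bytes]:
    """Return the payloads to upload: the whole file if small enough, else blocks."""
    if len(data) <= max_size:
        return [data]
    return split_into_blocks(data, block_size)


def reassemble(blocks: List[bytes]) -> bytes:
    """Storage-end side: splice the blocks back together in upload order."""
    return b"".join(blocks)
```

  • Uploading in order lets the storage end rebuild the file by plain concatenation; a production system would likely also carry a block index and a checksum, which are not shown here.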
  • the multimedia data processing system includes: a merchant terminal 501 , a server terminal 502 and an operator terminal 503 .
  • the merchant terminal 501 is used to establish live file information, generate a video data stream of a live video file, and send the live file information and video data stream to the server 502 .
  • the merchant terminal 501 performs the following processing:
  • the merchant terminal receives the live file information filled in by the merchant, and sends the live file information to the server.
  • the live archive information is stored in the live archive database of the server 502 .
  • the merchant terminal 501 provides a live broadcast management background 701.
  • the merchant can fill in the live file information 702 through the live broadcast management background 701, thereby entering the live file information 702.
  • the live file information 702 may include: Title 7021 , live broadcast time 7022 , home page picture 7023 , shopping cart product list 7024 and other live broadcast detailed information, wherein the shopping cart product list 7024 includes information about products to be sold in the live broadcast.
  • the shopping cart product list 7024 can be used as a basis for retrieving videos to be processed.
  • the merchant end broadcasts live.
  • the video stream of the live broadcast process is sent to the server 502, and the server generates a live video file based on the received live video stream, and stores it in the live video library of the server 502.
  • the merchant terminal collects image data
  • the merchant end can collect image data through the image acquisition device.
  • the merchant terminal performs image processing on the collected image data
  • the image processing may include: beautification, filter and other processing.
  • the merchant end compresses the image data that has undergone image processing.
  • the merchant end encodes and compresses the image data after image processing.
  • the merchant end transmits the compressed image data to the server end in the form of a video stream.
  • the merchant side uploads the compressed image data to the server side through the Real Time Messaging Protocol (RTMP).
  • an association relationship is established between the live archive information and the video storage information of the live video information in the live archive database (not shown) of the server 502 .
  • the video storage information includes: a video name, a video storage address, and the like.
  • the operation terminal 503 performs the following processing: the operation terminal receives the SKU of the commodity that needs to be blocked.
  • the user enters the SKU of the product in the operating terminal 503 and clicks the one-key shielding function.
  • the operating terminal 503 receives the SKU of the product input by the user and enables the automatic video blocking function.
  • the activation of the automatic video shielding function is triggered, and a shielding instruction is sent to the server 502, and the sent shielding instruction includes the received SKU of the product to be shielded.
  • the operator 503 sends the SKU of the product to be masked to the server 502 .
  • after the server 502 receives the masking instruction, as shown in FIG. 9, it performs the following processing:
  • the server retrieves the video to be processed based on the SKU of the commodity that needs to be blocked.
  • the server 502 searches the live broadcast archives based on the SKU of the commodity that needs to be masked, and retrieves the videos to be processed that need to be masked.
  • the video list to be processed may be generated based on video information of the videos to be processed.
  • the server 502 uses the SKU to be blocked input by the operator 503 to perform an exact match in the shopping cart product list of the live broadcast archive. Those that do not contain SKUs are videos that do not need to be processed, and those that contain SKUs are videos that need to be processed.
  • the server extracts the to-be-processed video containing the SKU to the to-be-processed video library, and uses the video in the to-be-processed video library as the input of video clipping to identify and detect the similarity with the main image of the SKU that needs to be masked, that is, the reference image.
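  • The exact-match retrieval over the shopping cart product lists can be sketched as follows. The `LiveArchive` record and its field names are hypothetical stand-ins for the live archive database entries; only the exact-match rule comes from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LiveArchive:
    """Hypothetical live archive record linking file information to a stored video."""
    title: str
    video_address: str
    cart_skus: List[str] = field(default_factory=list)  # shopping cart product list


def videos_to_process(archives: List[LiveArchive], sku: str) -> List[LiveArchive]:
    """Keep only archives whose shopping cart product list contains the SKU exactly."""
    return [archive for archive in archives if sku in archive.cart_skus]
```

  • Membership in a list of strings compares whole elements, so a SKU such as `"1001"` does not match an entry `"10012"`; this mirrors the exact match described above.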
  • the server determines, based on the SKUs of the commodities that need to be shielded, the main image of the SKUs of the commodities that need to be shielded.
  • when the operator 503 inputs the SKU to be masked to the server 502, the server 502 queries the main website for the main picture of the product according to the SKU, and uses it as the model picture for image similarity recognition on the video.
  • the server performs image similarity recognition on each frame of the video to be processed based on the main image of the SKU of the product that needs to be blocked, and identifies the video segment that needs to block the SKU.
  • the server clips the identified video segments that need to be masked out of the video to be processed; a video segment that needs to be masked is a segment to be deleted.
  • the server sequentially edits the data frames of the identified video segments that need to be masked in each video to be processed in the video list to be processed. For each video to be processed, all the data frames in the video segment to be masked are clipped, and after the clipping is completed, the video is merged to form a complete video, that is, the target video.
  • the server saves the target video.
  • the server uses an image similarity recognition algorithm to identify the similarity between the SKU main image of the product to be masked and each frame of the video to be processed.
  • the image similarity recognition algorithm in the embodiment of the present application can be completed by the target detection model faster-rcnn and the dHash algorithm.
  • the identification of product pictures includes the following steps:
  • the scaled image is input to the feature extraction layer (Conv layers), which includes a convolution (conv) layer, an activation (relu) layer, and a pooling (pooling) layer.
  • the feature extraction layer is used to extract feature maps of the input image.
  • S5233A Determine candidate regions in the feature map of the input image through the region candidate network of faster-rcnn.
  • the region candidate network layer uses softmax to judge whether each anchor in the feature map belongs to the foreground or the background, and then uses bounding box regression to refine the anchors and obtain accurate candidate regions (proposals).
  • the ROI pooling layer uses proposals to extract candidate region features (proposal feature maps) from feature maps.
  • the classification layer may include a fully connected and softmax network, through which the extracted candidate region features are used to classify each candidate region, and the candidate regions classified as SKUs that need to be masked are identified.
  • the final precise position of the detection frame of type SKU is obtained through the bounding box regression algorithm again, that is, the image coordinates of each SKU.
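  • The text does not spell out the bounding box regression. As a reference point only, the parameterization commonly used with Faster R-CNN relates an anchor with center $(x_a, y_a)$, width $w_a$, and height $h_a$ to a predicted box with center $(x, y)$, width $w$, and height $h$ through four regression offsets $(t_x, t_y, t_w, t_h)$; whether the embodiment uses exactly this form is an assumption.

```latex
\begin{aligned}
t_x &= \frac{x - x_a}{w_a}, & t_y &= \frac{y - y_a}{h_a}, \\
t_w &= \log\frac{w}{w_a},   & t_h &= \log\frac{h}{h_a}, \\
x &= x_a + w_a\, t_x,       & y &= y_a + h_a\, t_y, \\
w &= w_a\, e^{t_w},         & h &= h_a\, e^{t_h}.
\end{aligned}
```

  • The first two rows define the offsets the network learns to predict; the last two rows invert them to recover the refined box, from which the image coordinates of each detected SKU follow.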
  • the image coordinates obtained by faster-rcnn are cut to the video frame to obtain the image to be matched, and the dHash similarity is compared between each image to be matched and the SKU main image.
  • the similarity comparison includes the following steps:
  • the image is reduced to 9*8, that is, 72 pixels.
  • in each row, the difference between every two adjacent pixels is calculated, giving 8 difference values per row; over 8 rows, 64 difference values are obtained, which form the hash value.
  • the Hamming distance between the image to be matched and the main image of the SKU is calculated by the hash value of the image to be matched and the hash value of the main image of the SKU.
  • the calculated Hamming distance is used as the similarity between the image to be matched and the main image of the SKU, where the smaller the Hamming distance, the more similar the two pictures are, and the larger the Hamming distance, the less similar the two pictures are.
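  • The dHash comparison steps above can be sketched in plain Python. The sketch assumes the image has already been reduced to 9*8 and converted to grayscale, so the input is a grid of 8 rows with 9 pixel values each; the function names are hypothetical, and encoding a difference as 1 when the left pixel is brighter than the right one is an assumed convention that the text does not fix.

```python
from typing import List, Sequence


def dhash_bits(gray: Sequence[Sequence[int]]) -> List[int]:
    """Difference hash of an 8-row by 9-column grayscale grid, as 64 bits."""
    if len(gray) != 8 or any(len(row) != 9 for row in gray):
        raise ValueError("expected 8 rows of 9 pixels")
    bits: List[int] = []
    for row in gray:
        # 9 pixels per row give 8 adjacent-pixel comparisons.
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))
```

  • A distance of 0 means the two hashes are identical and 64 means every bit differs; the cut-off below which two pictures count as the same product is a tuning choice that the text leaves open.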
  • the data frames are recorded according to time intervals, for example [t1, t2], [t3, t4] and so on.
  • Time t1 represents the first frame in which the product to be blocked appears
  • t2 represents the last frame in which the product appears in this continuous time period.
  • [t3,t4] is the start and end time when the commodity appears in the next time interval.
  • the identified segments 1001 to be deleted, whose content is the input SKU, include: segment 1, segment 2, segment 3, and segment 4, where t1 and t2 are respectively the times when the first frame and the last frame of segment 1 appear in the video to be processed, t3 and t4 are respectively the times when the first frame and the last frame of segment 2 appear in the video to be processed, t5 and t6 are respectively the times when the first frame and the last frame of segment 3 appear in the video to be processed, and t7 and t8 are respectively the times when the first frame and the last frame of segment 4 appear in the video to be processed.
  • the video to be processed is stored on the disk after data compression, and is stored in the form of a binary file.
  • the duration of the video to be processed is t4, and the segments to be deleted that contain the product to be blocked include: segment 1 with a time range of [t1, t2] and segment 2 with a time range of [t3, t4].
  • the binary code of segment 1 is: 10111, and the binary code of segment 2 is: 11011.
  • two video files V1 and V2 are obtained based on the video frames in the time ranges [t0, t1) and (t2, t3) respectively.
  • the duration of video file V1 is tv1, and the encoding of video file V1 is: 110111.
  • the duration of video file V2 is tv2, and the encoding of video file V2 is: 011.
  • the video files V1 and V2 are merged to obtain a new video file V3, and the encoding of the video file V3 is: 110111011.
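  • The merge in this example amounts to concatenating the encodings of the retained files in time order. The toy sketch below only mirrors the bit strings used in the illustration; `merge_encodings` is a hypothetical helper, and real container formats generally cannot be joined by raw concatenation without remuxing.

```python
def merge_encodings(*parts: str) -> str:
    """Concatenate the encodings of the retained video files in time order."""
    return "".join(parts)


v1 = "110111"  # encoding of video file V1, frames in [t0, t1)
v2 = "011"     # encoding of video file V2, frames in (t2, t3)
v3 = merge_encodings(v1, v2)
```

  • Here `v3` equals `"110111011"`, the encoding given for video file V3.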
  • the compressed data can be cut into data packets, which are uploaded sequentially in segments, and after the segmented uploaded data packets are spliced, the complete video file is stored and uploaded to the storage path of the original video resource.
  • the product list of the live broadcast is identified, and the existing data information (the SKU information filled in by the merchant) is used to automatically retrieve the videos in the video library that need to be blocked; the videos that need to be processed are then cut, merged, and uploaded automatically, without manual searching, and this automation ensures the safety of live content. This is an automated video shielding solution that solves the problem that a huge volume of historical live video resources cannot be automatically blocked by product with one click.
  • FIG. 12 is a schematic structural diagram of a multimedia data processing device according to an embodiment of the present application. As shown in FIG. 12 , the multimedia data processing device 1200 includes:
  • the determining unit 1201 is configured to determine the multimedia data to be processed based on the label to be deleted; the label of the multimedia data to be processed includes the label to be deleted;
  • the identification unit 1202 is configured to identify the content of the multimedia data to be processed, and determine the segment to be deleted in the multimedia data to be processed, the content of the segment to be deleted includes the object corresponding to the label to be deleted;
  • the clipping unit 1203 is configured to clip the segment to be deleted from the multimedia data to be processed to obtain target multimedia data.
  • the determining unit 1201 is further configured to:
  • the device 1200 also includes: a tag acquisition unit configured to:
  • the commodity information targeted by each commodity link in the at least one commodity link is determined as a tag of the original video data.
  • the identification unit 1202 is further configured to:
  • when the file type of the multimedia data to be processed includes video, extracting the video frame sequence of the multimedia data to be processed;
  • according to the matching result, determining the clipping start frame and the clipping end frame corresponding to the clipping start frame; the video frames between the clipping start frame and the clipping end frame constitute the segment to be deleted.
  • the identifying unit 1202 is further configured to:
  • Determining the video frame to which the matching image whose content similarity is greater than the set similarity threshold belongs is the first video frame; the content of the first video frame includes the object corresponding to the label to be deleted;
  • the video frame to which the matching image whose similarity is less than or equal to the similarity threshold belongs is determined as a second video frame, and the content of the second video frame does not include the object corresponding to the tag to be deleted.
  • the identifying unit 1202 is further configured to:
  • a video frame that itself belongs to the first video frames, and whose adjacent previous frame belongs to the second video frames, is determined as the clipping start frame; the content of a first video frame includes the object corresponding to the label to be deleted; the content of a second video frame does not include the object corresponding to the label to be deleted;
  • the cropping unit 1203 is further configured to:
  • the segment before the segment to be deleted and the segment after the segment to be deleted are merged to obtain the target multimedia data.
  • the device 1200 further includes: a replacement unit configured to:
  • each logic unit included in the multimedia data processing device can be realized by a processor in an electronic device; of course, it can also be realized by a specific logic circuit; in the process of implementation, the processor can be a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), etc.
  • if the above multimedia data processing method is implemented in the form of software function modules and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: various media that can store program codes such as U disk, mobile hard disk, read-only memory (Read Only Memory, ROM), magnetic disk or optical disk.
  • An embodiment of the present application also provides an electronic device, including a memory, a processor, and a computer program stored on the memory and operable on the processor, and the processor implements the steps of the above-mentioned multimedia data processing method when running the computer program.
  • the embodiments of the present application provide a storage medium, that is, a computer-readable storage medium, on which a computer program is stored, and when the computer program is run by a processor, the multimedia data processing method provided in the foregoing embodiments is implemented.
  • FIG. 13 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of the present application.
  • the electronic device 1300 includes: a processor 1301, at least one communication bus 1302, at least one external communication interface 1304, and a memory 1305.
  • the communication bus 1302 is configured to realize connection and communication between these components.
  • the electronic device 1300 further includes: a user interface 1303, wherein the user interface 1303 may include a display screen, and the external communication interface 1304 may include a standard wired interface and a wireless interface.
  • the memory 1305 is configured to store instructions and applications executable by the processor 1301, and can also cache data to be processed or already processed by the processor 1301 and the various modules in the electronic device (for example, image data, audio data, voice communication data, and video communication data); it can be realized by flash memory (FLASH) or random access memory (RAM).
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed to multiple network units; Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, or each unit can be used as a single unit, or two or more units can be integrated into one unit; the above-mentioned integrated unit can be realized in the form of hardware or in the form of hardware plus software functional units.
  • if the above-mentioned integrated units of the present application are realized in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media capable of storing program codes such as removable storage devices, ROMs, magnetic disks or optical disks.

Abstract

Disclosed in the present application is a multimedia data processing method. The method comprises: on the basis of a label to be deleted, determining multimedia data to be processed, wherein a label of said multimedia data comprises the label to be deleted; identifying the content of said multimedia data, and determining a fragment to be deleted from said multimedia data, wherein the content of said fragment comprises an object corresponding to the label to be deleted; and clipping said fragment from said multimedia data, so as to obtain target multimedia data. Further disclosed in the present application are a multimedia data processing apparatus, a device, and a storage medium.

Description

Multimedia data processing method and apparatus, device, and storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202110569790.5, filed on May 25, 2021, the entire content of which is hereby incorporated into this application by reference.
Technical Field
The present application relates to the field of computer technology, and relates to, but is not limited to, a multimedia data processing method and apparatus, a device, and a storage medium.
Background
Today, with the rapid development of live-streaming services, building a content security platform and reducing negative impact are basic conditions for the development of live-streaming services.
At present, historical live videos with content security issues are handled manually: the live broadcast information registered by merchants is searched for live video resources in which a product that needs to be taken off the shelf appears, and the entire live video resource must then be deleted.
However, the above solution has the following technical problem: consumers occasionally look back through the historical video resources of a live broadcast to check the details of an order at that time, for example whether the number of gifts is consistent with what was stated during the live broadcast. Directly taking down the entire live broadcast resource also deletes the video content that consumers want to review, resulting in unnecessary deletion of video content.
Summary
To solve at least one problem in the related art, the present application provides a multimedia data processing method and apparatus, a device, and a storage medium, which can fit the actual needs of users and automatically delete the multimedia data segments that need to be deleted from multimedia data.
The technical solution of the present application is implemented as follows:
In a first aspect, an embodiment of the present application provides a multimedia data processing method, the method comprising:
determining multimedia data to be processed based on a label to be deleted, wherein the labels of the multimedia data to be processed include the label to be deleted;
identifying the content of the multimedia data to be processed, and determining a segment to be deleted in the multimedia data to be processed, wherein the content of the segment to be deleted includes the object corresponding to the label to be deleted;
clipping the segment to be deleted from the multimedia data to be processed to obtain target multimedia data.
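The three steps of the first aspect can be sketched end to end as follows. This is an illustrative toy model, not the claimed implementation: the `Media` record, its fields, and the `process` function are hypothetical, each frame is represented by the set of objects assumed to have been recognized in it, and the labels of the result are left unchanged because the text does not say whether they are updated.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Media:
    """Hypothetical multimedia item: its labels and the objects seen in each frame."""
    name: str
    labels: Set[str]
    frames: List[Set[str]]  # assumed output of a content recognizer, one set per frame


def process(library: List[Media], label_to_delete: str) -> List[Media]:
    """Select items carrying the label, then drop the frames showing its object."""
    results: List[Media] = []
    for media in library:
        # Step 1: only items whose labels include the label to be deleted are processed.
        if label_to_delete not in media.labels:
            continue
        # Steps 2 and 3: frames containing the object form the segments to be deleted;
        # the remaining frames, kept in order, form the target multimedia data.
        kept = [frame for frame in media.frames if label_to_delete not in frame]
        results.append(Media(media.name, media.labels, kept))
    return results
```

Items that never carried the label are not touched at all, which is the point of selecting by label before inspecting content.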
In a second aspect, an embodiment of the present application provides a multimedia data processing apparatus, including:
a determining unit, configured to determine multimedia data to be processed based on a label to be deleted, wherein the labels of the multimedia data to be processed include the label to be deleted;
an identification unit, configured to identify the content of the multimedia data to be processed and determine a segment to be deleted in the multimedia data to be processed, wherein the content of the segment to be deleted includes the object corresponding to the label to be deleted;
a clipping unit, configured to clip the segment to be deleted from the multimedia data to be processed to obtain target multimedia data.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the processor implements the steps of the above multimedia data processing method when running the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, wherein the steps of the above multimedia data processing method are implemented when the computer program is run by a processor.
In the embodiments of the present application, a multimedia data processing method and apparatus, a device, and a storage medium are provided, including: determining multimedia data to be processed based on a label to be deleted, wherein the labels of the multimedia data to be processed include the label to be deleted; identifying the content of the multimedia data to be processed and determining a segment to be deleted in the multimedia data to be processed, wherein the content of the segment to be deleted includes the object corresponding to the label to be deleted; and clipping the segment to be deleted from the multimedia data to be processed to obtain target multimedia data. In this way, from multimedia data whose labels include the label to be deleted, only the segments whose content includes the object corresponding to the label to be deleted are removed, and the segments that do not involve the label to be deleted are retained. Therefore, when a product is taken off the shelf online, there is no need to delete in their entirety the videos whose products include that product, which avoids the delisting of one product affecting the normal display of other products.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an optional architecture of the multimedia data processing system provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an optional architecture of the multimedia data processing system provided by an embodiment of the present application;
FIG. 3 is an optional schematic flowchart of the multimedia data processing method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of an optional effect of multimedia data clipping provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an optional structure of the multimedia data processing system provided by an embodiment of the present application;
FIG. 6 is an optional schematic flowchart of the multimedia data processing method at the merchant end provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of optional reception of live file information provided by an embodiment of the present application;
FIG. 8 is an optional schematic flowchart of the merchant-end live broadcast process provided by an embodiment of the present application;
FIG. 9 is an optional schematic flowchart of the multimedia data processing method at the merchant end provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of optional segmentation of multimedia data provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of an optional clipping effect of multimedia data provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of an optional structure of the multimedia data processing apparatus provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of an optional structure of the electronic device provided by an embodiment of the present application.
Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the specific technical solutions of the application are described in further detail below with reference to the drawings in the embodiments. The following embodiments are intended to illustrate the present application, not to limit its scope.
Embodiments of the present application may be provided as a multimedia data processing method, a system, and a storage medium. In practice, the multimedia data processing method may be implemented by a multimedia data processing system, and the functional entities in that system may be implemented cooperatively by the hardware resources of an electronic device (such as a terminal device or a server), for example computing resources such as a processor and communication resources (such as those supporting communication over optical cable, cellular, and other means).
The multimedia data processing method of the embodiments of the present application can be applied to the multimedia data processing system shown in FIG. 1, which includes a client 10 and a server 20. The client interacts with the user through an input device and receives the tag to be deleted entered by the user. The input device includes devices capable of receiving user input, such as a display, a mouse, and a keyboard.
In one example, the client 10 and the server 20 reside on different physical entities; in this case, the server 20 can communicate with the client 10 through the network 30.
In one example, as shown in FIG. 2, the multimedia data processing system further includes a multimedia data acquisition terminal 40, which can collect multimedia data through a data acquisition device and send the collected multimedia data to the server 20. The data acquisition device includes devices capable of collecting data, such as a camera and a microphone.
The client 10 receives, through the input device, the tag to be deleted entered by the user and sends it to the server 20. Based on the tag to be deleted, the server determines the multimedia data to be processed, whose tags include the tag to be deleted; identifies the content of the multimedia data to be processed and determines the segment to be deleted in it, where the content of the segment to be deleted includes the object corresponding to the tag to be deleted; and cuts the segment to be deleted out of the multimedia data to be processed to obtain the target multimedia data.
In practice, the client 10 may be an operations terminal that manages the videos stored on the server; the multimedia data acquisition terminal may be a live broadcast terminal that runs a live broadcast application and is used by users to broadcast live; and the server is the server that provides services for the live broadcast application.
Based on the above multimedia data processing system, this embodiment proposes a multimedia data processing method that matches users' actual needs and automatically deletes from multimedia data the segments that need to be deleted.
The embodiments of the multimedia data processing method, apparatus, device, and storage medium provided by the present application are described below with reference to the multimedia data processing system shown in FIG. 1 or FIG. 2.
An embodiment of the present application provides a multimedia data processing method. The functions of the method can be implemented by a processor in an electronic device calling program code, and the program code can be stored in a computer storage medium. The electronic device therefore includes at least a processor and a storage medium.
FIG. 3 is a schematic flowchart of an implementation of a multimedia data processing method according to an embodiment of the present application. As shown in FIG. 3, the method may include the following steps:
S301. The server determines the multimedia data to be processed based on the tag to be deleted; the tags of the multimedia data to be processed include the tag to be deleted.
When a user needs to delete multimedia data related to a certain object, the user may enter the tag to be deleted on the client and perform, on the client, a blocking operation that triggers the video deletion function. When the client receives the blocking operation, it generates a blocking instruction and sends it to the server. The blocking instruction carries the tag to be deleted.
After receiving the blocking instruction, the server parses it to obtain the tag to be deleted.
After determining the tag to be deleted, the server determines the multimedia data to be processed. The multimedia data to be processed may be multimedia data specified by the user through the client, or multimedia data selected from at least one piece of multimedia data. Here, the file type of the multimedia data may be video, audio, or other data that lasts for a period of time.
When the multimedia data to be processed is multimedia data specified by the user through the client, the server checks whether the tags of the specified multimedia data include the tag to be deleted. If they do, the server determines that the specified multimedia data is the multimedia data to be processed.
In one example, the specified multimedia data is multimedia data A, whose tags are tag 1, tag 2, tag 3, and tag 4. If the tag to be deleted is tag 1, multimedia data A is multimedia data to be processed; if the tag to be deleted is tag 5, multimedia data A is not multimedia data to be processed.
When the multimedia data to be processed is selected from at least two pieces of multimedia data, each piece whose tags include the tag to be deleted is determined to be multimedia data to be processed.
In one example, the at least two pieces of multimedia data are multimedia data A, multimedia data B, and multimedia data C, and the tag to be deleted is tag 2. The tags of multimedia data A include tag 2, so multimedia data A is multimedia data to be processed. The tags of multimedia data B do not include tag 2, so multimedia data B is not multimedia data to be processed. The tags of multimedia data C include tag 2, so multimedia data C is multimedia data to be processed. The multimedia data to be processed therefore consists of multimedia data A and multimedia data C.
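The selection in the example above can be sketched as a simple membership test. This is a minimal illustration, not the application's implementation; the function name `select_to_be_processed` and the dictionary layout are assumptions made for the sketch.

```python
def select_to_be_processed(tags_by_media, tag_to_delete):
    """Return the IDs of the multimedia data whose tags include tag_to_delete.

    tags_by_media is a hypothetical mapping: media ID -> set of tags.
    """
    return [
        media_id
        for media_id, tags in tags_by_media.items()
        if tag_to_delete in tags
    ]


# Data mirroring the example: A and C carry tag 2, B does not.
tags_by_media = {
    "A": {"tag1", "tag2", "tag3", "tag4"},
    "B": {"tag4", "tag5"},
    "C": {"tag2", "tag5"},
}
print(select_to_be_processed(tags_by_media, "tag2"))  # ['A', 'C']
```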
In the embodiments of the present application, a tag list may be established on the server. The tag list records the association between the identifier of each piece of multimedia data and its tags, and based on the tag list the server can determine the multimedia data to be processed, that is, the multimedia data whose tags include the tag to be deleted. The tags in the tag list may be entered by users.
In one example, the multimedia data to be processed is the video of a live broadcast. Before the broadcast, the live broadcast terminal receives the live broadcast profile information entered by the user, which may include the tags of this broadcast, and sends it to the server. After generating the live video from the user's broadcast, the server establishes an association between the live broadcast profile information and the video identifier of the generated live video.
In the embodiments of the present application, the tags of the multimedia data to be processed represent the products involved in its content. In this case, the multimedia data to be processed may be associated with multiple product links. Each product link points to a product purchase page that contains product information, so the product information of each product can be obtained from its product link. The product information obtained through the product links is used as the tags of the multimedia data to be processed.
In one example, the product links associated with the multimedia data to be processed are the product link of product 1, the product link of product 2, and the product link of product 3; the tags of the multimedia data to be processed are then product 1, product 2, and product 3.
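As a rough sketch of deriving tags from associated product links: the lookup table `product_info_by_link` below stands in for fetching product information from each purchase page, and both it and the example URLs are hypothetical.

```python
def tags_from_product_links(links, product_info_by_link):
    """Map each associated product link to its product information (the tag).

    Links with no known product information are skipped; how such links
    should be handled is not stated in the text, so this is an assumption.
    """
    return [
        product_info_by_link[link]
        for link in links
        if link in product_info_by_link
    ]


# Hypothetical links and product information, for illustration only.
product_info_by_link = {
    "https://shop.example/item/1": "product 1",
    "https://shop.example/item/2": "product 2",
    "https://shop.example/item/3": "product 3",
}
links = list(product_info_by_link)
print(tags_from_product_links(links, product_info_by_link))
```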
In the embodiments of the present application, when the multimedia data to be processed is a video, it may be a video downloaded from the network side, such as the video of a TV series, or a live video formed from a video stream uploaded by a user, such as the live video of user A.
S302. The server identifies the content of the multimedia data to be processed and determines the segment to be deleted in the multimedia data to be processed; the content of the segment to be deleted includes the object corresponding to the tag to be deleted.
After determining the multimedia data to be processed, the server retrieves it and identifies the content in it that includes the object corresponding to the tag to be deleted, that is, the object to be deleted. In the embodiments of the present application, the tag to be deleted may represent any object that can appear in multimedia data, such as a person, a product, or a building. In one example, when the tag to be deleted represents a product, the tag may be product information such as the product name or the stock keeping unit (SKU).
Here, the object to be deleted takes different forms depending on the file type of the multimedia data. When the file type is video, the object to be deleted appears as image content in the video; when the file type is audio, the object to be deleted appears as audio content in the audio.
A segment to be deleted is a segment of the multimedia data whose content includes the object to be deleted. In one example, the object to be deleted is person A and the multimedia data to be processed is video A, whose duration is T1. If the video content includes person A during the time period [t1, t2] of video A, the segment to be deleted is the video in the time period [t1, t2], where t1 is greater than or equal to 0 and t2 is less than or equal to T1. In another example, the object to be deleted is person A and the multimedia data to be processed is audio B, whose duration is T2. If the audio content includes person A during the time period [t3, t4] of audio B, the segment to be deleted is the audio in the time period [t3, t4], where t3 is greater than or equal to 0 and t4 is less than or equal to T2.
In the embodiments of the present application, for one object to be deleted, one piece of multimedia data to be processed may contain one or more segments to be deleted.
In the embodiments of the present application, the multimedia data to be processed may be stored on the server or on a storage terminal corresponding to the server; in the latter case, the server retrieves the multimedia data to be processed from the storage terminal.
S303. The server cuts the segment to be deleted out of the multimedia data to be processed to which the segment belongs, to obtain the target multimedia data.
After determining the segment to be deleted, the server cuts it out of the multimedia data to be processed.
In one example, the multimedia data to be processed is video A, whose duration is T1, and the segment to be deleted is the video in the time period [t1, t2]. If t1 is 0 and t2 is less than T1, cutting the time period [t1, t2] out of video A yields a target video whose content is the content of video A in the period [t2, T1]. If t1 is greater than 0 and t2 equals T1, cutting the time period [t1, t2] out of video A yields a target video whose content is the content of video A in the period [0, t1]. If t1 is greater than 0 and t2 is less than T1, cutting the time period [t1, t2] out of video A yields a target video whose content is the content of video A in the two periods [0, t1] and [t2, T1].
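The three cases above reduce to one rule: keep whatever lies before t1 and whatever lies after t2. The sketch below computes only the retained time intervals; the actual cutting and re-joining of the media is outside its scope, and the function name is an assumption.

```python
def remaining_intervals(total, t1, t2):
    """Return the time intervals kept after cutting [t1, t2] from [0, total]."""
    if not (0 <= t1 < t2 <= total):
        raise ValueError("expected 0 <= t1 < t2 <= total")
    kept = []
    if t1 > 0:        # content before the deleted segment
        kept.append((0, t1))
    if t2 < total:    # content after the deleted segment
        kept.append((t2, total))
    return kept


print(remaining_intervals(100, 0, 30))    # [(30, 100)]
print(remaining_intervals(100, 40, 100))  # [(0, 40)]
print(remaining_intervals(100, 40, 60))   # [(0, 40), (60, 100)]
```

When t1 is 0 and t2 equals the total duration, the function returns an empty list, a case the text does not discuss.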
The multimedia data processing method provided by the embodiments of the present application can be applied to the following scenarios:
Scenario 1. A live video shows user A selling products, as shown in (a) of FIG. 4. Video segment 401, in the time period from T0 to T1, sells product A; video segment 402, in the time period from T1 to T2, sells product B; video segment 403, in the time period from T2 to T3, sells product C; and video segment 404, in the time period from T3 to T4, sells product D. If product B needs to be taken off the shelves, video segment 402 is treated as the segment to be deleted and removed from the video, and the resulting target video is as shown in (b) of FIG. 4.
Scenario 2. An audio recording contains interviews with user A, user B, user C, and user D. When user B does not meet the interview requirements and user B's interview must be removed from the audio, the segment corresponding to user B's interview is deleted from the audio, and only the interviews with user A, user C, and user D are kept.
The embodiments of the present application provide a multimedia data processing method that determines the multimedia data to be processed based on the tag to be deleted, where the tags of the multimedia data to be processed include the tag to be deleted; identifies the content of the multimedia data to be processed and determines the segment to be deleted in it, where the content of the segment to be deleted includes the object corresponding to the tag to be deleted; and cuts the segment to be deleted out of the multimedia data to be processed to obtain the target multimedia data. In multimedia data whose tags include the tag to be deleted, only the segments whose content includes the object corresponding to the tag to be deleted are removed, and the segments that do not involve the tag are kept. As a result, when a product is taken off the shelves online, the videos that feature that product among others do not have to be deleted in their entirety, and delisting one product does not affect the normal display of the other products.
In some embodiments, determining the multimedia data to be processed based on the tag to be deleted in S301 includes:
S3011. Obtain the tags of each of at least two pieces of original multimedia data;
S3012. Compare the tags of each piece of original multimedia data with the tag to be deleted;
S3013. Determine the original multimedia data whose tags include the tag to be deleted to be the multimedia data to be processed.
In the embodiments of the present application, the blocking instruction sent by the client may be a one-click blocking instruction; in this case, the server retrieves all the multimedia data to be processed from the at least two pieces of original multimedia data.
When retrieving the multimedia data to be processed from the at least two pieces of original multimedia data, the server obtains the tags of each piece of original multimedia data, where one piece of original multimedia data has one or more tags.
In the embodiments of the present application, the tags of the original multimedia data may be entered by the user or obtained by the server from the product links of the original multimedia data.
When the tags of the original multimedia data are entered by the user, the user may enter them on the multimedia data acquisition terminal when uploading the original multimedia data through that terminal, so that the tags are sent to the server together with the original multimedia data.
In one example, the data acquisition terminal is the user's live broadcast terminal. Before the broadcast, the user enters the live broadcast profile information for this broadcast, which may include details such as the broadcast title, the broadcast time, the cover image, and the shopping cart product list. The shopping cart product list contains the product information of the products to be sold in this broadcast.
After obtaining the tags of each piece of original multimedia data, the server performs the following processing for each piece: it matches the tags of the original multimedia data against the tag to be deleted, and determines the original multimedia data whose tags include the tag to be deleted to be multimedia data to be processed.
In one example, the at least two pieces of original multimedia data are multimedia data A, multimedia data B, and multimedia data C, and the tag to be deleted is tag 2. The tags of multimedia data A are tag 1, tag 2, and tag 3; the tags of multimedia data B are tag 4 and tag 5; and the tags of multimedia data C are tag 2, tag 5, and tag 6. The multimedia data to be processed then consists of multimedia data A and multimedia data C.
In some embodiments, before the multimedia data to be processed is determined based on the tag to be deleted in S301, the following steps are also performed:
Obtain at least one product link under the account to which the original multimedia data belongs;
Determine the product information targeted by each of the at least one product link to be a tag of the original multimedia data.
In the embodiments of the present application, when the original multimedia data satisfies a set condition, its tags may be determined from the product links under the account to which the original multimedia data belongs.
Here, the set condition may include at least one of the following:
Condition 1. The time elapsed between the generation of the original multimedia data and the current time is less than a set time;
Condition 2. The original multimedia data is the latest multimedia data under the account to which it belongs.
In Condition 1, the set time may be 24 hours.
The server determines the account to which the multimedia data to be processed belongs, which may be the account used to upload it. The server then determines the product links under that account and uses the product information of the products targeted by those links as the tags of the multimedia data to be processed.
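A minimal sketch of checking the set condition follows. The text says the set condition may include at least one of the two conditions but does not say how they combine; treating either one as sufficient is an assumption of this sketch, as are the function and parameter names.

```python
from datetime import datetime, timedelta

SET_TIME = timedelta(hours=24)  # example value given for Condition 1


def meets_set_condition(created_at, now, is_latest_under_account):
    """Return True if tags may be derived from the account's product links.

    Condition 1: the data was generated less than SET_TIME before `now`.
    Condition 2: the data is the latest multimedia data under its account.
    Either condition is treated as sufficient (an assumption).
    """
    generated_recently = (now - created_at) < SET_TIME
    return generated_recently or is_latest_under_account


now = datetime(2022, 5, 25, 12, 0)
print(meets_set_condition(now - timedelta(hours=1), now, False))   # True
print(meets_set_condition(now - timedelta(hours=30), now, False))  # False
print(meets_set_condition(now - timedelta(hours=30), now, True))   # True
```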
In some embodiments, the file type of the multimedia data to be processed includes video, and identifying the content of the multimedia data to be processed and determining the segment to be deleted in S302 includes:
S3021. Extract the video frame sequence of the multimedia data to be processed;
S3022. Obtain the reference image corresponding to the tag to be deleted;
S3023. Match the reference image against the video frames in the video frame sequence to obtain a matching result;
S3024. Determine, according to the matching result, a cropping start frame and the cropping end frame corresponding to the cropping start frame; the video frames from the cropping start frame to the cropping end frame constitute the segment to be deleted.
In the embodiments of the present application, the reference image may be the main product image of the object corresponding to the tag to be deleted. The main product image is the main image shown to users on the product detail page and displays the product directly.
After obtaining the reference image, the server matches the video frames in the video frame sequence against it. The matching result includes first video frames and second video frames: a first video frame is a video frame whose content includes the object corresponding to the tag to be deleted, and a second video frame is a video frame whose content does not include that object.
In the embodiments of the present application, the image similarity between a video frame and the reference image may be calculated. A video frame whose image similarity to the reference image is greater than a similarity threshold is determined to be a first video frame, and a video frame whose image similarity to the reference image is less than or equal to the similarity threshold is determined to be a second video frame. The image similarity may be expressed by the Hamming distance between images; the embodiments of the present application place no limitation on how image similarity is expressed.
In the embodiments of the present application, multiple consecutive first video frames constitute a segment to be deleted. The first video frame in the segment is the cropping start frame, and the last video frame in the segment is the cropping end frame. In the video frame sequence, the frame immediately before the cropping start frame is a second video frame, and the frame immediately after the cropping end frame is a second video frame.
The server may determine the image similarity between each video frame and the reference image. When the image similarity between a video frame and the reference image is greater than the similarity threshold, the frame is a first video frame; when it is less than or equal to the similarity threshold, the frame is a second video frame.
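The thresholding and grouping described above can be sketched as follows. The per-frame similarity scores are assumed to be already computed and normalized so that a larger value means a closer match; a segment that reaches the start or end of the sequence is closed at that boundary, which is an assumption of the sketch.

```python
def find_segments_to_delete(similarities, threshold):
    """Group consecutive first video frames into (start, end) index pairs.

    A frame is a first video frame when its similarity is strictly greater
    than the threshold; otherwise it is a second video frame.
    """
    segments = []
    start = None
    for index, score in enumerate(similarities):
        is_first_frame = score > threshold
        if is_first_frame and start is None:
            start = index                        # cropping start frame
        elif not is_first_frame and start is not None:
            segments.append((start, index - 1))  # cropping end frame
            start = None
    if start is not None:                        # segment runs to the last frame
        segments.append((start, len(similarities) - 1))
    return segments


scores = [0.1, 0.9, 0.95, 0.2, 0.8, 0.8]
print(find_segments_to_delete(scores, 0.5))  # [(1, 2), (4, 5)]
```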
The server may also estimate how long a single product is broadcast and search the video at intervals for the segment concerning that product, thereby reducing the image recognition workload. For example, if the video frame at T1 is detected to be a first video frame, the server checks whether the video frame at T1+t1 is a first video frame. If the frame at T1+t1 is not a first video frame, the server goes back and checks whether the frame at T1+t1-t2 (where t2 is less than t1) is a first video frame. If the frame at T1+t1 is a first video frame, the server checks whether the frame immediately after it is a first video frame: if it is not, the frame at T1+t1 is determined to be the last video frame of the segment to be deleted; if it is, the server continues by checking whether the frame at T1+2*t1 is a first video frame, and so on until the last video frame of the segment to be deleted is detected. The starting video frame of the segment to be deleted may be located with the same detection method as the last video frame.
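One way to read the interval search is sketched below. The text leaves open what happens after stepping back, so this sketch fills that gap with a frame-by-frame walk to the exact boundary; that choice, the use of frame indices rather than times, and all names are assumptions. It also assumes the segment is contiguous, with no other first video frames between the known frame and the overshoot point.

```python
def find_last_frame(is_first_frame, start, n_frames, stride, back_step):
    """Locate the last frame of the segment that contains frame `start`.

    is_first_frame(i) -> bool is a hypothetical per-frame matcher.
    stride plays the role of t1 and back_step the role of t2 (< t1).
    """
    pos = start
    # Jump ahead by `stride` while the landing frame is still in the segment.
    while pos + stride < n_frames and is_first_frame(pos + stride):
        pos += stride
    # The jump overshot (or hit the end of the video): step back by back_step.
    probe = min(pos + stride, n_frames - 1)
    while probe > pos and not is_first_frame(probe):
        probe -= back_step
    probe = max(probe, pos)
    # Walk forward one frame at a time to reach the exact last frame.
    while probe + 1 < n_frames and is_first_frame(probe + 1):
        probe += 1
    return probe


# Frames 5..12 show the product in a 20-frame video.
flags = [5 <= i <= 12 for i in range(20)]
print(find_last_frame(lambda i: flags[i], start=5, n_frames=20, stride=4, back_step=2))  # 12
```

Because most frames are skipped by the jumps, far fewer frames are matched than in a frame-by-frame scan, which is the saving the paragraph describes.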
In some embodiments, matching the reference image against the video frames in the video frame sequence to obtain the matching result in S3023 includes: for each video frame in the video frame sequence, determining the reference region of the content object included in the video frame and cropping the reference region out of the video frame to obtain an image to be matched; determining the content similarity between the image to be matched and the reference image; determining the video frame to which an image to be matched with a content similarity greater than the set similarity threshold belongs to be a first video frame, where the content of the first video frame includes the object corresponding to the tag to be deleted; and determining the video frame to which an image to be matched with a content similarity less than or equal to the similarity threshold belongs to be a second video frame, where the content of the second video frame does not include the object corresponding to the tag to be deleted.
An object detection model is provided on the server. For each video frame that needs to be checked as a possible first video frame, the server performs the following processing:
The video frame is fed into the object detection model, which outputs the position of the content object included in the frame. Based on that position, the region where the content object is located, that is, the reference region, is cropped out of the video frame to obtain the image to be matched for that frame. The server then calculates the image similarity between the image to be matched and the reference image, which gives the image similarity between the video frame and the reference image.
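The cropping step can be sketched independently of any particular detector. Here a frame is a plain list of pixel rows, and the bounding box format `(left, top, right, bottom)`, with the right and bottom edges exclusive, is an assumption standing in for whatever position the object detection model outputs.

```python
def crop_reference_region(frame, box):
    """Crop the region given by box = (left, top, right, bottom) from a frame.

    frame is a list of rows, each row a list of pixel values.
    """
    left, top, right, bottom = box
    return [row[left:right] for row in frame[top:bottom]]


# A 4x5 toy frame whose pixel value encodes its row and column.
frame = [[row * 10 + col for col in range(5)] for row in range(4)]
print(crop_reference_region(frame, (1, 1, 3, 3)))  # [[11, 12], [21, 22]]
```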
In the embodiments of the present application, the algorithm used by the object detection model on the server may include object detection algorithms such as the faster region-based convolutional neural network (Faster R-CNN) and the Single Shot MultiBox Detector (SSD). The embodiments of the present application place no limitation on the object detection algorithm used by the object detection model.
In the embodiments of the present application, the server may also be provided with an image segmentation model and determine the reference region in the video frame based on it. The image segmentation algorithm used by the image segmentation model may include algorithms such as region growing, iterative mean segmentation, and maximum entropy segmentation. The embodiments of the present application place no limitation on the image segmentation algorithm used by the image segmentation model.
In the embodiments of the present application, when calculating the image similarity between the image to be matched and the reference image, each of the two images may be taken in turn as the target image and processed as follows: the target image is shrunk to a set size, the shrunken target image is converted to grayscale, and the hash value of the grayscale target image is calculated. The similarity between the hash value of the image to be matched and the hash value of the reference image is then calculated to obtain the image similarity between the image to be matched and the reference image. In one example, the set size is 9*8.
In some embodiments, determining the clipping start point and the clipping end point corresponding to the clipping start point according to the matching result in S3024 includes:
determining a video frame that itself belongs to the first video frames, and whose adjacent previous frame belongs to the second video frames, as the clipping start frame, where the content of a first video frame includes the object corresponding to the tag to be deleted and the content of a second video frame does not include the object corresponding to the tag to be deleted;
determining a video frame that itself belongs to the first video frames, and whose adjacent next frame belongs to the second video frames, as the clipping end frame, where no video frame belonging to the second video frames lies between the clipping start frame and the clipping end frame.
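The start-frame and end-frame rule can be sketched as a single pass over per-frame match flags. This is a minimal illustration, not the application's implementation: the flag list and function name are hypothetical, and treating a matching frame at the very beginning or end of the video as a start or end frame is an assumption for the boundary case.

```python
from typing import List, Optional, Tuple


def find_segments_to_delete(is_first_frame: List[bool]) -> List[Tuple[int, int]]:
    """Return (start_index, end_index) pairs of consecutive first video frames.

    is_first_frame[i] is True when frame i contains the object to be deleted.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for index, flagged in enumerate(is_first_frame):
        if flagged and start is None:
            # Previous frame is a second video frame (or there is none): start frame.
            start = index
        elif not flagged and start is not None:
            # Current frame is a second video frame: the previous one was the end frame.
            segments.append((start, index - 1))
            start = None
    if start is not None:
        # The video ends while the object is still visible (boundary assumption).
        segments.append((start, len(is_first_frame) - 1))
    return segments


flags = [False, True, True, False, False, True, False, True]
segments = find_segments_to_delete(flags)
```

Each returned pair contains only first video frames, which matches the condition that no second video frame lies between a clipping start frame and its clipping end frame.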
In some embodiments, clipping the segment to be deleted out of the multimedia data to be processed to which it belongs in S303 to obtain the target multimedia data includes:
splicing the video frame immediately before the clipping start frame to the video frame immediately after the clipping end frame, so that the segment before the segment to be deleted and the segment after it are merged, to obtain the target multimedia data.
After determining the segments to be deleted, the server splices the second video frames in the video frame sequence together based on continuity, that is, consecutive second video frames are spliced into one segment to be retained. The last video frame of the retained segment before a segment to be deleted is the frame immediately before that segment's clipping start frame, and the first video frame of the retained segment after it is the frame immediately after that segment's clipping end frame. Splicing the frame before the clipping start frame to the frame after the clipping end frame therefore merges the retained segments on either side of the segment to be deleted. Doing this for all segments to be deleted in the multimedia data to be processed yields the target video.
In the embodiments of the present application, a segment to be deleted may be located at the start, at the end, or in the middle of the multimedia data to be processed. Splicing of the frame before the clipping start frame to the frame after the clipping end frame may be performed only for segments to be deleted that are located in the middle of the multimedia data to be processed.
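A minimal sketch of the splicing described above, assuming the frames are available as an in-memory sequence and each segment to be deleted is given as an inclusive (start index, end index) pair. Both the representation and the function name are hypothetical; a real system would operate on an encoded video stream.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def splice_kept_frames(frames: Sequence[T], segments: List[Tuple[int, int]]) -> List[T]:
    """Drop every (start, end) segment and join what remains in order."""
    kept: List[T] = []
    cursor = 0
    for start, end in sorted(segments):
        # The retained segment ends at the frame just before the clipping start frame.
        kept.extend(frames[cursor:start])
        # The next retained segment begins at the frame just after the clipping end frame.
        cursor = end + 1
    kept.extend(frames[cursor:])
    return kept


frames = [f"f{i}" for i in range(8)]
target = splice_kept_frames(frames, [(1, 2), (5, 5), (7, 7)])
```

A segment at the very start or end of the video needs no join on that side: the corresponding slice is simply empty, which is consistent with splicing being required only for segments in the middle.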
In some embodiments, after the segment to be deleted is clipped out of the multimedia data to be processed to which it belongs in S303 to obtain the target multimedia data, the following steps are also performed:
acquiring the storage path of the multimedia data to be processed that corresponds to the target multimedia data;
storing the target multimedia data at the storage path to replace the multimedia data to be processed that corresponds to the target multimedia data.
The server replaces the stored multimedia data to be processed with the target multimedia data obtained by processing it.
In the embodiments of the present application, when the multimedia data to be processed is stored on a storage end other than the server, the data size of the target multimedia data may be checked. When the data size is greater than a set data size, the target multimedia data may be divided into multiple data blocks, which are uploaded to the storage end. The storage end then splices the data blocks to obtain the target multimedia data and replaces the original multimedia data to be processed with it.
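The size check and block splitting can be sketched as follows. The function names, the size limit and the block size are illustrative assumptions; the application does not specify how blocks are sized or transported.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Divide data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def prepare_upload(data: bytes, size_limit: int, block_size: int) -> List[bytes]:
    """Return the payloads to upload: several blocks if data exceeds size_limit."""
    if len(data) > size_limit:
        return split_into_blocks(data, block_size)
    return [data]


def reassemble(blocks: List[bytes]) -> bytes:
    """Storage-side step: splice the received blocks back together in order."""
    return b"".join(blocks)


video = bytes(range(10))  # stand-in for the target multimedia data
blocks = prepare_upload(video, size_limit=8, block_size=4)
restored = reassemble(blocks)
```

The sketch relies on blocks arriving in order; a real upload would likely carry a sequence number with each block so that the storage end can reorder them.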
The multimedia data processing method provided by the embodiments of the present application is further described below, taking video as the multimedia data.
As shown in FIG. 5, the multimedia data processing system provided by the embodiments of the present application includes a merchant terminal 501, a server 502 and an operation terminal 503.
In the embodiments of the present application, as shown in FIG. 5, the merchant terminal 501 is used to create live broadcast archive information, generate the video data stream of a live video file, and send the live broadcast archive information and the video data stream to the server 502. As shown in FIG. 6, the merchant terminal 501 performs the following processing:
S6011. The merchant terminal receives the live broadcast archive information filled in by the merchant and sends it to the server.
Here, the live broadcast archive information is stored in the live broadcast archive library of the server 502.
In the embodiments of the present application, as shown in FIG. 7, the merchant terminal 501 provides a live broadcast management backend 701, through which the merchant can fill in, and thereby enter, live broadcast archive information 702. The live broadcast archive information 702 may include detailed information such as a live broadcast title 7021, a live broadcast time 7022, a home page picture 7023 and a shopping cart commodity list 7024, where the shopping cart commodity list 7024 includes information about the commodities to be sold in that live broadcast. The shopping cart commodity list 7024 can serve as the basis for retrieving videos to be processed.
S6012. The merchant terminal conducts the live broadcast.
Here, the video stream of the live broadcast is sent to the server 502. The server generates a live video file from the received video stream and stores it in the live video library of the server 502.
During the live broadcast, the merchant terminal may execute the flow shown in FIG. 8:
S801. The merchant terminal captures image data;
The merchant terminal may capture image data through an image capture device.
S802. The merchant terminal performs image processing on the captured image data;
The image processing may include beautification, filters and similar processing.
S803. The merchant terminal compresses the processed image data.
The merchant terminal encodes and compresses the processed image data.
S804. The merchant terminal transmits the compressed image data to the server as a video stream.
The merchant terminal uploads the compressed image data to the server over the Real Time Messaging Protocol (RTMP).
An association is established in the live broadcast archive library (not shown) of the server 502 between the live broadcast archive information and the video storage information of the live video. The video storage information includes the video name, the video storage address, and so on.
The operation terminal 503 performs the following processing: it receives the SKU of the commodity that needs to be blocked.
When the video data of a commodity needs to be blocked, the user enters the SKU of the commodity at the operation terminal 503 and clicks the one-click blocking function. The operation terminal 503 then receives the SKU entered by the user and enables the automatic video blocking function. Enabling this function triggers the sending of a blocking instruction to the server 502, and the instruction includes the received SKU of the commodity to be blocked. In this way, as shown in FIG. 5, the operation terminal 503 sends the SKU of the commodity to be blocked to the server 502.
After receiving the blocking instruction, the server 502 performs the following processing, as shown in FIG. 9:
S5021. The server retrieves the videos to be processed based on the SKU of the commodity to be blocked.
The server 502 searches the live broadcast archive library based on the SKU of the commodity to be blocked and retrieves the videos to be processed that need blocking.
Here, a list of videos to be processed may be generated from the video information of the videos to be processed.
The server 502 performs an exact match of the SKU to be blocked, as entered at the operation terminal 503, against the shopping cart commodity lists in the live broadcast archive library. Videos whose list does not contain the SKU need no processing; videos whose list contains the SKU are videos to be processed. The server extracts the videos to be processed that contain the SKU into a to-be-processed video library and uses the videos in that library as the input for video clipping, so that similarity detection can be performed against the SKU main image of the commodity to be blocked, that is, the reference image.
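The retrieval step can be sketched with archive records held as dictionaries. The field names `cart_skus` and `video_path` and the sample records are hypothetical; the application does not define the archive schema.

```python
from typing import Dict, List


def find_videos_to_process(archives: List[Dict], sku: str) -> List[str]:
    """Return the storage paths of videos whose shopping cart list contains sku.

    List membership gives an exact match: "1001" does not match "10010".
    """
    return [a["video_path"] for a in archives if sku in a["cart_skus"]]


archives = [
    {"title": "Live A", "cart_skus": ["1001", "1002"], "video_path": "videos/a.mp4"},
    {"title": "Live B", "cart_skus": ["10010"], "video_path": "videos/b.mp4"},
    {"title": "Live C", "cart_skus": ["1001"], "video_path": "videos/c.mp4"},
]
pending = find_videos_to_process(archives, "1001")
```

Matching on whole SKU values rather than substrings is what keeps "Live B" out of the list of videos to be processed.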
S5022. The server determines the SKU main image of the commodity to be blocked based on its SKU.
After the operation terminal 503 sends the SKU to be blocked to the server 502, the server 502 queries the main site for the commodity's main image according to the SKU and uses it as the model picture for image similarity recognition on the video.
S5023. Based on the SKU main image of the commodity to be blocked, the server performs picture similarity recognition on every frame of the video to be processed and identifies the video segments in which the SKU needs to be blocked.
S5024. The server clips out of the video to be processed the identified video segments that need to be blocked, that is, the segments to be deleted.
The server clips, in turn, the data frames of the video segments identified as needing to be blocked in each video in the list of videos to be processed. For each video to be processed, all data frames in the segments to be blocked are clipped out, and once clipping is complete the remaining video is merged into one complete video, that is, the target video.
S5025. The server saves the target video.
The processed video is uploaded back to the link of the original video in the live video library.
In S5023, the server uses a picture similarity recognition algorithm to determine the similarity between the SKU main image of the commodity to be blocked and every frame of the video to be processed.
The picture similarity recognition algorithm in the embodiments of the present application may be implemented with the target detection model Faster R-CNN and the dHash algorithm.
Here, Faster R-CNN identifies all commodity pictures in the video, and the pictures are cropped and saved as input pictures for the similarity recognition algorithm. Identifying the commodity pictures includes the following steps:
S5231A. Faster R-CNN first scales an image of size P*Q to size M*N;
S5232A. The feature extraction layers of Faster R-CNN extract the feature map of the input image.
The scaled image is input to the feature extraction layers (conv layers), which comprise convolution (conv) layers, activation (ReLU) layers and pooling layers. The feature extraction layers extract the feature map of the input image.
S5233A. The region proposal network of Faster R-CNN determines the candidate regions in the feature map of the input image.
The region proposal network uses softmax to decide whether each anchor in the feature map belongs to the foreground or the background, and then applies bounding box regression to refine the anchors into accurate candidate regions (proposals).
S5233A. The region of interest (ROI) pooling layer of Faster R-CNN obtains the candidate region features based on the candidate regions and the feature map of the input image.
Here, the ROI pooling layer uses the proposals to extract the candidate region features (proposal feature maps) from the feature maps.
S5234A. The classification layer of Faster R-CNN classifies each candidate region based on the candidate region features.
Here, the classification layer may include fully connected layers and a softmax network, which classify each candidate region based on the extracted candidate region features and mark the candidate regions classified as the SKU that needs to be blocked.
S5235A. Faster R-CNN determines the positions of the proposal feature maps whose type is SKU.
Bounding box regression is applied again to obtain the final precise position of each detection box of type SKU, that is, the picture coordinates of each SKU.
Each video frame is cropped using the picture coordinates obtained by Faster R-CNN to produce the images to be matched, and each image to be matched is compared with the SKU main image by dHash similarity. For one image to be matched, the similarity comparison includes the following steps:
S5231B. Reduce the picture to 72 pixels;
Here, the picture is reduced to 9*8, that is, 72 pixels.
S5232B. Convert the reduced picture to grayscale;
S5233B. Compare the left and right pixels in each row and calculate the hash value of the picture;
Here, for each row, the difference between each pair of adjacent pixels is calculated, which gives 8 difference values per row. Over 8 rows this gives 64 difference values, which form the hash value.
S5234B. Calculate the Hamming distance between the image to be matched and the SKU main image.
The Hamming distance between the image to be matched and the SKU main image is calculated from their hash values, and this Hamming distance is used as the similarity between the two images. The smaller the Hamming distance, the more similar the two pictures are; the larger the Hamming distance, the less similar they are.
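Steps S5233B and S5234B can be sketched in plain Python. The sketch assumes the picture has already been reduced to 8 rows of 9 grayscale values, so the resizing and grayscale steps are omitted, and it assumes the common dHash convention that a bit is 1 when the left pixel is brighter than the right one; the application only states that adjacent pixels are compared.

```python
from typing import List


def dhash_bits(gray: List[List[int]]) -> List[int]:
    """Compute a 64-bit difference hash from 8 rows of 9 grayscale values."""
    bits: List[int] = []
    for row in gray:
        # 9 pixels per row give 8 adjacent pairs, hence 8 bits per row.
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)  # assumed bit convention
    return bits


def hamming_distance(a: List[int], b: List[int]) -> int:
    """Count the positions at which two equal-length hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Toy pictures: brightness rising left to right versus falling left to right.
ascending = [list(range(9)) for _ in range(8)]
descending = [list(range(8, -1, -1)) for _ in range(8)]
mixed = [descending[0]] + ascending[1:]  # only the first row differs

distance_same = hamming_distance(dhash_bits(ascending), dhash_bits(ascending))
distance_one_row = hamming_distance(dhash_bits(ascending), dhash_bits(mixed))
distance_opposite = hamming_distance(dhash_bits(ascending), dhash_bits(descending))
```

Identical pictures give a distance of 0, and the distance grows with the number of adjacent-pixel comparisons that disagree, up to 64.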
The above image similarity recognition algorithm is applied to every frame of the video, and a frame for which a picture with a similarity above 90% is found is regarded as a data frame that needs to be blocked.
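The application does not state how a Hamming distance is turned into a percentage. One plausible conversion, shown here purely as an assumption, is the fraction of matching bits in the 64-bit hash, with a frame flagged when that fraction exceeds 0.9.

```python
def similarity_from_distance(distance: int, hash_bits: int = 64) -> float:
    """Map a Hamming distance to a similarity in [0, 1] (assumed conversion)."""
    if not 0 <= distance <= hash_bits:
        raise ValueError("distance must lie between 0 and hash_bits")
    return 1.0 - distance / hash_bits


def needs_masking(distance: int, threshold: float = 0.9) -> bool:
    """Flag a frame when its similarity is strictly above the threshold."""
    return similarity_from_distance(distance) > threshold


flag_close = needs_masking(6)  # similarity 0.90625
flag_far = needs_masking(7)    # similarity 0.890625
```

Under this assumed conversion, a 90% threshold corresponds to at most 6 differing bits out of 64.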
In S5024, the data frames are recorded as time intervals, for example [t1, t2], [t3, t4], and so on. Time t1 is the first frame in which the commodity to be blocked appears, and t2 is the last frame in which it appears within that continuous period. Likewise, [t3, t4] gives the start and end times of the commodity's appearance in the next interval.
In one example, as shown in FIG. 10, the identified segments to be deleted 1001, whose content is the input SKU, include segment 1, segment 2, segment 3 and segment 4. Here t1 and t2 are the times at which the first and last frames of segment 1 appear in the video to be processed, t3 and t4 are the corresponding times for segment 2, t5 and t6 are those for segment 3, and t7 and t8 are those for segment 4.
In S5024, the video to be processed is cut and merged.
In the embodiments of the present application, the video to be processed is compressed and stored on disk as a binary file.
In one example, as shown in FIG. 11, the duration of the video to be processed is t4, and the segments to be deleted in which the commodity to be blocked appears include segment 1 with time range [t1, t2] and segment 2 with time range [t3, t4]. The binary encoding of segment 1 is 10111 and the encoding of segment 2 is 11011. Based on a video cut command, two video files V1 and V2 are obtained from the video frames in the time ranges [t0, t1) and (t2, t3) respectively. Video file V1 has duration tv1 and encoding 110111, and video file V2 has duration tv2 and encoding 011. Based on a video merge command, video file V1 and video file V2 are merged into a new video file V3 whose encoding is 110111011.
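The time ranges that remain after cutting can be derived from the intervals to be deleted, as in the following sketch. The numeric times are made-up stand-ins for t0 to t4, and the string concatenation only mirrors the toy encodings in the example; a real merge would go through a video tool rather than raw concatenation.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def intervals_to_keep(duration: float, to_delete: List[Interval]) -> List[Interval]:
    """Return the time ranges of a video that lie outside the deleted intervals."""
    kept: List[Interval] = []
    cursor = 0.0
    for start, end in sorted(to_delete):
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)  # also tolerates overlapping intervals
    if cursor < duration:
        kept.append((cursor, duration))
    return kept


# Hypothetical times: t0 = 0, t1 = 10, t2 = 20, t3 = 30, t4 = 40 (seconds).
kept = intervals_to_keep(40.0, [(10.0, 20.0), (30.0, 40.0)])

# Toy encodings from the example: merging V1 and V2 gives V3.
v1, v2 = "110111", "011"
v3 = v1 + v2
```

With these values the kept ranges correspond to [t0, t1) and (t2, t3), the two ranges from which V1 and V2 are produced.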
In S5025, the compressed data may be cut into data packets that are uploaded in sequence, segment by segment. After the uploaded packets have been spliced back together, the complete video file is stored at the storage path of the original video resource.
The multimedia data processing method provided by the embodiments of the present application has the following characteristics:
1. Image recognition is performed on every frame of the live broadcast data to identify the commodities in the live broadcast, and existing data (the SKU information filled in by the merchant) is used to automatically retrieve from the video library the videos that need to be blocked. The videos that need processing are then cut, merged and uploaded automatically, with no manual searching. This automation safeguards the safety of live broadcast content. It is an automated video blocking scheme that solves the problem that a huge stock of historical live broadcast videos cannot be blocked by commodity with one click.
2. With the image recognition algorithm, the whole video resource is not blindly taken offline. The video segments that need to be taken offline can be located accurately, all segments that need to be blocked can be taken offline, and the video is cut and merged before it replaces the original video data.
FIG. 12 is a schematic structural diagram of a multimedia data processing apparatus according to an embodiment of the present application. As shown in FIG. 12, the multimedia data processing apparatus 1200 includes:
a determining unit 1201, configured to determine the multimedia data to be processed based on a tag to be deleted, where the tags of the multimedia data to be processed include the tag to be deleted;
an identification unit 1202, configured to identify the content of the multimedia data to be processed and determine the segment to be deleted in the multimedia data to be processed, where the content of the segment to be deleted includes the object corresponding to the tag to be deleted;
a clipping unit 1203, configured to clip the segment to be deleted out of the multimedia data to be processed to obtain target multimedia data.
In some embodiments, the determining unit 1201 is further configured to:
acquire the tags of each of at least two pieces of original multimedia data;
compare the tags of each piece of original multimedia data with the tag to be deleted;
determine the original multimedia data whose tags include the tag to be deleted as the multimedia data to be processed.
In some embodiments, the apparatus 1200 further includes a tag acquisition unit, configured to:
acquire at least one commodity link under the account to which the original multimedia data belongs;
determine the commodity information targeted by each of the at least one commodity link as a tag of the original video data.
In some embodiments, the identification unit 1202 is further configured to:
extract the video frame sequence of the multimedia data to be processed when the file type of the multimedia data to be processed includes video;
acquire the reference image corresponding to the tag to be deleted;
match the reference image against the video frames in the video frame sequence to obtain a matching result;
determine, according to the matching result, a clipping start frame and the clipping end frame corresponding to the clipping start frame, where the video frames from the clipping start frame to the clipping end frame constitute the segment to be deleted.
In some embodiments, the identification unit 1202 is further configured to:
for each video frame in the video frame sequence, determine a reference region of a content object included in the video frame, and crop the reference region out of the video frame to obtain an image to be matched;
determine the content similarity between the image to be matched and the reference image;
determine the video frame to which an image to be matched whose content similarity is greater than a set similarity threshold belongs as a first video frame, where the content of the first video frame includes the object corresponding to the tag to be deleted;
determine the video frame to which an image to be matched whose content similarity is less than or equal to the similarity threshold belongs as a second video frame, where the content of the second video frame does not include the object corresponding to the tag to be deleted.
In some embodiments, the identification unit 1202 is further configured to:
determine a video frame that itself belongs to the first video frames, and whose adjacent previous frame belongs to the second video frames, as the clipping start frame, where the content of a first video frame includes the object corresponding to the tag to be deleted and the content of a second video frame does not include the object corresponding to the tag to be deleted;
determine a video frame that itself belongs to the first video frames, and whose adjacent next frame belongs to the second video frames, as the clipping end frame, where no video frame belonging to the second video frames lies between the clipping start frame and the clipping end frame.
In some embodiments, the clipping unit 1203 is further configured to:
splice the video frame immediately before the clipping start frame to the video frame immediately after the clipping end frame, so that the segment before the segment to be deleted and the segment after it are merged, to obtain the target multimedia data.
In some embodiments, the apparatus 1200 further includes a replacement unit, configured to:
acquire the storage path of the multimedia data to be processed that corresponds to the target multimedia data;
store the target multimedia data at the storage path to replace the multimedia data to be processed that corresponds to the target multimedia data.
It should be noted that each logic unit included in the multimedia data processing apparatus provided by the embodiments of the present application may be implemented by a processor in an electronic device, or by a specific logic circuit. In implementation, the processor may be a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), or the like.
The description of the above system embodiment is similar to the description of the above method embodiments and has beneficial effects similar to those of the method embodiments. For technical details not disclosed in the system embodiment of the present application, refer to the description of the method embodiments of the present application.
It should be noted that, in the embodiments of the present application, if the above multimedia data processing method is implemented in the form of software function modules and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk or an optical disc. The embodiments of the present application are thus not limited to any specific combination of hardware and software.
An embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above multimedia data processing method when running the computer program.
Correspondingly, an embodiment of the present application provides a storage medium, that is, a computer-readable storage medium, on which a computer program is stored. When run by a processor, the computer program implements the multimedia data processing method provided in the above embodiments.
It should be pointed out here that the description of the above storage medium embodiments is similar to that of the method embodiments, and the storage medium embodiments have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the storage medium embodiments of the present application, refer to the description of the method embodiments of the present application.
It should be noted that FIG. 13 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of the present application. As shown in FIG. 13, the electronic device 1300 includes one processor 1301, at least one communication bus 1302, at least one external communication interface 1304, and a memory 1305. The communication bus 1302 is configured to implement connection and communication between these components. In an example, the electronic device 1300 further includes a user interface 1303, where the user interface 1303 may include a display screen, and the external communication interface 1304 may include standard wired and wireless interfaces.
The memory 1305 is configured to store instructions and applications executable by the processor 1301, and may also cache data to be processed or already processed by the processor 1301 and by the modules in the electronic device (for example, image data, audio data, voice communication data, and video communication data). It may be implemented by flash memory (FLASH) or random access memory (RAM).
It should be understood that references throughout the specification to "one embodiment" or "an embodiment" mean that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Therefore, occurrences of "in one embodiment" or "in some embodiments" throughout the specification do not necessarily refer to the same embodiment. Furthermore, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present application. The sequence numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
It should be noted that, herein, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
In the several embodiments provided in the present application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disc.
Alternatively, if the above integrated unit of the present application is implemented in the form of software functional modules and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.
The above are merely embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

  1. A multimedia data processing method, the method comprising:
    determining multimedia data to be processed based on a tag to be deleted, wherein the tags of the multimedia data to be processed include the tag to be deleted;
    identifying the content of the multimedia data to be processed, and determining a segment to be deleted in the multimedia data to be processed, wherein the content of the segment to be deleted includes the object corresponding to the tag to be deleted;
    clipping the segment to be deleted out of the multimedia data to be processed to obtain target multimedia data.
  2. The method according to claim 1, wherein the determining multimedia data to be processed based on a tag to be deleted comprises:
    acquiring the tags of each of at least two pieces of original multimedia data;
    comparing the tags of each piece of original multimedia data with the tag to be deleted;
    determining the original multimedia data whose tags include the tag to be deleted as the multimedia data to be processed.
  3. The method according to claim 2, wherein the method further comprises:
    acquiring at least one commodity link under the account to which the original multimedia data belongs;
    determining the commodity information targeted by each of the at least one commodity link as a tag of the original video data.
  4. The method according to any one of claims 1 to 3, wherein the file type of the multimedia data to be processed includes video, and the identifying the content of the multimedia data to be processed and determining a segment to be deleted in the multimedia data to be processed comprises:
    extracting a video frame sequence of the multimedia data to be processed;
    acquiring a reference image corresponding to the tag to be deleted;
    matching the reference image against the video frames in the video frame sequence to obtain a matching result;
    determining, according to the matching result, a clipping start frame and a clipping end frame corresponding to the clipping start frame, wherein the video frames from the clipping start frame to the clipping end frame constitute the segment to be deleted.
  5. The method according to claim 4, wherein the matching the reference image against the video frames in the video frame sequence to obtain a matching result comprises:
    for each video frame in the video frame sequence, determining a reference region of a content object included in the video frame, and cropping the reference region out of the video frame to obtain an image to be matched;
    determining the content similarity between the image to be matched and the reference image;
    determining the video frame to which a matching image whose content similarity is greater than a set similarity threshold belongs as a first video frame, wherein the content of the first video frame includes the object corresponding to the tag to be deleted;
    determining the video frame to which a matching image whose similarity is less than or equal to the similarity threshold belongs as a second video frame, wherein the content of the second video frame does not include the object corresponding to the tag to be deleted.
  6. The method according to claim 4 or 5, wherein the determining, according to the matching result, a clipping start point and a clipping end point corresponding to the clipping start point comprises:
    determining, as the clipping start frame, a video frame that itself belongs to the first video frames and whose adjacent previous frame belongs to the second video frames, wherein the content of a first video frame includes the object corresponding to the tag to be deleted, and the content of a second video frame does not include the object corresponding to the tag to be deleted;
    determining, as the clipping end frame, a video frame that itself belongs to the first video frames and whose adjacent next frame belongs to the second video frames, wherein no video frame belonging to the second video frames lies between the clipping start frame and the clipping end frame.
  7. The method according to claim 4, wherein the clipping the segment to be deleted out of the multimedia data to be processed to which the deleted segment belongs to obtain target multimedia data comprises:
    merging the segment before the segment to be deleted and the segment after the segment to be deleted, by splicing the video frame immediately preceding the clipping start frame with the video frame immediately following the clipping end frame, to obtain the target multimedia data.
  8. The method according to any one of claims 1 to 7, wherein the method further comprises:
    acquiring the storage path of the multimedia data to be processed that corresponds to the target multimedia data;
    storing the target multimedia data at the storage path to replace the multimedia data to be processed that corresponds to the target multimedia data.
  9. A multimedia data processing apparatus, the apparatus comprising:
    a determining unit, configured to determine multimedia data to be processed based on a tag to be deleted, wherein the tags of the multimedia data to be processed include the tag to be deleted;
    an identifying unit, configured to identify the content of the multimedia data to be processed and determine a segment to be deleted in the multimedia data to be processed, wherein the content of the segment to be deleted includes the object corresponding to the tag to be deleted;
    a clipping unit, configured to clip the segment to be deleted out of the multimedia data to be processed to obtain target multimedia data.
  10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when running the computer program, implements the steps of the multimedia data processing method according to any one of claims 1 to 8.
  11. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when run by a processor, implements the multimedia data processing method according to any one of claims 1 to 8.
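
Illustrative sketch (not part of the claims): the tag comparison of claim 2, the threshold classification of claim 5, the start and end frame determination of claim 6, and the splicing of claim 7 might be expressed roughly as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation. All function names are hypothetical, the per-frame similarity scores are assumed to have been computed already (the claims do not fix a similarity measure), and a run of matching frames that touches the first or last frame of the sequence is assumed to count as a segment even though it has no neighbouring second video frame on that side.

```python
from typing import Dict, List, Sequence, Set, Tuple


def select_items_to_process(item_tags: Dict[str, Set[str]], tag_to_delete: str) -> List[str]:
    """Claim 2 (sketch): keep the items whose tags include the tag to be deleted."""
    return [name for name, tags in item_tags.items() if tag_to_delete in tags]


def classify_frames(similarities: Sequence[float], threshold: float) -> List[bool]:
    """Claim 5 (sketch): True marks a 'first video frame' (similarity > threshold),
    False marks a 'second video frame' (similarity <= threshold)."""
    return [s > threshold for s in similarities]


def find_segments(is_first: Sequence[bool]) -> List[Tuple[int, int]]:
    """Claim 6 (sketch): return inclusive (start, end) index pairs.

    A start frame is a first frame whose previous frame is a second frame;
    an end frame is a first frame whose next frame is a second frame.
    Assumption: runs touching either end of the sequence are also returned.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, flag in enumerate(is_first):
        if flag and start is None:
            start = i  # first frame after a second frame (or at index 0)
        elif not flag and start is not None:
            segments.append((start, i - 1))  # previous frame was the end frame
            start = None
    if start is not None:
        segments.append((start, len(is_first) - 1))
    return segments


def cut_segments(frames: Sequence, segments: Sequence[Tuple[int, int]]) -> list:
    """Claims 1 and 7 (sketch): drop every frame inside a segment and join
    the frames before and after it, preserving order."""
    drop = set()
    for start, end in segments:
        drop.update(range(start, end + 1))
    return [frame for i, frame in enumerate(frames) if i not in drop]
```

For example, with similarity scores [0.1, 0.2, 0.9, 0.95, 0.8, 0.3, 0.1, 0.85, 0.2] and a threshold of 0.7, frames 2 to 4 and frame 7 are classified as first video frames, so find_segments yields (2, 4) and (7, 7), and cut_segments keeps frames 0, 1, 5, 6, and 8. In a real system the frames would be decoded video frames and the splice would also have to handle audio and re-encoding, which this sketch leaves out.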
PCT/CN2022/094878 2021-05-25 2022-05-25 Multimedia data processing method and apparatus, and device and storage medium WO2022247849A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110569790.5A CN113377713A (en) 2021-05-25 2021-05-25 Multimedia data processing method and device, equipment and storage medium
CN202110569790.5 2021-05-25

Publications (1)

Publication Number Publication Date
WO2022247849A1 true WO2022247849A1 (en) 2022-12-01

Family

ID=77571793

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/094878 WO2022247849A1 (en) 2021-05-25 2022-05-25 Multimedia data processing method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN113377713A (en)
WO (1) WO2022247849A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377713A (en) * 2021-05-25 2021-09-10 北京沃东天骏信息技术有限公司 Multimedia data processing method and device, equipment and storage medium
CN114157881A (en) * 2021-10-29 2022-03-08 北京达佳互联信息技术有限公司 Multimedia processing method, device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110121116A (en) * 2018-02-06 2019-08-13 上海全土豆文化传播有限公司 Video generation method and device
CN110121106A (en) * 2018-02-06 2019-08-13 优酷网络技术(北京)有限公司 Video broadcasting method and device
US10565530B1 (en) * 2014-09-29 2020-02-18 Amazon Technologies, Inc. Viewing segments of event media
CN113377713A (en) * 2021-05-25 2021-09-10 北京沃东天骏信息技术有限公司 Multimedia data processing method and device, equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10565530B1 (en) * 2014-09-29 2020-02-18 Amazon Technologies, Inc. Viewing segments of event media
CN110121116A (en) * 2018-02-06 2019-08-13 上海全土豆文化传播有限公司 Video generation method and device
CN110121106A (en) * 2018-02-06 2019-08-13 优酷网络技术(北京)有限公司 Video broadcasting method and device
CN113377713A (en) * 2021-05-25 2021-09-10 北京沃东天骏信息技术有限公司 Multimedia data processing method and device, equipment and storage medium

Also Published As

Publication number Publication date
CN113377713A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
US11132555B2 (en) Video detection method, server and storage medium
WO2022247849A1 (en) Multimedia data processing method and apparatus, and device and storage medium
US8503523B2 (en) Forming a representation of a video item and use thereof
CN108353208B (en) Optimizing media fingerprint retention to improve system resource utilization
CN110134829B (en) Video positioning method and device, storage medium and electronic device
US20220172476A1 (en) Video similarity detection method, apparatus, and device
CN111866585A (en) Video processing method and device
CN113613065B (en) Video editing method and device, electronic equipment and storage medium
WO2019042341A1 (en) Video editing method and device
JP2018530080A (en) System and method for partitioning search indexes for improved media segment identification efficiency
KR101832680B1 (en) Searching for events by attendants
CN113392236A (en) Data classification method, computer equipment and readable storage medium
CN109168020A (en) Method for processing video frequency, device, calculating equipment and storage medium based on live streaming
US11849241B2 (en) Dynamically configured processing of a region of interest dependent upon published video data selected by a runtime configuration file
CN114625918A (en) Video recommendation method, device, equipment, storage medium and program product
CN114390368B (en) Live video data processing method and device, equipment and readable medium
CN109241344B (en) Method and apparatus for processing information
KR20200115017A (en) Apparatus and method for searching image
CN107369450B (en) Recording method and recording apparatus
US20240040108A1 (en) Method and system for preprocessing optimization of streaming video data
CN111444364B (en) Image detection method and device
CN112101197A (en) Face information acquisition method and device
CN115379233B (en) Big data video information analysis method and system
JP2019212068A (en) Information processing apparatus, information processing method, and program
US11961273B2 (en) Dynamically configured extraction, preprocessing, and publishing of a region of interest that is a subset of streaming video data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22810573

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 27/03/2024)