WO2023185257A1 - Data processing method, and device and computer-readable storage medium


Info

Publication number
WO2023185257A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
text
quality
shared
candidate
Application number
PCT/CN2023/074763
Other languages
French (fr)
Chinese (zh)
Inventor
陈小帅
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2023185257A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/176: Support for shared access to files; File sharing support
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of Internet technology, and in particular, to a data processing method, equipment and computer-readable storage medium.
  • Computer vision technology (Computer Vision, CV) is a science that studies how to make machines "see". More specifically, it uses cameras and computers in place of human eyes to identify and measure targets and perform other machine vision tasks, and further processes the resulting graphics so that they become images more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems that can obtain information from images or multi-dimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (Optical Character Recognition, OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, intelligent transportation and other technologies, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
  • Video sharing means that a browsing object, while browsing a video in a video application, shares the video with other browsing objects. Video sharing is a main way for browsing objects to communicate, and it has a large impact on the object activity and playback of the video application.
  • Embodiments of the present application provide a data processing method, equipment, and computer-readable storage media, which can save network transmission resources and processing resources of shared data receiving devices on the premise of improving video sharing efficiency and sharing effects.
  • Embodiments of the present application provide a data processing method, executed by a computer device, which includes:
  • obtaining at least two video clips in a video, determining the first sharing quality corresponding to each of the at least two video clips, and selecting at least one video clip from the at least two video clips as a candidate video clip according to the first sharing quality;
  • obtaining an object tag text sequence associated with the video, where the object tag text sequence includes the object tag text of the browsing object that shares the video and the object tag text of the shared object that receives the share; the object tag text of the browsing object is used to characterize the interest of the browsing object, and the object tag text of the shared object is used to characterize the interest of the shared object;
  • determining, according to the object tag text sequence and the candidate video clips, the second sharing quality corresponding to each candidate video clip, and selecting at least one candidate video clip from the candidate video clips as a candidate shared video clip according to the second sharing quality corresponding to each candidate video clip; the second sharing quality is used to characterize the correlation between the candidate video clip and the object tag text of the shared object;
  • determining, according to the object tag text sequence and the candidate shared video clips, the third sharing quality corresponding to each candidate shared video clip and the auxiliary description information corresponding to each candidate shared video clip; the third sharing quality is used to characterize the degree to which the auxiliary description information matches the candidate shared video clip and the object tag text of the shared object;
  • determining the shared video clip from the candidate shared video clips according to the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video clip, and determining the shared video clip and the auxiliary description information corresponding to the shared video clip as the shared data to be sent to the shared object.
  • Embodiments of the present application also provide a data processing device, including:
  • a first acquisition module, configured to acquire at least two video clips in a video, determine the first sharing quality corresponding to each of the at least two video clips, and select at least one video clip from the at least two video clips as a candidate video clip according to the first sharing quality;
  • a second acquisition module, configured to obtain an object tag text sequence associated with the video, the object tag text sequence including the object tag text of the browsing object that shares the video and the object tag text of the shared object that receives the share, where the object tag text of the browsing object is used to characterize the interest of the browsing object and the object tag text of the shared object is used to characterize the interest of the shared object; and further configured to determine, according to the object tag text sequence and the candidate video clips, the second sharing quality corresponding to each candidate video clip, and to select at least one candidate video clip from the candidate video clips as a candidate shared video clip according to the second sharing quality corresponding to each candidate video clip, where the second sharing quality is used to characterize the correlation between the candidate video clip and the object tag text of the shared object;
  • a first determination module, configured to determine, according to the object tag text sequence and the candidate shared video clips, the third sharing quality corresponding to each candidate shared video clip and the auxiliary description information corresponding to each candidate shared video clip, where the third sharing quality is used to characterize the degree to which the auxiliary description information matches the candidate shared video clip and the object tag text of the shared object;
  • a second determination module, configured to determine the shared video clip from the candidate shared video clips according to the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video clip, and to determine the shared video clip and the auxiliary description information corresponding to the shared video clip as the shared data to be sent to the shared object.
  • An embodiment of the present application also provides a computer device, including: a processor, a memory, and a network interface;
  • The above-mentioned processor is connected to the above-mentioned memory and the above-mentioned network interface, wherein the above-mentioned network interface is used to provide data communication functions, the above-mentioned memory is used to store a computer program, and the above-mentioned processor is used to call the computer program so as to cause the computer device to execute the method in the embodiments of the present application.
  • embodiments of the present application provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • the computer program is suitable for being loaded by a processor and executing the method in the embodiment of the present application.
  • Embodiments of the present application also provide a computer program product.
  • The computer program product includes a computer program, and the computer program is stored in a computer-readable storage medium; the processor of the computer device reads the computer program from the computer-readable storage medium and executes it, so that the computer device executes the method in the embodiments of the present application.
  • Figure 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of a data processing scenario provided by an embodiment of the present application.
  • Figure 3 is a schematic flow chart of a data processing method provided by an embodiment of the present application.
  • Figure 4 is a schematic model structure diagram of a first video recognition sub-model provided by an embodiment of the present application.
  • Figure 5 is a schematic model structure diagram of a second video recognition sub-model provided by an embodiment of the present application.
  • Figure 6 is another schematic flowchart of a data processing method provided by an embodiment of the present application.
  • Figure 7 is a schematic model structure diagram of a fourth video recognition sub-model provided by an embodiment of the present application.
  • Figure 8 is a schematic model structure diagram of a fifth video recognition sub-model provided by an embodiment of the present application.
  • Figure 9 is another schematic flowchart of a data processing method provided by an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • Figure 11 is another structural schematic diagram of a data processing device provided by an embodiment of the present application.
  • Figure 12 is another structural schematic diagram of a data processing device provided by an embodiment of the present application.
  • Figure 13 is another structural schematic diagram of a data processing device provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • In the existing video sharing process, the entire video is shared with friends, and the auxiliary description information carried with it is built in advance by the operating platform corresponding to the video application. Sharing the entire video occupies too many network resources and therefore reduces the sharing efficiency of the video; and because the same auxiliary description information is shared with different objects, the sharing display style is too uniform and the sharing effect is reduced.
  • In the embodiments of the present application, the computer device determines the first sharing quality corresponding to at least two video clips in the video, so candidate video clips can be determined from the at least two video clips according to the first sharing quality. It can be understood that a candidate video clip belongs to the video and its sharing value (quality) is better than that of the video as a whole. Further, the computer device obtains the object tag text sequence associated with the video and, according to the object tag text sequence and the candidate video clips, determines the second sharing quality corresponding to each candidate video clip, so candidate shared video clips can be determined from the candidate video clips according to the second sharing quality.
  • A candidate shared video clip is determined not only based on the video content of the candidate video clip but also based on the object tag text sequence, so its sharing value (quality) is better than that of the candidate video clip. Further, the computer device determines the third sharing quality corresponding to each candidate shared video clip based on the object tag text sequence and the candidate shared video clips, and determines the auxiliary description information corresponding to each candidate shared video clip according to the third sharing quality.
  • The auxiliary description information is therefore associated not only with the candidate shared video clip but also with the object tag text sequence. Further, the computer device determines the shared video clip from the candidate shared video clips according to the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video clip, and determines the shared video clip and the auxiliary description information corresponding to the shared video clip as the shared data to be sent to the shared object.
  • In summary, the shared data in this application is determined based on sharing qualities of different dimensions; it is associated not only with the video content of the shared video clip itself but also with the object tag text sequence, so sharing this data can improve the sharing efficiency and the sharing effect of the video. Moreover, since a video clip rather than the entire video is shared, network transmission resources and the processing resources of the device receiving the shared data can be saved.
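  • The selection flow summarized above can be read as a staged filtering pipeline. The minimal Python sketch below illustrates that flow; the scoring callables, the threshold values and the equal weighting of the three sharing qualities are illustrative assumptions, not values fixed by this application.

```python
# Illustrative sketch of the staged selection pipeline described above.
# The three scoring functions are passed in as callables; they stand in for the
# first, second and third video recognition sub-models described later.

def select_shared_data(video_clips, tag_text_sequence,
                       first_quality, second_quality, third_quality_and_description,
                       first_threshold=0.8, second_threshold=0.85,
                       weights=(1.0, 1.0, 1.0)):
    # Stage 1: first sharing quality, computed from the clip content alone.
    candidates = [c for c in video_clips if first_quality(c) >= first_threshold]

    # Stage 2: second sharing quality, relevance to the object tag text sequence.
    shared_candidates = [c for c in candidates
                         if second_quality(c, tag_text_sequence) >= second_threshold]

    # Stage 3: third sharing quality and auxiliary description information.
    scored = []
    for clip in shared_candidates:
        q3, aux = third_quality_and_description(clip, tag_text_sequence)
        total = (weights[0] * first_quality(clip)
                 + weights[1] * second_quality(clip, tag_text_sequence)
                 + weights[2] * q3)
        scored.append((total, clip, aux))

    # Stage 4: the clip with the maximum total sharing quality becomes the shared
    # video clip and is returned with its auxiliary description information.
    best_total, best_clip, best_aux = max(scored, key=lambda item: item[0])
    return best_clip, best_aux
```

  • In this sketch, first_quality could for example return a predicted interaction rate and third_quality_and_description could return a (match score, copywriting) pair, mirroring the sub-models described below.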
  • Figure 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the system may include a business server 100 and a terminal device cluster.
  • The terminal device cluster may include: terminal device 200a, terminal device 200b, terminal device 200c, ..., terminal device 200n. It can be understood that the above system can include one or more terminal devices, and this application does not limit the number of terminal devices.
  • any terminal device in the terminal device cluster may have a communication connection with the service server 100.
  • The above-mentioned communication connection is not limited to a particular connection method; it may be established directly or indirectly through wired communication, directly or indirectly through wireless communication, or through other methods, which is not limited in this application.
  • each terminal device in the terminal device cluster as shown in Figure 1 can be installed with an application client.
  • When the application client runs in each terminal device, it can exchange data with the business server 100 shown in Figure 1 through the above communication connection.
  • the application client can be a video application, a live broadcast application, a social networking application, an instant messaging application, a game application, a music application, a shopping application, a novel application, a browser, and other application clients with a video loading function.
  • The application client can be an independent client, or it can be an embedded sub-client integrated in a certain client (for example, a social client, an education client, a multimedia client, etc.), which is not limited here.
  • The business server 100 can be a collection of multiple servers, including a background server and a data processing server corresponding to the video application, so each terminal device can perform data transmission with the business server 100 through the application client corresponding to the video application; for example, each terminal device can upload its local video to the business server 100 through the application client of the video application, and the business server 100 can then deliver the video to other terminal devices or transmit it to a cloud server.
  • one terminal device can be selected as the target terminal device in the terminal device cluster shown in FIG. 1 , for example, terminal device 200a is used as the target terminal device.
  • The terminal device 200a may send the video identification, the browsing object identification, and the shared object identification as the data to be identified to the service server 100.
  • the embodiment of the present application refers to the user using the terminal device 200a as a browsing object, and the users (such as friend users) who are associated with the browsing object are called shared objects.
  • The embodiment of the present application does not limit the browsing object identification (the browsing object has given authorization); it includes but is not limited to the mobile phone number and identification number bound to the browsing object in the application client, and can be set according to the actual application scenario.
  • Similarly, the shared object identification can be any information that can be used to identify the shared object in the application client, and the video identification can be any information that can be used to identify the video in the application client.
  • the service server 100 can obtain the video according to the video identification, and obtain the object tag text sequence according to the browsing object identification and the shared object identification.
  • the object tag text sequence includes the object tag text of the browsing object that shares the video and the object tag text of the shared object that receives the share; the object tag text of the browsing object is used to represent the interest of the browsing object, and the sharing The object tag text of an object is used to characterize the interest of the shared object.
  • the service server 100 obtains at least two video clips in the video, and the service server 100 obtains a trained video recognition model.
  • The video recognition model may include a first video recognition sub-model, a second video recognition sub-model and a third video recognition sub-model. Through the first video recognition sub-model, the service server 100 can determine the first sharing quality corresponding to each of the at least two video clips, and according to the first sharing quality, candidate video clips can be determined from the at least two video clips. Further, in the second video recognition sub-model, according to the object tag text sequence and the candidate video clips, the service server 100 can determine the second sharing quality corresponding to each candidate video clip, and according to the second sharing quality corresponding to each candidate video clip, candidate shared video clips can be determined from the candidate video clips. Further, in the third video recognition sub-model, according to the object tag text sequence and the candidate shared video clips, the service server 100 can determine the third sharing quality corresponding to each candidate shared video clip and the auxiliary description information corresponding to each candidate shared video clip. Finally, according to the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video clip, the service server 100 can determine the shared video clip from the candidate shared video clips, and determine the shared video clip and the auxiliary description information corresponding to the shared video clip as the shared data to be sent to the shared object.
  • the business server 100 sends the shared data to the terminal device 200a.
  • After receiving the shared data sent by the business server 100, the terminal device 200a can display the shared data on its screen. Furthermore, the terminal device 200a can send the shared data, carrying the video identification, to the terminal device corresponding to the shared object (for example, the terminal device 200b in Figure 1).
  • After the terminal device 200b obtains the shared data carrying the video identification, it can display the shared data on its screen. Furthermore, the shared object can view the complete video based on the video identification carried by the shared data.
  • Alternatively, the service server 100 can send the shared data directly to the terminal device corresponding to the shared object (the terminal device 200b in Figure 1); for the subsequent process, please refer to the above description, which will not be repeated here.
  • In another implementation, the service server 100 generates a sharing identifier for the shared video clip and sends the sharing identifier and the auxiliary description information to the terminal device 200a. After obtaining the sharing identifier, the terminal device 200a can generate sharing information for the video that carries the sharing identifier and the auxiliary description information, and then send the sharing information to the terminal device 200b corresponding to the shared object. When the terminal device 200b obtains the sharing information, it can play the shared video clip in the video according to the sharing identifier.
  • If the browsing object authorizes the service server 100 with sharing permission, then after generating the sharing identifier, the service server 100 can send the sharing identifier and the auxiliary description information directly to the terminal device 200b.
  • In another implementation, the terminal device 200a itself can use the video recognition model to determine the first sharing quality corresponding to each of at least two video clips in the video, and thereby determine the candidate video clips from the at least two video clips. According to the object tag text sequence and the candidate video clips, the terminal device 200a can determine the second sharing quality corresponding to each candidate video clip, and then determine the candidate shared video clips from the candidate video clips. According to the object tag text sequence and the candidate shared video clips, the terminal device 200a can determine the third sharing quality corresponding to each candidate shared video clip and the auxiliary description information corresponding to each candidate shared video clip. According to the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video clip, the terminal device 200a can determine the shared video clip from the candidate shared video clips, so the shared video clip and the auxiliary description information corresponding to the shared video clip can be determined as the shared data to be sent to the shared object.
  • The local video recognition model of the terminal device 200a can be sent to the terminal device 200a by the service server 100 after training is completed.
  • The shared data in the embodiment of the present application is automatically constructed based on the video and the object tag text sequence and has a high sharing value. The shared video clip can intuitively reflect the highlight content of the video and at the same time matches the interest tags of the browsing object and the shared object, so the sharing efficiency and the sharing effect of the video can be improved.
  • Optionally, the business server 100 and the terminal device 200a can both be blockchain nodes in a blockchain network, and the data described throughout this text (such as the object tag text sequence and the shared data) can be stored on the blockchain.
  • the storage method can be that the blockchain node generates blocks based on the data and adds the blocks to the blockchain for storage.
  • Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. It is mainly used to organize data in chronological order and encrypt it into a ledger so that it cannot be tampered with or forged, while data verification, storage and updating can still be performed.
  • Blockchain is essentially a decentralized database. Each node in the database stores an identical blockchain.
  • The blockchain network can distinguish nodes into core nodes, data nodes and light nodes; core nodes, data nodes and light nodes are all blockchain nodes in the blockchain network.
  • the core node is responsible for the consensus of the entire blockchain network, which means that the core node is the consensus node in the blockchain network.
  • The process by which transaction data in the blockchain network is written into the ledger can be as follows: a data node or light node in the blockchain network obtains the transaction data and passes it along the blockchain network (that is, the nodes pass it on like a baton) until a consensus node receives it; the consensus node then packages the transaction data into a block, performs consensus on the block, and writes the transaction data into the ledger after the consensus is completed.
  • Here, the object tag text sequence and the shared data are taken as examples of transaction data. After the consensus on the transaction data is reached, the business server 100 (as a blockchain node) generates a block based on the transaction data and stores the block in the blockchain network; to read the transaction data (i.e., the object tag text sequence and the shared data), a blockchain node can obtain the block containing the transaction data from the blockchain network and further obtain the transaction data from the block.
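  • As a toy illustration of the ledger-writing flow described above (not part of the claimed data processing method), the sketch below shows how a consensus node might package transaction data, such as the object tag text sequence and the shared data, into a hash-linked block; the block fields and the SHA-256 hashing scheme are illustrative assumptions.

```python
import hashlib
import json
import time

def package_block(transaction_data, prev_block_hash):
    """Toy example: package transaction data into a block that is linked to the
    previous block by its hash (illustrative only, not a real consensus protocol)."""
    block = {
        "timestamp": time.time(),
        "transactions": transaction_data,  # e.g. object tag text sequence and shared data
        "prev_hash": prev_block_hash,
    }
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return block

# After consensus succeeds, the block would be appended to each node's copy of the
# ledger, e.g. ledger.append(package_block(tx, ledger[-1]["hash"])).
```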
  • the methods provided by the embodiments of the present application can be executed by computer equipment, including but not limited to terminal equipment or business servers.
  • The business server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud databases, cloud services, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data and artificial intelligence platforms.
  • Terminal devices include but are not limited to mobile phones, computers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals, aircraft, etc.
  • the terminal device and the service server may be connected directly or indirectly through wired or wireless methods, and the embodiments of the present application are not limited here.
  • FIG. 2 is a schematic diagram of a data processing scenario provided by an embodiment of the present application.
  • Embodiments of this application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, smart transportation, assisted driving, etc.
  • the embodiments of the present application can be applied to business scenarios such as video clip recommendation scenarios, video clip distribution scenarios, and video clip search scenarios. Specific business scenarios will not be listed here.
  • the implementation process of this data processing scenario can be carried out in the business server or in the terminal device. It can also be carried out interactively in the terminal device and the business server. There is no restriction here.
  • the embodiments of the present application are described by taking the interaction between a terminal device and a service server as an example.
  • The terminal device can be any terminal device in the terminal device cluster in the embodiment corresponding to Figure 1.
  • the service server may be the service server 100 in the embodiment corresponding to the above-mentioned FIG. 1 .
  • the browsing object 20b has a binding relationship with the terminal device 200a.
  • the terminal device 200a can display the basic information of the video 201a on the playback interface, such as the video duration (Fig. 2 example is 6 minutes), video cover (example in Figure 2 is a cat image 205a), video copy (example copy in Figure 2 is "kittens fighting for food" 206a).
  • the terminal device 200a can also display controls for the video 201a on the playback interface, such as the playback control 207a and the sharing control 202a illustrated in FIG. 2 .
  • When the browsing object 20b triggers the sharing control 202a, the terminal device 200a responds to the trigger operation on the sharing control 202a and displays the friend list of the browsing object 20b.
  • The example friend list in Figure 2 includes three friends, namely friend "aa", friend "bb" and friend "cc". If the browsing object 20b triggers the selection control 203a corresponding to the friend "cc", the terminal device 200a can display a prompt sub-page, and the prompt sub-page can display a "cancel" control and a "share" control 204a.
  • The terminal device 200a determines the friend "cc" as the shared object.
  • The terminal device 200a can obtain the video identification corresponding to the video 201a, the browsing object identification corresponding to the browsing object 20b, and the shared object identification corresponding to the shared object, and then send the video identification, the browsing object identification and the shared object identification to the service server 100, so that the business server 100 can obtain the video 201a through the video identification and determine the object tag text sequence through the browsing object identification and the shared object identification.
  • the object tag text sequence includes object tag text for the browse object 20b and object tag text for the shared object.
  • the object tag text of the browsing object 20b is used to characterize the interest of the browsing object 20b; the object tag text of the shared object is used to characterize the interest of the shared object.
  • the embodiment of the present application does not limit the way in which the service server 100 obtains the video 201a and the object label text sequence.
  • the video 201a and the object label text sequence can be obtained as described above, or the terminal device 200a can obtain the video 201a and the object label text sequence.
  • the business server 100 can also determine the video 201a and the object label text sequence through other methods. The specific settings should be based on the actual scenario.
  • the service server 100 can segment the video 201a through a time window to obtain at least two video clips 20d.
  • the length of the time window in the embodiment of this application is 1 minute.
  • the number of at least two video clips 20d is 6, such as the video clips 201d, 202d, 203d, 204d, 205d and 206d as shown in FIG. 2 .
  • the service server 100 obtains the trained video recognition model 20c.
  • the video recognition model 20c may include a first video recognition sub-model 20e, a second video recognition sub-model 20f and a third video recognition sub-model 20g.
  • the service server 100 inputs at least two video clips 20d to the first video recognition sub-model 20e respectively, and determines the first sharing quality corresponding to the at least two video clips 20d through the first video recognition sub-model 20e.
  • the first sharing quality is used to characterize the sharing value of the video clip.
  • the first sharing quality may be the interaction rate of the video clip.
  • the first shared quality of the video clip 201d is 0.8, the first shared quality of the video clip 202d is 0.85, the first shared quality of the video clip 203d is 0.89, and the first shared quality of the video clip 204d is 0.7, The first shared quality of the video clip 205d is 0.75, and the first shared quality of the video clip 206d is 0.9.
  • The specific process by which the service server 100 determines the first sharing quality corresponding to a video clip will not be described here; please refer to the description of step S101 in the embodiment corresponding to Figure 3 below.
  • the service server 100 obtains the first sharing quality threshold. It can be understood that the first sharing quality threshold can be adjusted according to the actual application scenario. An example in the embodiment of this application is 0.8.
  • The service server 100 compares the first sharing quality of each video clip with the first sharing quality threshold, and determines the video clips whose first sharing quality is equal to or greater than the first sharing quality threshold as the candidate video clips 201e. As shown in Figure 2, the candidate video clips 201e include the video clips 201d, 202d, 203d and 206d. Further, the business server 100 inputs both the object tag text sequence and the candidate video clips 201e to the second video recognition sub-model 20f.
  • Through the second video recognition sub-model 20f, the second sharing quality corresponding to each candidate video clip 201e can be determined.
  • the second sharing quality is used to characterize the correlation between the candidate video clip and the object label text of the shared object.
  • the second shared quality of the video clip 201d is 0.74
  • the second shared quality of the video clip 202d is 0.86
  • the second shared quality of the video clip 203d is 0.8
  • the second shared quality of the video clip 206d is 0.9;
  • the specific process of the service server 100 determining the second sharing quality corresponding to the candidate video clip will not be described here. Please refer to the description of step S102 in the embodiment corresponding to FIG. 3 below.
  • The service server 100 obtains the second sharing quality threshold. It can be understood that the second sharing quality threshold can be adjusted according to the actual application scenario; the example in the embodiment of this application is 0.85.
  • The service server 100 compares the four second sharing qualities with the second sharing quality threshold, and determines the candidate video clips whose second sharing quality is greater than the second sharing quality threshold as the candidate shared video clips 201f. As shown in Figure 2, the candidate shared video clips 201f include the video clips 202d and 206d. Further, the business server 100 inputs both the object tag text sequence and the candidate shared video clips 201f to the third video recognition sub-model 20g.
  • Through the third video recognition sub-model 20g, the third sharing quality corresponding to each candidate shared video clip 201f can be determined. As shown in the example of Figure 2, the third sharing quality of the video clip 202d is 0.82 and the third sharing quality of the video clip 206d is 0.87. The specific process by which the service server 100 determines the third sharing quality corresponding to a candidate shared video clip will not be described here; please refer to the description of step S103 in the embodiment corresponding to Figure 3 below.
  • In addition, the service server 100 can determine the auxiliary description information corresponding to each candidate shared video clip. As shown in Figure 2, the service server 100 determines the auxiliary description information 202g of the video clip 202d and the auxiliary description information 206g of the video clip 206d. The specific process by which the service server 100 determines the auxiliary description information corresponding to a candidate shared video clip will not be described here; please refer to the description of step S103 in the embodiment corresponding to Figure 3 below.
  • The service server 100 performs a weighted summation on the first sharing quality (0.85 in the example of Figure 2), the second sharing quality (0.86 in the example of Figure 2) and the third sharing quality (0.82 in the example of Figure 2) corresponding to the video clip 202d to obtain the total sharing quality corresponding to the video clip 202d.
  • Likewise, the service server 100 can obtain the total sharing quality corresponding to the video clip 206d by a weighted summation of its first sharing quality, second sharing quality and third sharing quality (0.87 in the example of Figure 2). Further, the service server 100 compares the total sharing quality corresponding to the video clip 202d with the total sharing quality corresponding to the video clip 206d, and takes the maximum of the two total sharing qualities.
  • Since the video clip 206d has the maximum total sharing quality, the service server 100 can determine the video clip 206d as the shared video clip. Further, the shared video clip (i.e., the video clip 206d) and the auxiliary description information corresponding to the shared video clip (the auxiliary description information 206g shown in Figure 2) can be determined as the shared data 20h.
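  • Using the example values from Figure 2, the final selection can be reproduced in a few lines; the equal weights used in the weighted summation are an assumption, since the embodiment does not specify the weighting coefficients.

```python
# Figure 2 example values: (first, second, third) sharing quality per candidate shared clip.
qualities = {
    "202d": (0.85, 0.86, 0.82),
    "206d": (0.90, 0.90, 0.87),
}
weights = (1.0, 1.0, 1.0)  # assumed equal weights; not specified by the embodiment

totals = {clip: sum(w * q for w, q in zip(weights, qs)) for clip, qs in qualities.items()}
shared_clip = max(totals, key=totals.get)  # -> "206d", the maximum total sharing quality
```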
  • The business server 100 can synchronize the shared data 20h to the terminal device 200a, so the terminal device 200a can send the shared data 20h to the shared object (the friend "cc" as shown in Figure 2).
  • this application can construct multiple video clips with high sharing value through deep modeling of videos.
  • auxiliary description information that is strongly related to the browsing objects and shared objects can be generated to achieve video sharing.
  • Figure 3 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • This data processing method can be executed by a business server (for example, the business server 100 shown in Figure 1 above), by a terminal device (for example, the terminal device 200a shown in Figure 1 above), or through interaction between a business server and a terminal device.
  • the embodiment of this application takes the method being executed by the service server as an example for description.
  • the data processing method may at least include the following steps S101 to S104.
  • Step S101: Obtain at least two video clips in the video, determine the first sharing quality corresponding to each of the at least two video clips, and select at least one video clip from the at least two video clips as a candidate video clip based on the first sharing quality.
  • the video can be segmented according to the time window to obtain at least two video clips corresponding to the video; the first sharing quality is used to characterize the popularity of the video clips, such as the interaction rate.
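  • A minimal sketch of the time-window segmentation described above, assuming the video is represented only by its duration in seconds and the window length (60 seconds in the Figure 2 example):

```python
def segment_by_time_window(video_duration_s, window_s=60):
    """Split a video into consecutive (start, end) windows, in seconds.
    A 6-minute video with a 1-minute window yields 6 segments, as in Figure 2."""
    segments = []
    start = 0
    while start < video_duration_s:
        end = min(start + window_s, video_duration_s)
        segments.append((start, end))
        start = end
    return segments

# segment_by_time_window(360) -> [(0, 60), (60, 120), (120, 180), (180, 240), (240, 300), (300, 360)]
```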
  • the popularity can be determined by the characteristics of the video clip in multiple dimensions, such as image characteristics, audio characteristics and text characteristics. For each video clip in the at least two video clips, perform the following operations to determine the first sharing quality corresponding to the video clip:
  • Specifically, for a video clip, K video frames in the video clip and the audio frames corresponding to the K video frames can be obtained, where K is a positive integer; the audio recognition text, the video description text and the object comment text corresponding to the video clip are also obtained, and a multi-dimensional fusion feature corresponding to the video clip is generated based on the K video frames, the K audio frames and these texts.
  • Based on the multi-dimensional fusion feature, the first sharing quality corresponding to the video clip is determined.
  • The video clip can be subjected to audio recognition processing to obtain the audio recognition text, such as the voice dialogue text obtained through ASR recognition; the video clip can be subjected to text recognition processing, such as OCR processing, to obtain the video description text (for example, subtitle text); and the bullet-screen (barrage) comment text corresponding to the video clip can be obtained as the object comment text.
  • The specific process of generating the multi-dimensional fusion feature corresponding to a video clip may include: obtaining a video recognition model, where the video recognition model includes a first video recognition sub-model, and the first video recognition sub-model includes a video fusion network layer, an audio fusion network layer, a text fusion network layer and a multi-dimensional fusion network layer; inputting the K video frames to the video fusion network layer, performing feature extraction on the K video frames through the video fusion network layer to obtain the video features to be fused corresponding to the K video frames, and performing feature fusion on the K video features to be fused to obtain the video feature corresponding to the video clip; inputting the K audio frames to the audio fusion network layer, performing feature extraction on the K audio frames through the audio fusion network layer to obtain the audio features to be fused corresponding to the K audio frames, and performing feature fusion on the K audio features to be fused to obtain the audio feature corresponding to the video clip; determining the audio recognition text, the video description text and the object comment text as the content text corresponding to the video clip, inputting the content text into the text fusion network layer, extracting the key text in the content text through the text fusion network layer, and performing feature extraction on the key text to obtain the text feature corresponding to the key text; and inputting the video feature, the audio feature and the text feature to the multi-dimensional fusion network layer, and performing feature fusion on the video feature, the audio feature and the text feature through the multi-dimensional fusion network layer to obtain the multi-dimensional fusion feature corresponding to the video clip.
  • the first video recognition sub-model further includes a first fully connected network layer.
  • The specific process of determining the first sharing quality corresponding to the at least two video clips may include: for each video clip, inputting the multi-dimensional fusion feature corresponding to the video clip into the first fully connected network layer, performing feature transformation on the multi-dimensional fusion feature corresponding to the video clip through the first fully connected network layer, and obtaining the first sharing quality corresponding to the video clip.
  • The specific process of selecting at least one video clip from the at least two video clips as a candidate video clip may include: determining, among the at least two video clips, the video clips whose first sharing quality is equal to or greater than the first sharing quality threshold as the candidate video clips.
  • Specifically, the business server can segment the video through the time window to obtain at least two video clips of the video, where the time window can be set according to the actual application scenario. It can be understood that the process by which the service server determines the first sharing quality of each video clip is the same; therefore, the embodiment of the present application takes determining the first sharing quality corresponding to video clip A1 as an example for description, and the process of determining the first sharing quality corresponding to the remaining video clips among the at least two video clips can refer to this description. Please also refer to Figure 4.
  • Figure 4 is a schematic model structure diagram of a first video recognition sub-model provided by an embodiment of the present application. As shown in Figure 4, the service server obtains K video frames from the video segment A 1 and the audio frames corresponding to the K video frames.
  • The K video frames can be selected randomly or periodically (for example, one frame per second).
  • The embodiment of the present application does not limit the method of obtaining video frames, which can be set according to the actual application scenario. The business server performs audio recognition processing on the video clip A1, for example through ASR technology, to obtain the audio recognition text; it extracts the video description text in the video clip A1, for example through OCR technology, and extracts the object comment text, where the video description text may include subtitle text and the object comment text may include bullet-screen (barrage) text. Further, the business server determines the audio recognition text, the video description text and the object comment text as the content text E1 corresponding to the video clip A1.
  • the business server obtains the first video recognition sub-model in the video recognition model.
  • The first video recognition sub-model includes a video fusion network layer 40a, an audio fusion network layer 40b, a text fusion network layer 40c, a multi-dimensional fusion network layer 40e and a first fully connected network layer 40f.
  • The service server inputs the K video frames to the video fusion network layer 40a. Assuming that the K video frames include a first video frame and a second video frame, then through the video fusion network layer 40a, feature extraction is performed on the first video frame to obtain the video feature to be fused corresponding to the first video frame, and feature extraction is performed on the second video frame to obtain the video feature to be fused corresponding to the second video frame. In this way, the business server can obtain the video features to be fused corresponding to each of the K video frames; by performing feature fusion on the K video features 401a to be fused, the service server can obtain the video feature 401d corresponding to the video clip A1.
  • the video fusion network layer 40a can be regarded as a network for extracting deep features of K video frames.
  • The embodiment of the present application does not limit the network type of the video fusion network layer 40a; it can consist of any one or more neural networks, such as Convolutional Neural Networks (CNN), Residual Networks (ResNet), High-Resolution Net (HRNet), EfficientNet, and so on.
  • Similarly, the service server inputs the K audio frames to the audio fusion network layer 40b. Assuming that the K audio frames include a first audio frame corresponding to the first video frame and a second audio frame corresponding to the second video frame, then through the audio fusion network layer 40b, feature extraction is performed on the first audio frame to obtain the audio feature to be fused corresponding to the first audio frame, and feature extraction is performed on the second audio frame to obtain the audio feature to be fused corresponding to the second audio frame. In this way, the business server can obtain the audio features to be fused corresponding to the K audio frames, perform feature fusion on the K audio features 401b to be fused, and obtain the audio feature 402d corresponding to the video clip A1.
  • the audio fusion network layer 40b can be regarded as a network used to extract deep features of K audio frames.
  • The embodiment of the present application does not limit the network type of the audio fusion network layer 40b; it can consist of any one or more neural networks, such as the convolutional time-domain audio separation network (Conv-TasNet), the bidirectional long short-term memory time-domain audio separation network (BiLSTM-TasNet), the TensorFlow-based VGGish model, and so on.
  • the business server inputs the content text E 1 into the text fusion network layer 40c, extracts the key text in the content text E 1 through the text fusion network layer 40c, performs feature extraction on the key text, and obtains text features corresponding to the key text.
  • The embodiment of the present application does not limit the network type of the text fusion network layer 40c; it can be any natural language processing network, such as Transformer (a deep self-attention model widely used in the fields of natural language translation and image processing), Word2Vec (a model used to generate word vectors), BERT (Bidirectional Encoder Representations from Transformers), and so on.
  • Further, the business server inputs the video feature 401d, the audio feature 402d and the text feature 403d to the multi-dimensional fusion network layer 40e; through the multi-dimensional fusion network layer 40e, feature fusion is performed on the video feature 401d, the audio feature 402d and the text feature 403d to obtain the multi-dimensional fusion feature 401e corresponding to the video clip A1.
  • the business server inputs the multi-dimensional fusion feature 401e to the first fully connected network layer 40f, and performs feature transformation on the multi-dimensional fused feature 401e through the first fully connected network layer 40f to obtain the first shared quality corresponding to the video segment A1 .
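  • The following PyTorch-style sketch illustrates the structure just described: per-modality fusion of the K frame-level features, a multi-dimensional fusion layer over the three modal features, and a first fully connected layer that maps the fused feature to a scalar first sharing quality. The linear backbones, feature dimensions and mean-pooling fusion are illustrative assumptions; the embodiment allows any of the network types listed above for each layer.

```python
import torch
import torch.nn as nn

class FirstVideoRecognitionSubModel(nn.Module):
    """Illustrative sketch of the first video recognition sub-model: video/audio/text
    fusion layers -> multi-dimensional fusion layer -> first fully connected layer."""
    def __init__(self, frame_dim=512, audio_dim=128, text_dim=256, hidden_dim=256):
        super().__init__()
        # Stand-ins for the video, audio and text fusion network layers (40a / 40b / 40c).
        self.video_encoder = nn.Linear(frame_dim, hidden_dim)
        self.audio_encoder = nn.Linear(audio_dim, hidden_dim)
        self.text_encoder = nn.Linear(text_dim, hidden_dim)
        # Multi-dimensional fusion network layer (40e).
        self.fusion = nn.Linear(3 * hidden_dim, hidden_dim)
        # First fully connected network layer (40f).
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, video_frames, audio_frames, content_text_feat):
        # video_frames: (K, frame_dim); audio_frames: (K, audio_dim);
        # content_text_feat: (text_dim,) pre-computed encoding of the content text (assumption).
        video_feat = self.video_encoder(video_frames).mean(dim=0)  # fuse the K frame features
        audio_feat = self.audio_encoder(audio_frames).mean(dim=0)  # fuse the K audio features
        text_feat = self.text_encoder(content_text_feat)
        fused = self.fusion(torch.cat([video_feat, audio_feat, text_feat], dim=-1))
        first_quality = torch.sigmoid(self.fc(fused))              # first sharing quality in [0, 1]
        return first_quality, fused  # fused plays the role of the multi-dimensional fusion feature

# Example with random inputs and K = 8 sampled frames:
# model = FirstVideoRecognitionSubModel()
# q1, fused = model(torch.randn(8, 512), torch.randn(8, 128), torch.randn(256))
```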
  • Regarding the specific process by which the service server determines the candidate video clips from the at least two video clips, please refer to the description of Figure 2 above, which will not be repeated here.
  • Step S102: Obtain the object tag text sequence associated with the video, determine, according to the object tag text sequence and the candidate video clips, the second sharing quality corresponding to each candidate video clip, and select at least one candidate video clip from the candidate video clips as a candidate shared video clip according to the second sharing quality corresponding to each candidate video clip.
  • the second sharing quality is used to characterize the correlation between the candidate video clip and the object label text of the shared object.
  • the object tag text sequence includes the object tag text of the browsing object that shares the video and the object tag text of the shared object that receives the share; the object tag text of the browsing object is used to represent the interest of the browsing object, and the sharing The object tag text of an object is used to characterize the interest of the shared object.
  • Specifically, the process may include: obtaining the object tag text of the browsing object associated with the video and obtaining the object tag text of the shared object associated with the browsing object; generating the object tag text sequence based on the object tag text of the browsing object and the object tag text of the shared object; obtaining the video recognition model and inputting the object tag text sequence and the candidate video clips to the video recognition model, where the video recognition model includes a second video recognition sub-model and the second video recognition sub-model includes a first text encoding network layer; performing text encoding on each object tag text in the object tag text sequence through the first text encoding network layer to obtain the first object tag feature corresponding to the object tag text sequence; obtaining the multi-dimensional fusion feature corresponding to each candidate video clip; and determining the second sharing quality corresponding to each candidate video clip based on the first object tag feature and the multi-dimensional fusion feature corresponding to each candidate video clip.
  • the second video recognition sub-model also includes a first splicing network layer and a second fully connected network layer;
  • The specific process of determining the second sharing quality corresponding to a candidate video clip may include: for each candidate video clip, inputting the first object tag feature and the multi-dimensional fusion feature corresponding to the candidate video clip to the first splicing network layer; performing feature splicing on the first object tag feature and the multi-dimensional fusion feature corresponding to the candidate video clip through the first splicing network layer to obtain the first multi-dimensional splicing feature corresponding to the candidate video clip; and inputting the first multi-dimensional splicing feature to the second fully connected network layer, performing feature transformation on the first multi-dimensional splicing feature through the second fully connected network layer, and obtaining the second sharing quality corresponding to the candidate video clip.
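  • A corresponding PyTorch-style sketch of the second video recognition sub-model described above: the object tag text sequence is encoded into a first object tag feature, spliced (concatenated) with the candidate clip's multi-dimensional fusion feature, and transformed by a second fully connected layer into the second sharing quality. The embedding-bag tag encoder and the dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SecondVideoRecognitionSubModel(nn.Module):
    """Illustrative sketch: first text encoding layer (40g) + first splicing
    layer (40h, a concatenation) + second fully connected layer (40i)."""
    def __init__(self, vocab_size=10000, tag_dim=128, fusion_dim=256):
        super().__init__()
        # First text encoding network layer: encodes the object tag text sequence.
        self.tag_encoder = nn.EmbeddingBag(vocab_size, tag_dim, mode="mean")
        # Second fully connected network layer: maps the spliced feature to a score.
        self.fc = nn.Linear(tag_dim + fusion_dim, 1)

    def forward(self, tag_token_ids, clip_fusion_feature):
        # tag_token_ids: (num_tokens,) token ids of the object tag text sequence.
        # clip_fusion_feature: (fusion_dim,) multi-dimensional fusion feature of one candidate clip.
        tag_feature = self.tag_encoder(tag_token_ids.unsqueeze(0)).squeeze(0)  # first object tag feature
        spliced = torch.cat([tag_feature, clip_fusion_feature], dim=-1)        # first splicing layer
        return torch.sigmoid(self.fc(spliced))                                 # second sharing quality

# Example:
# model = SecondVideoRecognitionSubModel()
# q2 = model(torch.tensor([3, 17, 42]), torch.randn(256))
```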
  • Step S101 constructs candidate video clips with a high interaction rate and a high sharing value. This step further constrains the candidate video clips by their relevance to the objects' interests, so that the constructed candidate shared video clips are more consistent with the objects' interests, which can further improve the playback conversion of video sharing.
  • the business server obtains the object tag text of the browsing object (abbreviated as browsing object tag text).
  • the browsing object tag text can represent the browsing object's interest.
  • For example, the tag text (cat, animation, pet) represents a browsing object interested in cat, animation and pet videos.
  • the shared object tag text can represent the interest of the shared object.
  • For example, the tag text (cat, cartoon, children) represents a shared object interested in cat, cartoon and children's videos.
  • The business server obtains the object tag text sequence, for example, by combining the tag text (cat, animation, pet) and the tag text (cat, cartoon, children) to obtain the tag text sequence (cat, animation, pet, cartoon, children).
  • the object label text sequence is generated using the obtained object label text.
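  • A minimal sketch of how the object tag text sequence could be assembled from the two tag lists in this example; de-duplicating repeated tags while preserving their order is an assumption suggested by the combined sequence shown above.

```python
def build_tag_text_sequence(browsing_object_tags, shared_object_tags):
    """Concatenate the two tag lists, keeping only the first occurrence of each tag."""
    seen, sequence = set(), []
    for tag in browsing_object_tags + shared_object_tags:
        if tag not in seen:
            seen.add(tag)
            sequence.append(tag)
    return sequence

# build_tag_text_sequence(["cat", "animation", "pet"], ["cat", "cartoon", "children"])
# -> ["cat", "animation", "pet", "cartoon", "children"]
```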
  • Embodiments of the present application can provide two ways of obtaining multi-dimensional fusion features corresponding to candidate video clips.
  • The first way: step S101 has already produced the multi-dimensional fusion features corresponding to the at least two video clips (including the multi-dimensional fusion feature 401e in Figure 4), and the candidate video clips belong to the at least two video clips, so the business server can obtain the multi-dimensional fusion features corresponding to the candidate video clips from the multi-dimensional fusion features corresponding to the at least two video clips output by the first video recognition sub-model. Please refer to Figure 2 again.
  • Through the first video recognition sub-model 20e, the business server can obtain the multi-dimensional fusion features corresponding to the video clips 201d, 202d, 203d, 204d, 205d and 206d respectively. Since the business server determines the video clips 201d, 202d, 203d and 206d as the candidate video clips, it can directly determine the multi-dimensional fusion features of the video clips 201d, 202d, 203d and 206d output by the first video recognition sub-model 20e as the multi-dimensional fusion features corresponding to the candidate video clips.
  • The second way: please also refer to Figure 5, which is a schematic model structure diagram of a second video recognition sub-model provided by an embodiment of the present application.
  • As shown in Figure 5, the model structure in the dotted-line area is the same as the model structure in the first video recognition sub-model of Figure 4, but the model parameters of the two are not identical, because when training the second video recognition sub-model, the business server uses the model parameters of the video fusion network layer 40a, the audio fusion network layer 40b, the text fusion network layer 40c and the multi-dimensional fusion network layer 40e in the trained first video recognition sub-model as the initial model parameters of the dotted-line area in Figure 5. The process by which the service server obtains the multi-dimensional fusion feature 402e corresponding to a candidate video clip through the dotted-line area in Figure 5 is consistent with the process, described in step S101, of obtaining the multi-dimensional fusion features corresponding to the at least two video clips through the first video recognition sub-model, so please refer to the description of step S101 above, which will not be repeated here. Since the model parameters in the dotted-line area of Figure 5 are further optimized relative to the model parameters in Figure 4, the multi-dimensional fusion feature 402e is better than the multi-dimensional fusion features obtained in step S101.
  • the embodiment of this application jointly models the personalized interests of the object and the content of the video clips at the same time.
  • the second video recognition sub-model may include the first text encoding network layer 40g, the first splicing network layer 40h and a second fully connected network layer 40i.
• Through the first text encoding network layer 40g, the business server performs text encoding on each object label text in the object label text sequence to obtain the first object label feature 401g corresponding to the object label text sequence; the business server then inputs the first object label feature 401g and the multi-dimensional fusion feature corresponding to the candidate video clip (such as the multi-dimensional fusion feature 402e in Figure 5) respectively to the first splicing network layer 40h. Through the first splicing network layer 40h, feature splicing is performed on the first object label feature 401g and the multi-dimensional fusion feature 402e to obtain the first multi-dimensional splicing feature 401h corresponding to the candidate video clip; further, the business server inputs the first multi-dimensional splicing feature 401h to the second fully connected network layer 40i, and through the second fully connected network layer 40i, the second sharing quality corresponding to the candidate video clip can be obtained.
  • the embodiment of the present application does not limit the network type of the first text encoding network layer 40g, and it can be any natural language processing network.
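• To make the data flow concrete, here is a hedged PyTorch sketch of a second video recognition sub-model head with this structure (text encoding, splicing, fully connected scoring). The dimensions and the simple embedding-average text encoder are illustrative assumptions, not the original network.

```python
import torch
import torch.nn as nn

class SecondSubModelHead(nn.Module):
    """Text encoding layer + splicing layer + fully connected layer mapping
    (object label text sequence, multi-dimensional fusion feature) to a
    second sharing quality score."""

    def __init__(self, vocab_size=10000, text_dim=128, fusion_dim=256):
        super().__init__()
        # First text encoding network layer (any NLP encoder could be used;
        # an embedding average is only a placeholder here).
        self.tag_embedding = nn.EmbeddingBag(vocab_size, text_dim, mode="mean")
        # Second fully connected network layer applied to the spliced feature.
        self.fc = nn.Linear(text_dim + fusion_dim, 1)

    def forward(self, tag_token_ids, fusion_feature):
        # First object label feature (401g in the figure's terms).
        tag_feature = self.tag_embedding(tag_token_ids)
        # First splicing network layer: concatenate tag and fusion features.
        spliced = torch.cat([tag_feature, fusion_feature], dim=-1)
        # Second sharing quality as a score in (0, 1).
        return torch.sigmoid(self.fc(spliced)).squeeze(-1)

# Toy usage: one candidate clip, five tag tokens, a 256-d fusion feature.
head = SecondSubModelHead()
tags = torch.randint(0, 10000, (1, 5))
fusion = torch.randn(1, 256)
print(head(tags, fusion))  # tensor with one quality score
```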
  • the process in which the service server selects at least one candidate video segment from the candidate video segments as a candidate shared video segment according to the second sharing quality corresponding to the candidate video segment may be referred to the description in Figure 2 above, and will not be described again here.
  • Step S103 Determine the third sharing quality corresponding to each candidate shared video clip and the auxiliary description information corresponding to each candidate shared video clip based on the object label text sequence and the candidate shared video clips.
• the auxiliary description information refers to the description information used to assist the video clip, including but not limited to the following modal information or a combination of multiple modal information: the copywriting (text modality), cover (image modality) and voice introduction (audio modality) of the video clip, etc., which can be set according to the actual application scenario.
  • the third sharing quality is used to characterize the matching degree of the auxiliary description information with the video clip and the object tag text of the shared object.
  • the service server determines the third sharing quality corresponding to the candidate shared video clip through the third video recognition sub-model in the video recognition model, and then determines the auxiliary description information.
• When the auxiliary description information includes a description image, the above-mentioned third video recognition sub-model includes a fourth video recognition sub-model; when the auxiliary description information includes copywriting (i.e., description text), the above-mentioned third video recognition sub-model includes a fifth video recognition sub-model; when the auxiliary description information includes both a description image and description text, the third video recognition sub-model may include a fourth video recognition sub-model and a fifth video recognition sub-model. For the fourth video recognition sub-model and the fifth video recognition sub-model, please refer to the description in the embodiment corresponding to FIG. 6 below, which will not be described here.
• Step S104 Determine the shared video segment from the candidate shared video segments based on the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video segment, and determine the shared video segment and the auxiliary description information corresponding to the shared video segment as shared data to be sent to the shared object.
• the first sharing quality, the second sharing quality and the third sharing quality corresponding to the candidate shared video segment are weighted and summed to obtain the total sharing quality corresponding to the candidate shared video segment;
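• For example, the weighted combination could look like the following small sketch; the weights 0.3/0.3/0.4 and the clip names are arbitrary illustrative values, and the actual weights would be set per application scenario.

```python
def total_sharing_quality(q1, q2, q3, w1=0.3, w2=0.3, w3=0.4):
    """Weighted sum of the first, second and third sharing qualities."""
    return w1 * q1 + w2 * q2 + w3 * q3

# Pick the candidate shared video clip with the largest total sharing quality.
candidates = {"clip_201d": (0.8, 0.6, 0.7), "clip_206d": (0.5, 0.9, 0.8)}
best = max(candidates, key=lambda c: total_sharing_quality(*candidates[c]))
print(best)
```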
  • the embodiment of this application proposes a method for realizing intelligent sharing of videos.
• this method can automatically mine multiple video clips with high sharing value in the video. Based on the mining of the object's interests, high-value shared clips that are more consistent with the object's personalized interests can be selected, and corresponding personalized shared cover images and shared copywriting can be generated, making video sharing more intelligent and the presentation more intuitive. Because the shared data produced by this method is more consistent with the object's personalized interests, the video sharing effect can be further improved. On the premise of improving the video sharing effect, only video clips are shared instead of the entire video, which saves network transmission resources and the processing resources of the receiving device of the shared data.
  • FIG. 6 is another schematic flowchart of a data processing method provided by an embodiment of the present application.
• This method may be executed by a business server (for example, the business server 100 shown in Figure 1 above), or by a terminal device (for example, the terminal device 200a shown in Figure 1 above), or executed by the business server and the terminal device interactively.
  • the embodiment of this application takes the method being executed by the service server as an example for description. As shown in Figure 6, the method may include at least the following steps.
• Step S201 Obtain at least two video segments in the video, determine the first sharing quality corresponding to the at least two video segments, and select at least one video segment from the at least two video segments as a candidate video segment based on the first sharing quality.
• Step S202 Obtain the object label text sequence associated with the video, determine the second sharing quality corresponding to each candidate video clip according to the object label text sequence and the candidate video clips, and select at least one candidate video clip from the candidate video clips as a candidate shared video clip according to the second sharing quality corresponding to each candidate video clip.
• For step S201 to step S202, please refer to step S101 to step S102 in the embodiment corresponding to FIG. 3 above, which will not be described again here.
• the auxiliary description information corresponding to the candidate shared video clip includes a description image corresponding to the candidate shared video clip and a description text corresponding to the candidate shared video clip; the third sharing quality corresponding to the candidate shared video clip includes the image sharing quality corresponding to the description image and the text sharing quality corresponding to the description text.
• For each candidate shared video segment determined in step S202, the following steps S203 to S206 are performed to determine the third sharing quality and auxiliary description information of each candidate shared video segment.
• Step S203 Obtain at least two video frames in the candidate shared video clip, and determine the image sharing quality corresponding to each video frame in the at least two video frames.
• image sampling is performed on the candidate shared video clip according to the image sampling period, and at least two video frames in the candidate shared video clip are obtained; for each video frame, the video frame is input to the video recognition model, and feature extraction is performed on the video frame through the image recognition network layer of the video recognition model to obtain the shared image feature corresponding to the video frame; the video recognition model includes a fourth video recognition sub-model, and the fourth video recognition sub-model includes an image recognition network layer and a second splicing network layer.
  • the business server can obtain at least two video frames from the candidate shared video clips through the image sampling cycle (for example, sampling one picture per second), and the at least two video frames are used as candidate description images.
• the business server needs to determine the image sharing quality corresponding to each of the at least two video frames.
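• A small sketch of such periodic frame sampling, using OpenCV and assuming a one-second sampling period, is given below; the file path and period are placeholders rather than values from the original disclosure.

```python
import cv2

def sample_frames(video_path, period_seconds=1.0):
    """Sample one frame per `period_seconds` from a video clip; the sampled
    frames serve as candidate description images."""
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(int(round(fps * period_seconds)), 1)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```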
• Figure 7 is a schematic model structure diagram of a fourth video recognition sub-model provided by an embodiment of the present application. It can be understood that the process by which the business server obtains the image sharing quality corresponding to each video frame through the fourth video recognition sub-model is the same for every frame. Therefore, the embodiment of the present application takes obtaining the image sharing quality corresponding to the video frame F1 as an example for description; for the processing of the remaining video frames among the at least two video frames, please refer to the description below.
  • the business server inputs the video frame F1 to the image recognition network layer 70a in the fourth video recognition sub-model, and performs feature extraction on the video frame F1 through the image recognition network layer 70a to obtain the shared image feature 701a corresponding to the video frame F1.
• The business server obtains the multi-dimensional fusion feature corresponding to the candidate shared video clip and the second object label feature corresponding to the object label text sequence; the shared image feature 701a corresponding to the video frame F1, the multi-dimensional fusion feature corresponding to the candidate shared video clip and the second object label feature are respectively input to the second splicing network layer; through the second splicing network layer, the shared image feature corresponding to the video frame F1, the multi-dimensional fusion feature corresponding to the candidate shared video clip and the second object label feature are spliced to obtain the second multi-dimensional splicing feature corresponding to the video frame F1; the image sharing quality corresponding to the video frame F1 is determined according to the second multi-dimensional splicing feature corresponding to the video frame F1.
• the fourth video recognition sub-model also includes a third fully connected network layer; for each video frame, the second multi-dimensional splicing feature corresponding to the video frame is input to the third fully connected network layer, and through the third fully connected network layer, feature transformation is performed on the second multi-dimensional splicing feature corresponding to the video frame to obtain the image sharing quality corresponding to the video frame.
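• The following PyTorch-style sketch mirrors this structure for a single video frame: an image recognition layer produces the shared image feature, which is spliced with the clip's multi-dimensional fusion feature and the second object label feature and passed through a fully connected layer to obtain the image sharing quality. All layer sizes and the tiny CNN backbone are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class FourthSubModelHead(nn.Module):
    """Image recognition layer + second splicing layer + third fully
    connected layer producing an image sharing quality per video frame."""

    def __init__(self, image_dim=128, fusion_dim=256, tag_dim=128):
        super().__init__()
        # Image recognition network layer (a tiny CNN stands in for it here).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, image_dim),
        )
        # Third fully connected network layer on the spliced feature.
        self.fc = nn.Linear(image_dim + fusion_dim + tag_dim, 1)

    def forward(self, frame, fusion_feature, tag_feature):
        image_feature = self.image_encoder(frame)            # shared image feature
        spliced = torch.cat([image_feature, fusion_feature, tag_feature], dim=-1)
        return torch.sigmoid(self.fc(spliced)).squeeze(-1)   # image sharing quality

# Toy usage for one frame F1 (batch of 1, 3x224x224).
model = FourthSubModelHead()
quality = model(torch.randn(1, 3, 224, 224), torch.randn(1, 256), torch.randn(1, 128))
print(quality)
```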
  • Step S204 Determine the image sharing quality corresponding to the candidate shared video segment based on the image sharing quality corresponding to each video frame, and select one video frame from the at least two video frames as the description image corresponding to the candidate shared video segment.
• the maximum image sharing quality is obtained from the image sharing qualities corresponding to the at least two video frames, and the maximum image sharing quality is determined as the image sharing quality corresponding to the candidate shared video clip; among the at least two video frames, the video frame corresponding to the maximum image sharing quality is determined as the description image corresponding to the candidate shared video clip.
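• A one-line illustration of this selection (the frame identifiers and scores below are made up):

```python
# Image sharing quality per sampled frame of one candidate shared clip.
frame_qualities = {"F1": 0.82, "F2": 0.74, "F3": 0.91}

description_image = max(frame_qualities, key=frame_qualities.get)  # frame with max quality
clip_image_quality = frame_qualities[description_image]            # quality of the clip
print(description_image, clip_image_quality)  # F3 0.91
```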
  • the embodiments of this application can provide three different ways of obtaining multi-dimensional fusion features corresponding to candidate shared video clips.
• For the first obtaining method, please refer to the description of obtaining the multi-dimensional fusion features corresponding to the candidate video clips in step S102 of the embodiment corresponding to Figure 3 above; the principle is the same.
  • the second acquisition method is similar to the first acquisition method.
• Step S102 in Figure 3 has already provided the multi-dimensional fusion features corresponding to the candidate video clips (including the multi-dimensional fusion feature 402e in Figure 5), and the candidate shared video clip belongs to the candidate video clips, so the business server can obtain the multi-dimensional fusion feature corresponding to the candidate shared video clip from the multi-dimensional fusion features, corresponding to the candidate video clips, output by the second video recognition sub-model.
  • Both of the above acquisition methods can reduce the computing time and cost of the video recognition model.
• Figure 7 is a schematic diagram of the model structure of a fourth video recognition sub-model provided by an embodiment of the present application.
• the model structure in the dotted area is the same as the model structure of the second video recognition sub-model in Figure 5, but the model parameters of the two are inconsistent, because when training the fourth video recognition sub-model, the business server uses the model parameters in the trained second video recognition sub-model as the initialization model parameters in the dotted area of Figure 7, and fine-tunes the initialization model parameters based on the third training sample set (including multiple sample videos, object label sample text sequences, the sample description image corresponding to each sample video and the description image quality label corresponding to each sample video).
• The process in which the business server obtains the multi-dimensional fusion features corresponding to the candidate shared video clips through the dotted area in Figure 7 is consistent with the process in which the multi-dimensional fusion feature 402e is obtained through the second video recognition sub-model, so please refer to the description of step S101 above, which will not be repeated here. Since the model parameters in the dotted area of Figure 7 are better than the model parameters in Figure 5, the multi-dimensional fusion features corresponding to the candidate shared video clips output in Figure 7 are better than the multi-dimensional fusion feature 402e in Figure 5.
• In the first acquisition method, the first object label feature 401g output in Figure 5 is determined as the second object label feature. In the second acquisition method, as shown in Figure 7, the object label text sequence is input to the fourth video recognition sub-model; the process in which the business server obtains the second object label feature through the dotted area in Figure 7 is the same as the process in which the first object label feature 401g is obtained through the first text encoding network layer 40g in Figure 5, so please refer to the description of step S102 above, which will not be repeated here.
• the business server inputs the shared image feature 701a corresponding to the video frame F1, the multi-dimensional fusion feature corresponding to the candidate shared video clip and the second object label feature respectively to the second splicing network layer 70b; through the second splicing network layer 70b, feature splicing can be performed on the shared image feature 701a, the multi-dimensional fusion feature corresponding to the candidate shared video clip and the second object label feature, so the second multi-dimensional splicing feature 701b corresponding to the video frame F1 can be obtained; further, the business server inputs the second multi-dimensional splicing feature 701b to the third fully connected network layer 70c.
  • the service server can obtain the image sharing quality corresponding to at least two video frames.
  • Step S205 Based on the object tag text sequence and the content text corresponding to the candidate shared video clip, determine the text sharing quality corresponding to the candidate shared video clip and the description text corresponding to the candidate shared video clip.
• The description text is composed of N shared words. A video recognition model is obtained; the video recognition model includes the fifth video recognition sub-model, and the fifth video recognition sub-model includes a second text encoding network layer, a third text encoding network layer, an attention network layer and a text decoding network layer. The content text corresponding to the candidate shared video clip is input into the second text encoding network layer, and text encoding is performed on it through the second text encoding network layer to obtain the content text features; the object label text sequence is input into the third text encoding network layer, and text encoding is performed on it through the third text encoding network layer to obtain the third object label feature. The content text features, the text feature to be decoded S_i corresponding to the candidate shared video clip and the third object label feature are respectively input to the attention network layer; through the attention network layer, feature fusion is performed on the content text features, the text feature to be decoded S_i and the third object label feature to obtain the attention weights corresponding to the content text features, where i is a non-negative integer less than N. According to the attention weights corresponding to the content text features, the text feature to be decoded S_(i+1) corresponding to the candidate shared video clip is determined; the shared word indicated by the text feature to be decoded S_i is the previous shared word of the shared word indicated by the text feature to be decoded S_(i+1). When i+1 is equal to N, the N text features to be decoded are respectively input to the text decoding network layer; the N shared words respectively indicated by the N text features to be decoded are obtained, and the N shared words form the description text corresponding to the candidate shared video clip; based on the N text features to be decoded, the text sharing quality corresponding to the candidate shared video clip is generated.
• For the definition of the content text corresponding to the candidate shared video clip, please refer to the definition of the content text E1 in Figure 3 above. For the definitions of the second text encoding network layer and the third text encoding network layer, please refer to the definition of the first text encoding network layer in Figure 3 above; the attention network layer is an Attention network.
  • FIG. 8 is a schematic model structure diagram of a fifth video recognition sub-model provided by an embodiment of the present application.
• The business server performs basic processing on the content text corresponding to the candidate shared video clip, including word segmentation and tokenization, and queries the initial word vector corresponding to each word (word 1, word 2, ..., word n as shown in Figure 8) through a vocabulary table (such as a lookup table). Each initial word vector is used as the input of the second text encoding network layer to understand the content text corresponding to the candidate shared video clip and obtain the content text features, that is, the representation corresponding to each word (the word 1 representation, word 2 representation, ..., word n representation shown in the figure).
• For the process of the business server obtaining the third object label feature (i.e., the object representation in Figure 8), please refer to the generation process of the second object label feature above, which will not be described again here.
  • the business server uses the content text features (word 1 representation, word 2 representation,..., word n representation), the third object label feature (object representation) and the shared word representation generated in the previous step as input to the attention network layer.
• and generates, step by step, the sharing copy (i.e., description text) corresponding to the candidate shared video clip.
• The business server multiplies the maximum probabilities of each generation step together, as the text sharing quality of the description text generated for the candidate shared video clip.
• the symbol "<S>" in Figure 8 indicates the start.
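• As a hedged sketch of this generation scheme, the following PyTorch code greedily decodes N shared words, with an attention step over the content text features at every position, and accumulates the product of per-step maximum probabilities as the text sharing quality. The network sizes, the GRU cell and the greedy strategy are illustrative assumptions rather than the disclosed architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FifthSubModelSketch(nn.Module):
    def __init__(self, vocab_size=8000, dim=128, max_len=10):
        super().__init__()
        self.max_len = max_len
        self.word_embed = nn.Embedding(vocab_size, dim)                  # content / shared words
        self.tag_embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")   # object label texts
        self.attn = nn.Linear(2 * dim, 1)                                # attention network layer
        self.decoder_cell = nn.GRUCell(2 * dim, dim)                     # produces S_(i+1)
        self.out = nn.Linear(dim, vocab_size)                            # text decoding layer

    def forward(self, content_ids, tag_ids, start_id=1):
        content = self.word_embed(content_ids)        # (1, n, dim) content text features
        tag_feature = self.tag_embed(tag_ids)         # (1, dim) third object label feature
        state = tag_feature                           # initial text feature to be decoded
        prev_word = torch.tensor([start_id])          # "<S>" start symbol
        words, quality = [], 1.0
        for _ in range(self.max_len):
            # Attention weights over the content text features, conditioned on S_i.
            query = state.unsqueeze(1).expand_as(content)
            weights = F.softmax(self.attn(torch.cat([content, query], dim=-1)), dim=1)
            context = (weights * content).sum(dim=1)
            # Fuse the content context and the previous shared word to get S_(i+1).
            step_input = torch.cat([context, self.word_embed(prev_word)], dim=-1)
            state = self.decoder_cell(step_input, state)
            probs = F.softmax(self.out(state), dim=-1)
            prob, prev_word = probs.max(dim=-1)       # greedy choice of the next shared word
            words.append(int(prev_word))
            quality *= float(prob)                    # product of max probabilities
        return words, quality                          # description text ids, text sharing quality

sketch = FifthSubModelSketch()
ids, q = sketch(torch.randint(2, 8000, (1, 12)), torch.randint(2, 8000, (1, 5)))
print(ids, q)
```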
  • Step S206 Determine the third sharing quality corresponding to the candidate shared video clip according to the image sharing quality corresponding to the candidate shared video clip and the text sharing quality corresponding to the candidate shared video clip; according to the description image corresponding to the candidate shared video clip and the candidate sharing The description text corresponding to the video clip determines the auxiliary description information corresponding to the candidate shared video clip.
  • the image sharing quality and text sharing quality of the candidate shared video segment may be used as the third sharing quality of the candidate shared video segment.
  • the description image can be used as the video cover of the candidate shared video clip
  • the description text can be used as the video copy of the candidate shared video clip.
• The embodiment of the present application takes the case where the auxiliary description information includes the description image and the description text as an example. In other embodiments, the auxiliary description information may only include description text, or only include description images, or may include audio content, etc. The embodiments of the present application do not limit the content of the auxiliary description information, which can be set according to actual application scenarios.
• Step S207 Determine the shared video segment from the candidate shared video segments based on the first sharing quality, the second sharing quality and the third sharing quality corresponding to the candidate shared video segments, and determine the shared video segment and the auxiliary description information corresponding to the shared video segment as shared data to be sent to the shared object.
  • the first sharing quality, the second sharing quality, the image sharing quality and the text sharing quality corresponding to the candidate shared video clips are weighted and summed to obtain the total sharing quality corresponding to the candidate shared video clips.
• For the subsequent process, please refer to the description of step S104 in the embodiment corresponding to Figure 3 above, which will not be repeated here.
  • the embodiment of this application proposes a method for implementing intelligent video sharing.
• Based on the mining of the object's interest (i.e., the object tag text sequence), this method shares the shared video clips that match the shared object and constructs a personalized description image (which can be used as the cover of the shared video clip) and description text (which can be used as the copywriting of the shared video clip) suited to the shared object, so it can attract the shared object to watch the shared video clip, thereby improving the sharing conversion of the video platform and improving the overall playback situation of the video platform.
  • FIG. 9 is another schematic flowchart of a data processing method provided by an embodiment of the present application.
• This method may be executed by a business server (for example, the business server 100 shown in Figure 1 above), or by a terminal device (for example, the terminal device 200a shown in Figure 1 above), or executed by the business server and the terminal device interactively. The embodiment of this application takes the method being executed by the business server as an example for description. As shown in Figure 9, the method may include at least the following steps.
• Step S301 Obtain a training sample set; the training sample set includes a plurality of sample videos, an object label sample text sequence of the browsing sample object associated with each sample video, and a first quality label, a second quality label and a third quality label corresponding to each sample video.
• For each sample video, the following operations are performed to obtain the first quality label corresponding to the sample video:
• If the candidate first quality label corresponding to the sample video is less than the first quality label threshold, the candidate first quality label corresponding to the sample video is determined as the first quality label corresponding to the sample video; if the candidate first quality label corresponding to the sample video is equal to or greater than the first quality label threshold, the first quality label threshold is determined as the first quality label corresponding to the sample video.
• For each sample video, perform the following operations to obtain the second quality label corresponding to the sample video: obtain the first playback completion degree of the browsing sample object for the sample video; if the first playback completion degree is greater than the first playback completion threshold, it is determined that there is a first positive correlation between the object label sample text and the sample video, and the first positive correlation is determined as the second quality label of the sample video; if the first playback completion degree is less than or equal to the first playback completion threshold, it is determined that there is a first reverse correlation between the object label sample text and the sample video, and the first reverse correlation is determined as the second quality label of the sample video.
• the training sample set also includes a sample description image corresponding to each sample video; the third quality label includes a description image quality label; for each sample video: obtain the second playback completion degree of the browsing sample object for the sample video; if the second playback completion degree is greater than the second playback completion degree threshold, it is determined that there is a second positive correlation between the sample description image, the object label sample text and the sample video, and the second positive correlation is determined as the description image quality label of the sample video; if the second playback completion degree is less than or equal to the second playback completion degree threshold, it is determined that there is a second reverse correlation between the sample description image, the object label sample text and the sample video, and the second reverse correlation is determined as the description image quality label of the sample video.
• The third quality label includes a description text quality label; the method further includes: for each sample video, obtain the third playback completion degree of the browsing sample object for the sample video; if the third playback completion degree is greater than the third playback completion degree threshold, obtain the sample content text corresponding to the sample video, and add the sample content text to the training sample set; it is determined that there is a third positive correlation between the object label sample text sequence and the sample content text, and the third positive correlation is determined as the description text quality label of the sample video.
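• A minimal sketch of these labeling rules is shown below; the threshold values are illustrative assumptions, and positive and reverse correlations are encoded as 1 and 0 for convenience.

```python
def first_quality_label(candidate_label, threshold=0.9):
    """Clip the candidate first quality label at the first quality label threshold."""
    return candidate_label if candidate_label < threshold else threshold

def second_quality_label(play_completion, threshold=0.5):
    """1 = first positive correlation between object label text and video, else 0."""
    return 1 if play_completion > threshold else 0

def description_image_quality_label(play_completion, threshold=0.5):
    """1 = second positive correlation between image, label text and video, else 0."""
    return 1 if play_completion > threshold else 0

print(first_quality_label(0.95), second_quality_label(0.7), description_image_quality_label(0.3))
```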
• The training sample set may include a first training sample set for training the first video recognition sub-model, a second training sample set for training the second video recognition sub-model, and a third training sample set for training the third video recognition sub-model. When the auxiliary description information only includes description images, the third video recognition sub-model includes the fourth video recognition sub-model, and the third training sample set is the fourth training sample set; when the auxiliary description information only includes description text, the third video recognition sub-model includes the fifth video recognition sub-model, and the third training sample set is the fifth training sample set; when the auxiliary description information includes description images and description text, the third video recognition sub-model includes the fourth video recognition sub-model and the fifth video recognition sub-model, and the third training sample set includes the fourth training sample set and the fifth training sample set.
  • the first training sample set includes a plurality of sample videos and the first quality label corresponding to each sample video;
• the fifth training sample set includes a plurality of sample videos, an object label sample text sequence of the browsing sample object associated with each sample video, and the description text quality label corresponding to each sample video.
• The sample videos included in the above five training sample sets can be the same or different.
  • the main difference is that the labels and uses are different.
  • the video platform has a lot of short videos, so the short videos can be determined as sample videos.
• The duration of a short video is relatively short; for example, the duration corresponding to the short video is equal to the duration corresponding to a video clip.
  • the first quality label threshold, the first playback completion threshold, the second playback completion threshold, and the third playback completion threshold can all be adjusted according to actual application scenarios.
• the embodiments of this application do not place restrictions on the above four thresholds.
  • Step S302 Input the training sample set to the video recognition model, and determine the first prediction quality corresponding to each sample video through the video recognition model.
• The business server can input the first training sample set in step S301 to the first video recognition sub-model in the video recognition model. The process by which the business server obtains the first prediction quality corresponding to each sample video through the first video recognition sub-model is consistent with the process of obtaining the first sharing quality corresponding to a video clip through the first video recognition sub-model; therefore, please refer to the description of step S101 in the embodiment corresponding to Figure 3 above, and no further details will be given here.
  • Step S303 Determine the second prediction quality and the third prediction quality corresponding to each sample video according to the object label sample text sequence and each sample video.
• The business server can input the second training sample set in step S301 to the second video recognition sub-model in the video recognition model. The process by which the business server obtains the second prediction quality corresponding to each sample video through the second video recognition sub-model is consistent with the process of obtaining the second sharing quality corresponding to a video clip through the second video recognition sub-model; therefore, please refer to the description of step S102 in the embodiment corresponding to Figure 3 above, and no further details will be given here.
  • the business server may input the third training sample set in step S301 to the third video recognition sub-model in the video recognition model, where the business server obtains the third prediction quality corresponding to each sample video through the third video recognition sub-model.
• The processing process is consistent with the processing process of obtaining the third sharing quality corresponding to a video clip through the third video recognition sub-model; therefore, please refer to the description of step S103 in the embodiment corresponding to Figure 3 above, which will not be repeated here.
• Step S304 Adjust the parameters in the video recognition model according to the first quality label, the second quality label, the third quality label, the first prediction quality, the second prediction quality and the third prediction quality to obtain the trained video recognition model; the trained video recognition model is used to determine the shared data of the video; the shared data includes shared video segments in the video and auxiliary description information corresponding to the shared video segments.
  • the video recognition model includes a first video recognition sub-model used to determine the first prediction quality, a second video recognition sub-model used to determine the second prediction quality, and a third video recognition sub-model used to determine the third prediction quality.
• the parameters in the video recognition model include parameters in the first video recognition sub-model, parameters in the second video recognition sub-model and parameters in the third video recognition sub-model; determine the first quality loss value between the first quality label and the first prediction quality, and adjust the parameters in the first video recognition sub-model according to the first quality loss value to obtain the trained first video recognition sub-model; determine the second quality loss value between the second quality label and the second prediction quality, and adjust the parameters in the second video recognition sub-model according to the second quality loss value to obtain the trained second video recognition sub-model; determine the third quality loss value between the third quality label and the third prediction quality, and adjust the parameters in the third video recognition sub-model according to the third quality loss value to obtain the trained third video recognition sub-model; when the first video recognition sub-model, the second video recognition sub-model and the third video recognition sub-model all meet the model convergence conditions, the trained video recognition model is generated from the trained first video recognition sub-model, the trained second video recognition sub-model and the trained third video recognition sub-model.
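• A hedged sketch of one update step of this joint training procedure is given below; the choice of binary cross-entropy for the quality loss, the optimizer and the stand-in sub-model are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

def train_step(sub_model, optimizer, inputs, quality_label):
    """One parameter update of a single video recognition sub-model using the
    quality loss between its predicted quality and the quality label."""
    optimizer.zero_grad()
    predicted_quality = sub_model(*inputs)
    loss = nn.functional.binary_cross_entropy(predicted_quality, quality_label)
    loss.backward()
    optimizer.step()
    return float(loss)

# Toy usage with a stand-in sub-model mapping a feature vector to a quality score.
toy_model = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid(), nn.Flatten(start_dim=0))
opt = torch.optim.Adam(toy_model.parameters(), lr=1e-3)
features, labels = torch.randn(4, 8), torch.randint(0, 2, (4,)).float()
print(train_step(toy_model, opt, (features,), labels))

# Each sub-model is trained with its own sample set and quality label until all
# three sub-models meet the convergence condition; the trained video recognition
# model is then assembled from the three trained sub-models.
```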
• The embodiment of the present application performs in-depth modeling of the first video recognition sub-model through the first training sample set, so that the first video recognition sub-model can determine candidate video clips with high sharing value among multiple video clips; performs in-depth modeling of the second video recognition sub-model through the second training sample set, so that the second video recognition sub-model can determine candidate shared video clips with high sharing value among the candidate video clips; and performs in-depth modeling of the third video recognition sub-model through the third training sample set, so that the third video recognition sub-model can determine the third sharing quality corresponding to the candidate shared video clips.
• The shared video clips and their corresponding auxiliary description information can be determined through the sharing qualities of different dimensions, and then the shared data can be generated. Because the shared data is not only associated with the video content of the shared video clip itself but also associated with the object tag text sequence, the shared data can improve the sharing efficiency and sharing effect of the video.
  • FIG. 10 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the above-mentioned data processing device 1 can be used to execute corresponding steps in the method provided by the embodiments of the present application.
  • the data processing device 1 may include: a first acquisition module 110 , a second acquisition module 120 , a first determination module 130 and a second determination module 140 .
• the first acquisition module 110 is configured to acquire at least two video clips in the video, determine the first sharing quality corresponding to the at least two video clips, and select at least one video clip from the at least two video clips as a candidate video clip according to the first sharing quality;
  • the second acquisition module 120 is configured to obtain an object tag text sequence associated with the video, where the object tag text sequence includes the object tag text of the browsing object that shares the video and the object tag text of the shared object that receives the share;
  • the object tag text of the browsing object is used to characterize the interest of the browsing object, and the object tag text of the shared object is used to characterize the interest of the shared object;
• the second sharing quality corresponding to each candidate video segment is determined according to the object tag text sequence and the candidate video segments; according to the second sharing quality corresponding to each candidate video segment, at least one candidate video segment is selected from the candidate video segments as the candidate shared video segment; the second sharing quality is used to characterize the relevance of the candidate video segment to the object tag text of the shared object;
  • the first determination module 130 is configured to determine the third sharing quality corresponding to each candidate shared video segment and the auxiliary description information corresponding to each candidate shared video segment according to the object label text sequence and the candidate shared video segment; the third Sharing quality is used to characterize the matching degree of the auxiliary description information with the candidate shared video clip and the object tag text of the shared object;
• the second determination module 140 is configured to determine the shared video segment from the candidate shared video segments according to the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video segment, and determine the shared video segment and the auxiliary description information corresponding to the shared video segment as shared data to be sent to the shared object.
• For the specific functional implementation of the first acquisition module 110, the second acquisition module 120, the first determination module 130 and the second determination module 140, please refer to steps S101 to S104 in the embodiment corresponding to Figure 3 above, which will not be repeated here. In addition, the description of the beneficial effects of using the same method will not be repeated.
  • FIG. 11 is another schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the above-mentioned data processing device 2 can be used to execute corresponding steps in the method provided by the embodiments of the present application.
• the data processing device 2 may include: a first acquisition module 11, a second acquisition module 12, a first determination module 13 and a second determination module 14.
• The first acquisition module 11 in Figure 11 has all or part of the functions of the first acquisition module 110 in Figure 10; the second acquisition module 12 in Figure 11 has all or part of the functions of the second acquisition module 120 in Figure 10; the first determination module 13 in Figure 11 has all or part of the functions of the first determination module 130 in Figure 10; the second determination module 14 in Figure 11 has all or part of the functions of the second determination module 140 in Figure 10.
  • the first acquisition module 11 may include: a first processing unit 111 and a first acquisition unit 112 .
  • the first processing unit 111 is used to obtain the video, segment the video according to the time window, and obtain at least two video segments corresponding to the video;
  • the first acquisition unit 112 is configured to perform the following operations for each video segment in the at least two video segments to determine the first sharing quality corresponding to the video segment:
• K is a positive integer; the video features corresponding to the K video frames are fused to obtain the video feature of the video clip;
  • the first sharing quality of each video clip is determined respectively.
• For the specific functional implementation of the first processing unit 111 and the first acquisition unit 112, please refer to step S101 in the embodiment corresponding to FIG. 3, which will not be described again here.
  • the second acquisition module 12 may include: a second acquisition unit 121 and a generation unit 122 .
  • the second obtaining unit 121 is used to obtain the object tag text of the browsing object associated with the video, and obtain the object tag text of the shared object associated with the browsing object;
  • the object tag text sequence is generated according to the object tag text of the browse object and the object tag text of the shared object.
  • the generation unit 122 is configured to perform the following operations for each candidate video segment to determine the second sharing quality corresponding to the candidate video segment:
  • the object label text sequence and the candidate video segment are respectively input to a video recognition model;
  • the video recognition model includes a second video recognition sub-model;
  • the second video recognition sub-model includes a first text encoding network layer;
  • text encoding is performed on each object label text in the object label text sequence to obtain the first object label feature corresponding to the object label text sequence;
  • Multi-dimensional fusion features corresponding to the candidate video segments are obtained, and second sharing quality corresponding to the candidate video segments is determined based on the first object label features and the multi-dimensional fusion features corresponding to the candidate video segments.
• For the specific functional implementation of the second acquisition unit 121 and the generation unit 122, please refer to step S102 in the embodiment corresponding to FIG. 3, which will not be described again here.
  • the auxiliary description information corresponding to the candidate shared video clip includes a description image corresponding to the candidate shared video clip, and a description text corresponding to the candidate shared video clip;
• the third sharing quality corresponding to the candidate shared video clip includes the image sharing quality corresponding to the description image, and the text sharing quality corresponding to the description text;
  • the first determination module 13 may include: a third acquisition unit 131, a second determination unit 132, and a third determination unit 133.
  • the third obtaining unit 131 is used to obtain at least two video frames in the candidate shared video clips
  • the second determining unit 132 is configured to determine the image sharing quality corresponding to each video frame in the at least two video frames, determine the image sharing quality of the candidate shared video segment according to the image sharing quality corresponding to each video frame, and determine the image sharing quality from Select one video frame from the at least two video frames as the description image corresponding to the candidate shared video segment;
  • the third determination unit 133 is configured to determine the text sharing quality corresponding to the candidate shared video clips and the description text corresponding to the candidate shared video clips based on the object tag text sequence and the content text corresponding to the candidate shared video clips.
  • the second determination module 14 may include: a quality summation unit 141 and a fourth determination unit 142 .
  • the quality summation unit 141 is configured to perform a weighted sum of the first shared quality, the second shared quality, and the third shared quality corresponding to each candidate shared video segment, respectively, to obtain the total shared quality corresponding to each candidate shared video segment;
  • the fourth determination unit 142 is configured to determine the candidate shared video segment with the largest total sharing quality among the at least two candidate shared video segments as the shared video segment;
  • the auxiliary description information corresponding to the shared video clip is obtained.
• For the specific functional implementation of the quality summation unit 141 and the fourth determination unit 142, please refer to step S104 in the embodiment corresponding to FIG. 3 above, which will not be described again here.
• The shared data in this application is determined based on the sharing qualities of different dimensions. It is not only associated with the video content of the shared video clip itself, but also associated with the object tag text sequence. Therefore, the shared data can improve the sharing efficiency and sharing effect of the video.
• Figure 12 is another schematic structural diagram of a data processing device provided by an embodiment of the present application. The above-mentioned data processing device 3 can be used to execute corresponding steps in the method provided by the embodiments of this application. As shown in FIG. 12, the data processing device 3 may include: a first acquisition module 210, a first determination module 220, a second determination module 230 and a parameter adjustment module 240.
  • the first acquisition module 210 is used to acquire a training sample set;
• the training sample set includes a plurality of sample videos, an object label sample text sequence of the browsing sample object associated with each sample video, and a first quality label, a second quality label and a third quality label corresponding to each sample video;
  • the first determination module 220 is used to input the training sample set to the video recognition model, and determine the first prediction quality corresponding to each sample video through the video recognition model;
  • the second determination module 230 is configured to determine the second prediction quality and the third prediction quality corresponding to each sample video according to the object label sample text sequence and the plurality of sample videos;
• the parameter adjustment module 240 is used to adjust the parameters in the video recognition model according to the first quality label, the second quality label, the third quality label, the first prediction quality, the second prediction quality and the third prediction quality to obtain the trained video recognition model; the trained video recognition model is used to determine the shared data of the video; the shared data includes the shared video clips in the video and the auxiliary description information corresponding to the shared video clips.
• The specific functional implementation of the first acquisition module 210, the first determination module 220, the second determination module 230 and the parameter adjustment module 240 can be referred to steps S301 to S304 in the embodiment corresponding to Figure 9 above, and will not be described again here.
  • the description of the beneficial effects of using the same method will not be described again.
  • FIG. 13 is another schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the above-mentioned data processing device 4 can be used to execute corresponding steps in the method provided by the embodiments of the present application.
  • the data processing device 4 may include: a first acquisition module 21 , a first determination module 22 , a second determination module 23 and a parameter adjustment module 24 .
• The first acquisition module 21 in Figure 13 has all or part of the functions of the first acquisition module 210 in Figure 12; the first determination module 22 in Figure 13 has all or part of the functions of the first determination module 220 in Figure 12; the second determination module 23 in Figure 13 has all or part of the functions of the second determination module 230 in Figure 12; the parameter adjustment module 24 in Figure 13 has all or part of the functions of the parameter adjustment module 240 in Figure 12.
• the data processing device 4 may also include: a first operation module 25, a second operation module 26, a second acquisition module 27, a third determination module 28, a proportion summation module 29, a first comparison module 30 and a fourth determination module 31.
  • the first operation module 25 is configured to perform a product operation for each sample video on the number of plays, duration and average play completion corresponding to the sample video to obtain the first sample parameter corresponding to the sample video;
• the second operation module 26 is used to, for each sample video, sum the number of object comment texts corresponding to the sample video and the number of interactions with the object comment texts to obtain the second sample parameter corresponding to the sample video;
• the second acquisition module 27 is configured to obtain the maximum value of the first sample parameter among the first sample parameters corresponding to at least two sample videos, and obtain the maximum value of the second sample parameter among the second sample parameters corresponding to the at least two sample videos;
• the third determination module 28 is used to determine the first ratio between the first sample parameter corresponding to each sample video and the maximum value of the first sample parameter, and determine the second ratio between the second sample parameter corresponding to each sample video and the maximum value of the second sample parameter;
  • the proportion summation module 29 is used to perform a weighted sum of the first proportion and the second proportion of each sample video to obtain the candidate first quality label corresponding to each sample video;
  • the first comparison module 30 is used to compare the first quality label candidate corresponding to each sample video with the first quality label threshold respectively;
• the fourth determination module 31 is configured to, for each sample video, determine the candidate first quality label corresponding to the sample video as the first quality label corresponding to the sample video if the candidate first quality label corresponding to the sample video is less than the first quality label threshold; the fourth determination module 31 is also configured to determine the first quality label threshold as the first quality label corresponding to the sample video if the candidate first quality label corresponding to the sample video is equal to or greater than the first quality label threshold.
• The specific functional implementation of the above modules can be referred to step S301 in the embodiment corresponding to FIG. 9, and will not be described again here.
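• For illustration, a minimal sketch of the computation carried out by these modules is shown below; the weights, threshold and field values are placeholders, not values from the original disclosure.

```python
def candidate_first_quality_labels(samples, w1=0.5, w2=0.5, label_threshold=0.9):
    """samples: dicts with plays, duration, avg_completion, comments, interactions."""
    p1 = [s["plays"] * s["duration"] * s["avg_completion"] for s in samples]  # first sample parameter
    p2 = [s["comments"] + s["interactions"] for s in samples]                 # second sample parameter
    max1, max2 = max(p1), max(p2)
    labels = []
    for a, b in zip(p1, p2):
        candidate = w1 * (a / max1) + w2 * (b / max2)   # weighted sum of the two ratios
        labels.append(min(candidate, label_threshold))   # clip at the first quality label threshold
    return labels

videos = [
    {"plays": 1200, "duration": 30, "avg_completion": 0.8, "comments": 40, "interactions": 300},
    {"plays": 300, "duration": 45, "avg_completion": 0.5, "comments": 10, "interactions": 50},
]
print(candidate_first_quality_labels(videos))
```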
  • the data processing device 4 may further include: a second comparison module 32 and a fifth determination module 33 .
  • the second comparison module 32 is used to obtain the first playback completion degree of the browse sample object for each sample video, and compare the first playback completion degree of each sample video with the first playback completion degree threshold respectively;
  • the fifth determination module 33 is configured to determine, for each sample video, that there is a first positive association between the object label sample text and the sample video if the first playback completion degree of the sample video is greater than the first playback completion degree threshold. relationship, determine the first positive relationship as the second quality label of the sample video;
  • the fifth determination module 33 is also configured to determine that there is a first reverse association between the object label sample text and the sample video if the first playback completion degree of the sample video is less than or equal to the first playback completion degree threshold, and the The first reverse correlation relationship is determined as the second quality label of the sample video.
• The specific functional implementation of the second comparison module 32 and the fifth determination module 33 can be referred to step S301 in the embodiment corresponding to FIG. 9, and will not be described again here.
  • the training sample set also includes the sample description image corresponding to the sample video;
• the third quality label includes a description image quality label;
  • the data processing device 4 may also include: a third comparison module 34 and a sixth determination module 35 .
  • the third comparison module 34 is used to obtain the second playback completion degree of the browse sample object for each sample video, and compare the second playback completion degree of each sample video with the second playback completion degree threshold respectively;
  • the sixth determination module 35 is used for each sample video, if the second playback completion degree of the sample video is greater than the second playback completion degree threshold, determine the sample description image, the object label sample text and the sample corresponding to the sample video. There is a second positive correlation between the videos, and the second positive correlation is determined as the descriptive image quality label of the sample video;
  • the sixth determination module 35 is also configured to determine the relationship between the sample description image corresponding to the sample video, the object label sample text and the sample video if the second playback completion degree of the sample video is less than or equal to the second playback completion degree threshold. There is a second reverse correlation relationship, and the second reverse correlation relationship is determined as a descriptive image quality label of the sample video.
• The specific functional implementation of the third comparison module 34 and the sixth determination module 35 can be referred to step S301 in the embodiment corresponding to FIG. 9, and will not be described again here.
  • the third quality label includes a description text quality label
  • the data processing device 4 may also include: a third acquisition module 36 , a fourth acquisition module 37 and a seventh determination module 38 .
  • the third acquisition module 36 is used to obtain the third playback completion degree of the browsed sample object for each sample video
  • the fourth acquisition module 37 is used for each sample video, if the third playback completion degree of the sample video is greater than the third playback completion degree threshold, obtain the sample content text corresponding to the sample video, and add the sample content text to the training sample set;
  • the seventh determination module 38 is used to determine that there is a third positive correlation relationship between the object label sample text sequence and the sample content text of the sample video, and determine the third positive correlation relationship as the description text quality label of the sample video.
• For the specific functional implementation of the third acquisition module 36, the fourth acquisition module 37 and the seventh determination module 38, please refer to step S301 in the embodiment corresponding to FIG. 9, which will not be described again here.
  • the video recognition model includes a first video recognition sub-model for determining the first prediction quality, a second video recognition sub-model for determining the second prediction quality, and a third video recognition sub-model for determining the third prediction quality.
  • the parameters in the video recognition model include parameters in the first video recognition sub-model, parameters in the second video recognition sub-model, and parameters in the third video recognition sub-model;
  • the parameter adjustment module 24 may include: a first adjustment unit 241, a second adjustment unit 242, a third adjustment unit 243, and a model generation unit 244.
• the first adjustment unit 241 is used to determine the first quality loss value between the first quality label and the first prediction quality, and adjust the parameters in the first video recognition sub-model according to the first quality loss value to obtain the trained first video recognition sub-model;
  • the second adjustment unit 242 is used to determine the second quality loss value between the second quality label and the second prediction quality, and adjust the parameters in the second video recognition sub-model according to the second quality loss value to obtain the trained The second video recognition sub-model;
  • the third adjustment unit 243 is used to determine the third quality loss value between the third quality label and the third prediction quality, and adjust the parameters in the third video recognition sub-model according to the third quality loss value to obtain the trained The third video recognition sub-model;
  • the model generation unit 244 is configured to generate, when the first video recognition sub-model, the second video recognition sub-model and the third video recognition sub-model all meet the model convergence conditions, the trained first video recognition sub-model, the trained The second video recognition sub-model and the trained video recognition model of the trained third video recognition sub-model.
  • step S304 for the specific functional implementation of the first adjustment unit 241, the second adjustment unit 242, the third adjustment unit 243 and the model generation unit 244, please refer to step S304 in the corresponding embodiment of FIG. 9, which will not be described again here.
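As a rough illustration of how the three adjustment units could update their respective sub-models independently, the sketch below assumes a PyTorch-style setup with one optimizer per sub-model and mean-squared-error quality losses; the module interfaces and the choice of loss are assumptions for the example only, not the method defined by this application.

```python
import torch.nn.functional as F

# Illustrative sketch: each sub-model is adjusted by its own quality loss value.
# sub_models and optimizers are assumed dicts keyed by "first", "second", "third".

def train_step(sub_models, optimizers, batch):
    predictions = (
        sub_models["first"](batch["clip_features"]),
        sub_models["second"](batch["clip_features"], batch["tag_text"]),
        sub_models["third"](batch["clip_features"], batch["tag_text"]),
    )
    labels = (batch["first_label"], batch["second_label"], batch["third_label"])

    losses = []
    for name, pred, label in zip(("first", "second", "third"), predictions, labels):
        loss = F.mse_loss(pred, label)   # quality loss value for this sub-model
        optimizers[name].zero_grad()
        loss.backward()                  # gradients flow only through this sub-model
        optimizers[name].step()
        losses.append(loss.item())
    return losses                        # training stops once all three sub-models converge
```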
  • the embodiment of the present application performs in-depth modeling on the first video recognition sub-model through the first training sample set, so that the first video recognition sub-model can determine candidate video clips with high sharing value among multiple video clips; performs in-depth modeling on the second video recognition sub-model through the second training sample set, so that the second video recognition sub-model can determine candidate shared video clips with high sharing value among the candidate video clips; and performs in-depth modeling on the third video recognition sub-model through the third training sample set, so that the third video recognition sub-model can determine the third sharing quality and the auxiliary description information corresponding to the candidate shared video clips. The shared video clip and its corresponding auxiliary description information can then be determined, and the shared data can be generated. Since the shared data is associated not only with the video content of the shared video clip itself but also with the object tag text sequence, the shared data can improve the sharing efficiency and sharing effect of the video.
  • the computer device 1000 may include: at least one processor 1001, such as a CPU, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002.
  • the communication bus 1002 is used to realize connection communication between these components.
  • the user interface 1003 may include a display and a keyboard
  • the network interface 1004 may include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory.
  • the memory 1005 may also be at least one storage device located remotely from the aforementioned processor 1001. As shown in Figure 14, memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 1004 can provide network communication functions; the user interface 1003 is mainly used to provide an input interface for the user; and the processor 1001 can be used to call the device control application program stored in the memory 1005 to implement the data processing method described in the above embodiments.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • when the computer program is executed by a processor, the data processing method or device described in the previous embodiments is implemented, which will not be repeated here.
  • the description of the beneficial effects of using the same method will likewise not be repeated.
  • the above-mentioned computer-readable storage medium may be an internal storage unit of the data processing apparatus provided in any of the foregoing embodiments or of the above-mentioned computer device, such as the hard disk or memory of the computer device.
  • the computer-readable storage medium can also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, etc., equipped on the computer device.
  • the computer-readable storage medium may also include both an internal storage unit of the computer device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the computer device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or is to be output.
  • An embodiment of the present application also provides a computer program product.
  • the computer program product includes a computer program, and the computer program is stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program, so that the computer device can execute the description of the data processing method or device in the previous embodiments, which will not be described again here.
  • the description of the beneficial effects of using the same method will not be repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A data processing method, and a device and a computer-readable storage medium. The method comprises: according to first sharing qualities respectively corresponding to at least two video clips, determining candidate video clips; according to an object label text sequence and the candidate video clips, determining second sharing qualities corresponding to the candidate video clips, and according to the second sharing qualities corresponding to the candidate video clips, determining candidate shared video clips; according to the object label text sequence and the candidate shared video clips, determining third sharing qualities corresponding to the candidate shared video clips and auxiliary description information corresponding to the candidate shared video clips; and according to the first sharing quality, the second sharing quality, the third sharing quality and the auxiliary description information, which respectively correspond to each candidate shared video clip, determining shared data.

Description

Data processing method, device and computer-readable storage medium
This application claims priority to the Chinese patent application No. 202210336414.6, filed with the China Patent Office on April 1, 2022 and entitled "A data processing method, device and computer-readable storage medium", the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of Internet technology, and in particular, to a data processing method, a device and a computer-readable storage medium.
Background
Computer vision (CV) technology is a science that studies how to make machines "see"; more specifically, it refers to using cameras and computers instead of human eyes to perform machine vision tasks such as identifying and measuring targets, and to further perform graphics processing so that the processed images become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems that can obtain information from images or multi-dimensional data. Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, smart transportation and other technologies, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
Video sharing means that a browsing object of a video shares the video with other browsing objects while browsing the video in a video application. Video sharing is a main way for video browsing objects to communicate with each other, and has a great impact on the object activity and playback of the video application.
Technical content
Embodiments of the present application provide a data processing method, a device and a computer-readable storage medium, which can save network transmission resources and the processing resources of the device receiving shared data while improving the sharing efficiency and sharing effect of a video.
In one aspect, embodiments of the present application provide a data processing method, executed by a computer device, including:
obtaining at least two video clips in a video, determining the first sharing quality corresponding to each of the at least two video clips, and selecting, according to the first sharing quality, at least one video clip from the at least two video clips as a candidate video clip;
obtaining an object tag text sequence associated with the video, where the object tag text sequence includes the object tag text of the browsing object that shares the video and the object tag text of the shared object that receives the share; the object tag text of the browsing object is used to characterize the interest of the browsing object, and the object tag text of the shared object is used to characterize the interest of the shared object;
determining, according to the object tag text sequence and the candidate video clips, the second sharing quality corresponding to each candidate video clip, and selecting, according to the second sharing quality corresponding to each candidate video clip, at least one candidate video clip from the candidate video clips as a candidate shared video clip, where the second sharing quality is used to characterize the correlation between the candidate video clip and the object tag text of the shared object;
determining, according to the object tag text sequence and the candidate shared video clips, the third sharing quality corresponding to each candidate shared video clip and the auxiliary description information corresponding to each candidate shared video clip, where the third sharing quality is used to characterize the matching degree between the auxiliary description information and the candidate shared video clip as well as the object tag text of the shared object;
determining, according to the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video clip, a shared video clip from the candidate shared video clips, and determining the shared video clip and the auxiliary description information corresponding to the shared video clip as shared data to be sent to the shared object.
In one aspect, embodiments of the present application provide a data processing apparatus, including:
a first acquisition module, used to acquire at least two video clips in a video, determine the first sharing quality corresponding to each of the at least two video clips, and select, according to the first sharing quality, at least one video clip from the at least two video clips as a candidate video clip;
a second acquisition module, used to obtain an object tag text sequence associated with the video, where the object tag text sequence includes the object tag text of the browsing object that shares the video and the object tag text of the shared object that receives the share; the object tag text of the browsing object is used to characterize the interest of the browsing object, and the object tag text of the shared object is used to characterize the interest of the shared object; and to determine, according to the object tag text sequence and the candidate video clips, the second sharing quality corresponding to each candidate video clip, and select, according to the second sharing quality corresponding to each candidate video clip, at least one candidate video clip from the candidate video clips as a candidate shared video clip, where the second sharing quality is used to characterize the correlation between the candidate video clip and the object tag text of the shared object;
a first determination module, used to determine, according to the object tag text sequence and the candidate shared video clips, the third sharing quality corresponding to each candidate shared video clip and the auxiliary description information corresponding to each candidate shared video clip, where the third sharing quality is used to characterize the matching degree between the auxiliary description information and the candidate shared video clip as well as the object tag text of the shared object;
a second determination module, used to determine, according to the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video clip, a shared video clip from the candidate shared video clips, and determine the shared video clip and the auxiliary description information corresponding to the shared video clip as shared data to be sent to the shared object.
Embodiments of the present application further provide a computer device, including: a processor, a memory and a network interface;
the processor is connected to the memory and the network interface, where the network interface is used to provide data communication functions, the memory is used to store a computer program, and the processor is used to call the computer program so that the computer device executes the method in the embodiments of the present application.
In one aspect, embodiments of the present application provide a computer-readable storage medium. The computer-readable storage medium stores a computer program, and the computer program is suitable for being loaded by a processor to execute the method in the embodiments of the present application.
Embodiments of the present application further provide a computer program product. The computer program product includes a computer program, and the computer program is stored in a computer-readable storage medium; the processor of a computer device reads the computer program from the computer-readable storage medium and executes it, so that the computer device executes the method in the embodiments of the present application.
Brief description of the drawings
Figure 1 is a schematic diagram of a system architecture provided by an embodiment of the present application;
Figure 2 is a schematic diagram of a data processing scenario provided by an embodiment of the present application;
Figure 3 is a schematic flowchart of a data processing method provided by an embodiment of the present application;
Figure 4 is a schematic diagram of the model structure of a first video recognition sub-model provided by an embodiment of the present application;
Figure 5 is a schematic diagram of the model structure of a second video recognition sub-model provided by an embodiment of the present application;
Figure 6 is another schematic flowchart of a data processing method provided by an embodiment of the present application;
Figure 7 is a schematic diagram of the model structure of a fourth video recognition sub-model provided by an embodiment of the present application;
Figure 8 is a schematic diagram of the model structure of a fifth video recognition sub-model provided by an embodiment of the present application;
Figure 9 is yet another schematic flowchart of a data processing method provided by an embodiment of the present application;
Figure 10 is a schematic structural diagram of a data processing device provided by an embodiment of the present application;
Figure 11 is another schematic structural diagram of a data processing device provided by an embodiment of the present application;
Figure 12 is another schematic structural diagram of a data processing device provided by an embodiment of the present application;
Figure 13 is another schematic structural diagram of a data processing device provided by an embodiment of the present application;
Figure 14 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the protection scope of this application.
In some solutions, the video sharing process shares the entire video content with friends, and the auxiliary description information carried with it is information built in advance by the operating platform corresponding to the video application. Obviously, sharing the entire video occupies excessive network resources and thus reduces the sharing efficiency of the video; and because the same auxiliary description information is shared with different objects, the sharing display form is too uniform, which reduces the sharing effect.
In the embodiments of the present application, the computer device determines the first sharing quality corresponding to each of at least two video clips in a video, so candidate video clips can be determined from the at least two video clips according to the first sharing quality; it can be understood that a candidate video clip belongs to the video and its sharing value (quality) is better than that of the video as a whole. Further, the computer device obtains the object tag text sequence associated with the video and determines, according to the object tag text sequence and the candidate video clips, the second sharing quality corresponding to each candidate video clip, so candidate shared video clips can be determined from the candidate video clips according to the second sharing quality; it can be understood that a candidate shared video clip is determined based not only on the video content of the candidate video clip but also on the object tag text sequence, so its sharing value (quality) is better than that of the candidate video clips. Further, the computer device determines, according to the object tag text sequence and the candidate shared video clips, the third sharing quality corresponding to each candidate shared video clip, and determines, according to the third sharing quality, the auxiliary description information corresponding to each candidate shared video clip; it can be understood that the auxiliary description information is associated not only with the candidate shared video clip but also with the object tag text sequence. Further, the computer device determines, according to the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video clip, a shared video clip from the candidate shared video clips, and determines the shared video clip and the auxiliary description information corresponding to the shared video clip as shared data to be sent to the shared object. As can be seen from the above, the shared data in this application is determined based on sharing qualities of different dimensions and is associated not only with the video content of the shared video clip itself but also with the object tag text sequence, so the shared data can improve the sharing efficiency and sharing effect of the video. Moreover, since video clips rather than the entire video are shared, network transmission resources and the processing resources of the device receiving the shared data can be saved.
Please refer to Figure 1, which is a schematic diagram of a system architecture provided by an embodiment of the present application. As shown in Figure 1, the system may include a business server 100 and a terminal device cluster. The terminal device cluster may include terminal device 200a, terminal device 200b, terminal device 200c, ..., terminal device 200n. It can be understood that the above system may include one or more terminal devices, and this application does not limit the number of terminal devices.
There may be communication connections between the terminal devices; for example, a communication connection exists between terminal device 200a and terminal device 200b, and between terminal device 200a and terminal device 200c. Meanwhile, any terminal device in the terminal device cluster may have a communication connection with the business server 100; for example, a communication connection exists between terminal device 200a and the business server 100. The above communication connection is not limited to a particular connection method: it may be a direct or indirect connection through wired communication, a direct or indirect connection through wireless communication, or other methods, which are not limited in this application.
It should be understood that each terminal device in the terminal device cluster shown in Figure 1 may be installed with an application client. When the application client runs on a terminal device, it can exchange data with the business server 100 shown in Figure 1 through the above communication connections. The application client may be a video application, a live broadcast application, a social application, an instant messaging application, a game application, a music application, a shopping application, a novel application, a browser, or another application client with a video loading function. The application client may be an independent client, or an embedded sub-client integrated in a certain client (for example, a social client, an education client, a multimedia client, etc.), which is not limited here. Taking a video application as an example, the business server 100 may be a collection of multiple servers including a background server and a data processing server corresponding to the video application. Therefore, each terminal device can perform data transmission with the business server 100 through the application client corresponding to the video application; for example, each terminal device can upload its local video to the business server 100 through the application client of the video application, and the business server 100 can then deliver the video to other terminal devices or transmit it to a cloud server.
It can be understood that the specific implementations of this application involve data related to user information (such as the object tag text sequence). When the embodiments of this application are applied to specific products or technologies, user permission or consent needs to be obtained, and the collection, use and processing of the relevant data need to comply with the relevant laws, regulations and standards of the relevant countries and regions.
To facilitate subsequent understanding and explanation, in this embodiment of the present application one terminal device in the terminal device cluster shown in Figure 1 can be selected as the target terminal device, for example terminal device 200a. When a video is obtained and a video sharing instruction to share the video with a shared object associated with the browsing object is received, terminal device 200a can send the video identifier, the browsing object identifier and the shared object identifier as data to be identified to the business server 100. In this embodiment, the user using terminal device 200a is called the browsing object, and a user associated with the browsing object (for example, a friend user) is called the shared object. The embodiment of this application does not limit the browsing object identifier (the browsing object has given authorization), which includes but is not limited to the mobile phone number and identification number bound to the browsing object in the application client and can be set according to the actual application scenario; the same applies to the shared object identifier. The video identifier can be any information that can be used to identify the video in the application client.
Further, after receiving the data to be identified sent by terminal device 200a, the business server 100 can obtain the video according to the video identifier, and obtain the object tag text sequence according to the browsing object identifier and the shared object identifier. The object tag text sequence includes the object tag text of the browsing object that shares the video and the object tag text of the shared object that receives the share; the object tag text of the browsing object is used to characterize the interest of the browsing object, and the object tag text of the shared object is used to characterize the interest of the shared object. The business server 100 obtains at least two video clips in the video and obtains a trained video recognition model, which may include a first video recognition sub-model, a second video recognition sub-model and a third video recognition sub-model. Through the first video recognition sub-model, the business server 100 can determine the first sharing quality corresponding to each of the at least two video clips, and can determine candidate video clips from the at least two video clips according to the first sharing quality. Further, in the second video recognition sub-model, the business server 100 can determine, according to the object tag text sequence and the candidate video clips, the second sharing quality corresponding to each candidate video clip, and can determine candidate shared video clips from the candidate video clips according to the second sharing quality corresponding to each candidate video clip. Further, in the third video recognition sub-model, the business server 100 can determine, according to the object tag text sequence and the candidate shared video clips, the third sharing quality corresponding to each candidate shared video clip and the auxiliary description information corresponding to each candidate shared video clip. Further, according to the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video clip, the business server 100 can determine a shared video clip from the candidate shared video clips, and determine the shared video clip and the auxiliary description information corresponding to the shared video clip as shared data to be sent to the shared object.
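The three-stage selection flow performed by the business server can be outlined as a short sketch. This is schematic only: the predict interfaces and threshold values are illustrative assumptions rather than the concrete model behavior defined in the embodiments, and the final ranking step is shown further below.

```python
# Schematic sketch of the server-side three-stage selection flow described above.
# The `predict` interfaces and the threshold values are illustrative assumptions.

def score_candidate_shared_clips(video_clips, tag_text_sequence, models,
                                 first_threshold=0.8, second_threshold=0.85):
    # Stage 1: first sharing quality from the first sub-model, then filter.
    stage1 = [(clip, models["first"].predict(clip)) for clip in video_clips]
    candidates = [(clip, q1) for clip, q1 in stage1 if q1 >= first_threshold]

    # Stage 2: second sharing quality against the object tag text sequence.
    stage2 = [(clip, q1, models["second"].predict(clip, tag_text_sequence))
              for clip, q1 in candidates]
    candidate_shared = [item for item in stage2 if item[2] > second_threshold]

    # Stage 3: third sharing quality plus auxiliary description information.
    results = []
    for clip, q1, q2 in candidate_shared:
        q3, description = models["third"].predict(clip, tag_text_sequence)
        results.append({"clip": clip, "qualities": (q1, q2, q3),
                        "auxiliary_description": description})
    return results  # the final shared clip is chosen from these candidates
```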
Subsequently, the business server 100 sends the shared data to terminal device 200a. After receiving the shared data, terminal device 200a can display it on its screen and can further send the shared data carrying the video identifier to the terminal device corresponding to the shared object (for example, terminal device 200b in Figure 1). After obtaining the shared data carrying the video identifier, terminal device 200b can display it on its screen, and the shared object can then view the complete video according to the video identifier carried by the shared data. In some embodiments, if the browsing object authorizes the business server 100 to have sharing permission, the business server 100 can, after generating the shared data, send the shared data directly to the terminal device corresponding to the shared object (terminal device 200b in Figure 1); for the subsequent process, please refer to the above description, which will not be repeated here.
In some embodiments, the business server 100 generates a sharing identifier for the shared video clip and sends the sharing identifier and the auxiliary description information to terminal device 200a. After obtaining the sharing identifier, terminal device 200a can generate sharing information for the video that carries the sharing identifier and the auxiliary description information, and then send the sharing information to terminal device 200b corresponding to the shared object. When terminal device 200b obtains the sharing information, it can play the shared video clip in the video according to the sharing identifier. In some embodiments, if the browsing object authorizes the business server 100 to have sharing permission, the business server 100 can, after generating the sharing identifier, send the sharing identifier and the auxiliary description information to terminal device 200b; for the subsequent process, please refer to the above description, which will not be repeated here.
In some embodiments, if the above video recognition model is stored locally on terminal device 200a, terminal device 200a can determine, through the video recognition model, the first sharing quality corresponding to each of at least two video clips in the video, and thus determine candidate video clips from the at least two video clips; according to the object tag text sequence and the candidate video clips, terminal device 200a can determine the second sharing quality corresponding to each candidate video clip, and then determine candidate shared video clips from the candidate video clips; according to the object tag text sequence and the candidate shared video clips, terminal device 200a can determine the third sharing quality corresponding to each candidate shared video clip and the auxiliary description information corresponding to each candidate shared video clip; according to the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video clip, terminal device 200a can determine a shared video clip from the candidate shared video clips, and can therefore determine the shared video clip and the auxiliary description information corresponding to the shared video clip as shared data to be sent to the shared object.
Since training the video recognition model involves a large amount of offline computation, the local video recognition model on terminal device 200a may be sent to terminal device 200a by the business server 100 after training is completed.
It can be understood that the shared data in the embodiments of this application is automatically constructed based on the video and the object tag text sequence and has high sharing value. The shared video clip can therefore intuitively reflect the highlights of the video while matching the interest tags of the browsing object and the shared object, which can improve the efficiency and effect of video sharing.
It should be noted that the above business server 100, terminal device 200a, terminal device 200b, terminal device 200c, ..., terminal device 200n can all be blockchain nodes in a blockchain network. The data described throughout this document (such as the object tag text sequence and the shared data) can be stored, and the storage method can be that a blockchain node generates a block from the data and adds the block to the blockchain for storage.
Blockchain is a new application model of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. It is mainly used to organize data in chronological order and encrypt it into a ledger, so that the data cannot be tampered with or forged, while allowing data verification, storage and updating. A blockchain is essentially a decentralized database in which each node stores an identical copy of the chain. A blockchain network can divide nodes into core nodes, data nodes and light nodes, which together form the blockchain nodes. The core nodes are responsible for the consensus of the entire blockchain network, that is, the core nodes are the consensus nodes. The process by which transaction data is written into the ledger can be as follows: a data node or light node in the blockchain network obtains the transaction data and passes it along the network (that is, nodes pass it on like a relay baton) until a consensus node receives it; the consensus node then packages the transaction data into a block, performs consensus on the block, and writes the transaction data into the ledger after the consensus is completed. Taking the object tag text sequence and the shared data as example transaction data, the business server 100 (a blockchain node), after reaching consensus on the transaction data, generates a block from the transaction data and stores the block in the blockchain network; to read the transaction data (that is, the object tag text sequence and the shared data), a blockchain node can obtain the block containing the transaction data from the blockchain network and then obtain the transaction data from the block.
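As a minimal, simplified sketch of the block-append step mentioned above, the snippet below packages transaction data (for example, the object tag text sequence or the shared data) into a hash-linked block; the block fields and the omission of a real consensus protocol are simplifications for illustration only.

```python
import hashlib
import json
import time

# Simplified sketch: package transaction data into a block and append it to the chain.
# Assumes transaction_data is JSON-serializable; consensus is not modeled here.

def make_block(transaction_data, previous_hash):
    block = {
        "timestamp": time.time(),
        "transactions": transaction_data,
        "previous_hash": previous_hash,
    }
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

def append_block(chain, transaction_data):
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    chain.append(make_block(transaction_data, previous_hash))
    return chain
```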
It can be understood that the method provided by the embodiments of this application can be executed by a computer device, which includes but is not limited to a terminal device or a business server. The business server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud databases, cloud services, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. Terminal devices include but are not limited to mobile phones, computers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals, aircraft, etc. The terminal device and the business server can be connected directly or indirectly through wired or wireless methods, which is not limited in the embodiments of this application.
Further, please refer to Figure 2, which is a schematic diagram of a data processing scenario provided by an embodiment of the present application. The embodiments of this application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, smart transportation and assisted driving, and are applicable to business scenarios such as video clip recommendation, video clip distribution and video clip search, which will not be listed one by one here. The data processing scenario can be implemented in the business server, in the terminal device, or interactively between the terminal device and the business server, which is not limited here. For ease of description and understanding, the embodiments of this application take the interactive execution between a terminal device and a business server as an example, where the terminal device can be any terminal device in the terminal device cluster of the embodiment corresponding to Figure 1 (Figure 2 takes terminal device 200a as an example), and the business server can be the business server 100 of the embodiment corresponding to Figure 1.
As shown in Figure 2, browsing object 20b is bound to terminal device 200a. When browsing object 20b browses video 201a through terminal device 200a, terminal device 200a can display the basic information of video 201a on the playback interface, such as the video duration (6 minutes in the example of Figure 2), the video cover (cat image 205a in the example of Figure 2) and the video copy (the example copy "kittens fighting for food" 206a). In addition, terminal device 200a can also display controls for video 201a on the playback interface, such as playback control 207a and sharing control 202a illustrated in Figure 2. When browsing object 20b triggers sharing control 202a, terminal device 200a responds to the trigger operation and displays the friend list of browsing object 20b; the example friend list in Figure 2 includes three friends, namely friend "aa", friend "bb" and friend "cc". If browsing object 20b triggers selection control 203a corresponding to friend "cc", terminal device 200a can display a prompt sub-page, which can display a "cancel" control and a "share" control 204a. When browsing object 20b triggers the "share" control 204a, terminal device 200a determines friend "cc" as the shared object.
It can be understood that the interfaces and controls shown in Figure 2 are only some representations for reference. In actual business scenarios, developers can carry out the relevant designs according to product requirements, and the embodiments of this application do not limit the specific forms of the interfaces and controls involved.
Terminal device 200a can obtain the video identifier corresponding to video 201a, the browsing object identifier corresponding to browsing object 20b, and the shared object identifier corresponding to the shared object, and then send all three to the business server 100, so that the business server 100 obtains video 201a through the video identifier and determines the object tag text sequence through the browsing object identifier and the shared object identifier. In some embodiments, the object tag text sequence includes the object tag text of browsing object 20b and the object tag text of the shared object. The object tag text of browsing object 20b is used to characterize the interest of browsing object 20b; the object tag text of the shared object is used to characterize the interest of the shared object. The embodiment of this application does not limit the way in which the business server 100 obtains video 201a and the object tag text sequence: they can be obtained as described above, terminal device 200a can send both video 201a and the object tag text sequence to the business server 100, or the business server 100 can determine them in other ways, which should be set according to the actual scenario.
Further, the business server 100 can segment video 201a through a time window to obtain at least two video clips 20d. In this example the length of the time window is 1 minute; combined with the duration of video 201a (6 minutes in the example of Figure 2), the number of video clips 20d is 6, namely video clips 201d, 202d, 203d, 204d, 205d and 206d shown in Figure 2. The business server 100 obtains the trained video recognition model 20c, which may include a first video recognition sub-model 20e, a second video recognition sub-model 20f and a third video recognition sub-model 20g.
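The time-window segmentation in this example can be expressed directly. The sketch below assumes durations measured in seconds and a fixed 1-minute window, matching the 6-minute video of Figure 2.

```python
# Simple sketch of time-window segmentation: a 6-minute video with a 1-minute
# window yields six (start, end) segments. Durations are in seconds.

def split_by_time_window(video_duration, window=60):
    segments = []
    start = 0
    while start < video_duration:
        end = min(start + window, video_duration)
        segments.append((start, end))
        start = end
    return segments

print(split_by_time_window(360))  # [(0, 60), (60, 120), ..., (300, 360)]
```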
The business server 100 inputs each of the at least two video clips 20d into the first video recognition sub-model 20e, and determines, through the first video recognition sub-model 20e, the first sharing quality corresponding to each video clip 20d. In some embodiments, the first sharing quality is used to characterize the sharing value of a video clip; for example, the first sharing quality may be the interaction rate of the video clip. As shown in Figure 2, the first sharing quality of video clip 201d is 0.8, that of video clip 202d is 0.85, that of video clip 203d is 0.89, that of video clip 204d is 0.7, that of video clip 205d is 0.75, and that of video clip 206d is 0.9. The specific process by which the business server 100 determines the first sharing quality corresponding to a video clip is not described here; please refer to the description of step S101 in the embodiment corresponding to Figure 3 below.
The business server 100 obtains the first sharing quality threshold. It can be understood that the first sharing quality threshold can be adjusted according to the actual application scenario; the example in this embodiment is 0.8. The business server 100 compares the first sharing quality of each video clip with the first sharing quality threshold, and determines the video clips whose first sharing quality is equal to or greater than the first sharing quality threshold as candidate video clips 201e. As shown in Figure 2, candidate video clips 201e include video clips 201d, 202d, 203d and 206d. Further, the business server 100 inputs both the object tag text sequence and the candidate video clips 201e into the second video recognition sub-model 20f, through which the second sharing quality corresponding to each candidate video clip 201e can be determined. In some embodiments, the second sharing quality is used to characterize the correlation between the candidate video clip and the object tag text of the shared object. As shown in Figure 2, the second sharing quality of video clip 201d is 0.74, that of video clip 202d is 0.86, that of video clip 203d is 0.8, and that of video clip 206d is 0.9. The specific process by which the business server 100 determines the second sharing quality corresponding to a candidate video clip is not described here; please refer to the description of step S102 in the embodiment corresponding to Figure 3 below.
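Using the example values above, the first-stage filtering amounts to a simple threshold comparison; note that clips whose first sharing quality equals the threshold are kept.

```python
# First-stage filtering with the example values from Figure 2:
# clips with first sharing quality >= 0.8 become candidate video clips.

first_quality = {"201d": 0.8, "202d": 0.85, "203d": 0.89,
                 "204d": 0.7, "205d": 0.75, "206d": 0.9}
first_threshold = 0.8
candidate_clips = [clip for clip, q in first_quality.items() if q >= first_threshold]
print(candidate_clips)  # ['201d', '202d', '203d', '206d']
```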
The business server 100 obtains the second sharing quality threshold. It can be understood that the second sharing quality threshold can be adjusted according to the actual application scenario; the example in this embodiment is 0.85. The business server 100 compares the four second sharing qualities with the second sharing quality threshold, and determines the candidate video clips whose second sharing quality is greater than the second sharing quality threshold as candidate shared video clips 201f. As shown in Figure 2, candidate shared video clips 201f include video clips 202d and 206d. Further, the business server 100 inputs both the object tag text sequence and the candidate shared video clips 201f into the third video recognition sub-model 20g, through which the third sharing quality corresponding to each candidate shared video clip 201f can be determined. As shown in Figure 2, the third sharing quality of video clip 202d is 0.82, and that of video clip 206d is 0.87. The specific process by which the business server 100 determines the third sharing quality corresponding to a candidate shared video clip is not described here; please refer to the description of step S103 in the embodiment corresponding to Figure 3 below.
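Continuing the same example, the second-stage filtering uses a strict comparison against the second sharing quality threshold.

```python
# Second-stage filtering with the example values from Figure 2:
# only clips with second sharing quality strictly greater than 0.85 remain.

second_quality = {"201d": 0.74, "202d": 0.86, "203d": 0.8, "206d": 0.9}
second_threshold = 0.85
candidate_shared = [clip for clip, q in second_quality.items() if q > second_threshold]
print(candidate_shared)  # ['202d', '206d']
```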
According to the third sharing quality corresponding to the candidate shared video clips, the business server 100 can determine the auxiliary description information corresponding to the candidate shared video clips. As shown in Figure 2, the business server 100 determines auxiliary description information 202g for video clip 202d and auxiliary description information 206g for video clip 206d. The specific process by which the business server 100 determines the auxiliary description information corresponding to a candidate shared video clip is not described here; please refer to the description of step S103 in the embodiment corresponding to Figure 3 below.
Further, the business server 100 performs a weighted summation of the first sharing quality (0.85 in the example of Figure 2), the second sharing quality (0.86) and the third sharing quality (0.82) corresponding to video clip 202d to obtain the total sharing quality corresponding to video clip 202d; similarly, by weighted summation of the first sharing quality (0.9), the second sharing quality (0.9) and the third sharing quality (0.87) corresponding to video clip 206d, the business server 100 obtains the total sharing quality corresponding to video clip 206d. Further, the business server 100 compares the total sharing quality of video clip 202d with that of video clip 206d and takes the larger of the two. In this embodiment, the total sharing quality of video clip 206d is the largest, so the business server 100 can determine video clip 206d as the shared video clip. Further, the shared video clip (that is, video clip 206d) and the auxiliary description information corresponding to the shared video clip (auxiliary description information 206g in the example of Figure 2) can be determined as shared data 20h. Subsequently, the business server 100 can synchronize shared data 20h to terminal device 200a, and terminal device 200a can send the shared data to the shared object (friend "cc" in the example of Figure 2).
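The final selection can be sketched as a weighted sum followed by taking the maximum. Equal weights are used here purely for illustration; the actual weighting scheme is not fixed by the example above.

```python
# Final selection with the example values from Figure 2: weighted sum of the three
# sharing qualities per candidate, then pick the clip with the largest total.
# Equal weights are an illustrative assumption.

qualities = {
    "202d": (0.85, 0.86, 0.82),
    "206d": (0.90, 0.90, 0.87),
}
weights = (1 / 3, 1 / 3, 1 / 3)

total = {clip: sum(w * q for w, q in zip(weights, qs)) for clip, qs in qualities.items()}
shared_clip = max(total, key=total.get)
print(shared_clip)  # '206d'
```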
As can be seen from the above, by performing deep modeling on a video, this application can construct multiple video clips with high sharing value. Combined with the object tag text sequence, auxiliary description information that is strongly related to the browsing object and the sharing object can be generated, thereby achieving personalized and diversified video sharing, enriching the video sharing function, and improving the user experience of video sharing. In addition, since only a video clip rather than the entire video is shared, network resources and the processing resources of the device of the sharing object that receives the shared content are saved.
Further, please refer to Figure 3, which is a first schematic flowchart of a data processing method provided by an embodiment of the present application. The data processing method may be executed by a service server (for example, the service server 100 shown in Figure 1 above), by a terminal device (for example, the terminal device 200a shown in Figure 1 above), or by the service server and the terminal device interacting with each other. For ease of understanding, this embodiment of the present application is described by taking the method being executed by the service server as an example. As shown in Figure 3, the data processing method may include at least the following steps S101 to S104.
Step S101: obtain at least two video clips in a video, determine the first sharing quality corresponding to each of the at least two video clips, and select, according to the first sharing qualities, at least one video clip from the at least two video clips as a candidate video clip.
In some embodiments, the video may be segmented according to a time window to obtain at least two video clips corresponding to the video. The first sharing quality is used to characterize the popularity of a video clip, for example its interaction rate. The popularity may be determined from features of the video clip in multiple dimensions, such as image features, audio features, and text features. For each of the at least two video clips, the following operations are performed to determine the first sharing quality corresponding to that video clip:
obtain K video frames from the video clip, and the audio frames respectively corresponding to the K video frames, where K is a positive integer;
fuse the video features respectively corresponding to the K video frames to obtain the video feature of the video clip;
fuse the audio features respectively corresponding to the K audio frames to obtain the audio feature of the video clip;
obtain the text feature corresponding to the video clip according to the audio recognition text, the video description text, and the object comment text of the video clip;
fuse the video feature, the audio feature, and the text feature of the video clip to obtain the multi-dimensional fusion feature corresponding to the video clip;
determine, according to the multi-dimensional fusion feature, the first sharing quality corresponding to the video clip.
The video clip may be subjected to audio recognition processing to obtain the audio recognition text, for example the spoken dialogue text obtained through ASR; the video clip may be subjected to character recognition processing, for example OCR, to obtain the video description text (for example, subtitle text); and the bullet-screen comment (barrage) text corresponding to the video clip may be obtained as the object comment text.
The specific process of generating the multi-dimensional fusion feature corresponding to a video clip may include: obtaining a video recognition model, where the video recognition model includes a first video recognition sub-model, and the first video recognition sub-model includes a video fusion network layer, an audio fusion network layer, a text fusion network layer, and a multi-dimensional fusion network layer; inputting the K video frames into the video fusion network layer respectively, performing feature extraction on the K video frames through the video fusion network layer to obtain the to-be-fused video features respectively corresponding to the K video frames, and performing feature fusion on the K to-be-fused video features to obtain the video feature corresponding to the video clip; inputting the K audio frames into the audio fusion network layer respectively, performing feature extraction on the K audio frames through the audio fusion network layer to obtain the to-be-fused audio features respectively corresponding to the K audio frames, and performing feature fusion on the K to-be-fused audio features to obtain the audio feature corresponding to the video clip; determining the audio recognition text, the video description text, and the object comment text as the content text corresponding to the video clip, inputting the content text into the text fusion network layer, extracting the key text in the content text through the text fusion network layer, and performing feature extraction on the key text to obtain the text feature corresponding to the key text; and inputting the video feature, the audio feature, and the text feature into the multi-dimensional fusion network layer respectively, and performing feature fusion on the video feature, the audio feature, and the text feature through the multi-dimensional fusion network layer to obtain the multi-dimensional fusion feature corresponding to the video clip.
The first video recognition sub-model further includes a first fully connected network layer. The specific process of determining, according to the multi-dimensional fusion features, the first sharing quality respectively corresponding to the at least two video clips may include: for each video clip, inputting the multi-dimensional fusion feature corresponding to the video clip into the first fully connected network layer, and performing feature transformation on the multi-dimensional fusion feature corresponding to the video clip through the first fully connected network layer to obtain the first sharing quality corresponding to the video clip.
The specific process of selecting at least one video clip from the at least two video clips as a candidate video clip may include: determining, among the at least two video clips, the video clips whose first sharing quality is equal to or greater than a first sharing quality threshold as the candidate video clips.
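A minimal sketch of this threshold filter is shown below; the clip identifiers, quality values, and threshold are illustrative only, since the threshold is adjustable per application scenario.

```python
# Sketch of candidate selection by the first sharing quality threshold.
first_quality = {"201d": 0.80, "202d": 0.85, "203d": 0.78,
                 "204d": 0.60, "205d": 0.55, "206d": 0.90}  # illustrative values
FIRST_THRESHOLD = 0.75                                      # illustrative threshold

candidate_clips = [clip_id for clip_id, q in first_quality.items()
                   if q >= FIRST_THRESHOLD]
```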
The service server may segment the video through a time window to obtain at least two video clips of the video, where the time window can be set according to the actual application scenario. It can be understood that the process by which the service server determines the first sharing quality corresponding to each video clip is the same, so this embodiment of the present application is described by taking the determination of the first sharing quality corresponding to the video clip A1 as an example; for the remaining video clips among the at least two video clips, please refer to the following description. Please also refer to Figure 4, which is a schematic diagram of the model structure of a first video recognition sub-model provided by an embodiment of the present application. As shown in Figure 4, the service server obtains K video frames from the video clip A1, as well as the audio frames respectively corresponding to the K video frames; the K video frames may be extracted at random or periodically (for example, one frame per second). This embodiment of the present application does not limit the manner of obtaining the video frames, which can be set according to the actual application scenario. The service server performs audio recognition processing on the video clip A1, for example through ASR technology, to obtain the audio recognition text; it extracts, for example through OCR technology, the video description text in the video clip A1, and extracts the object comment text, where the video description text may include subtitle text and the object comment text may include bullet-screen comment text. Further, the service server determines the audio recognition text, the video description text, and the object comment text as the content text E1 corresponding to the video clip A1.
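The time-window segmentation mentioned at the start of this paragraph could look like the following sketch; the 30-second window length is an illustrative choice, not a value fixed by the embodiment.

```python
# Sketch of segmenting a video into clips by a fixed time window (seconds).
def split_into_clips(duration_s, window_s=30):
    return [(start, min(start + window_s, duration_s))
            for start in range(0, int(duration_s), window_s)]

split_into_clips(95)   # -> [(0, 30), (30, 60), (60, 90), (90, 95)]
```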
Referring again to Figure 4, the service server obtains the first video recognition sub-model in the video recognition model, where the first video recognition sub-model includes a video fusion network layer 40a, an audio fusion network layer 40b, a text fusion network layer 40c, a multi-dimensional fusion network layer 40e, and a first fully connected network layer 40f. The service server inputs the K video frames into the video fusion network layer 40a respectively. Assuming that the K video frames include a first video frame and a second video frame, feature extraction is performed on the first video frame through the video fusion network layer 40a to obtain the first to-be-fused video feature corresponding to the first video frame, and feature extraction is performed on the second video frame to obtain the second to-be-fused video feature corresponding to the second video frame; in this way, the service server can obtain the to-be-fused video features respectively corresponding to the K video frames. By performing feature fusion on the K to-be-fused video features 401a, the service server can obtain the video feature 401d corresponding to the video clip A1. It can be understood that the video fusion network layer 40a can be regarded as a network for extracting the deep features of the K video frames. This embodiment of the present application does not limit the network type of the video fusion network layer 40a, which may be composed of any one or more kinds of neural networks, such as a Convolutional Neural Network (CNN), a Residual Network (ResNet), a High-Resolution Network (HRNet), or EfficientNet (a scaled convolutional network).
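As one possible instantiation of such a frame encoder, the following sketch (assuming a recent torchvision; the backbone choice, feature dimension, and mean pooling are illustrative, and pretrained weights could be loaded where the embodiment trains this layer end to end) extracts one deep feature per frame and fuses them:

```python
import torch
from torchvision import models

# Sketch of a ResNet-based video fusion network layer: one deep feature per frame,
# then a simple fusion (mean pooling) into the clip's video feature.
backbone = models.resnet18(weights=None)   # pretrained weights could be loaded instead
backbone.fc = torch.nn.Identity()          # keep the 512-d pooled feature per frame
backbone.eval()

with torch.no_grad():
    frames = torch.rand(8, 3, 224, 224)    # K = 8 sampled frames, illustrative tensors
    frame_feats = backbone(frames)         # (8, 512) to-be-fused video features
    video_feat = frame_feats.mean(dim=0)   # fused video feature of the clip
```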
In addition, the service server inputs the K audio frames into the audio fusion network layer 40b respectively. Assuming that the K audio frames include a first audio frame corresponding to the first video frame and a second audio frame corresponding to the second video frame, feature extraction is performed on the first audio frame through the audio fusion network layer 40b to obtain the first to-be-fused audio feature corresponding to the first audio frame, and feature extraction is performed on the second audio frame to obtain the second to-be-fused audio feature corresponding to the second audio frame; in this way, the service server can obtain the to-be-fused audio features respectively corresponding to the K audio frames, and by performing feature fusion on the K to-be-fused audio features 401b, obtains the audio feature 402d corresponding to the video clip A1. It can be understood that the audio fusion network layer 40b can be regarded as a network for extracting the deep features of the K audio frames. This embodiment of the present application does not limit the network type of the audio fusion network layer 40b, which may be composed of any one or more kinds of neural networks, such as a convolutional time-domain audio separation network (Conv-TasNet), a bidirectional long short-term memory time-domain audio separation network (BiLSTM-TasNet), or the tensorflow-based VGGish model (Visual Geometry Group Network).
The service server inputs the content text E1 into the text fusion network layer 40c, extracts the key text in the content text E1 through the text fusion network layer 40c, and performs feature extraction on the key text to obtain the text feature corresponding to the key text. This embodiment of the present application does not limit the network type of the text fusion network layer 40c, which may be any natural language processing network, for example a Transformer (a deep self-attention network widely used in natural language translation and image processing), Word2Vec (a model for producing word vectors), or BERT (Bidirectional Encoder Representations from Transformers).
Further, the service server inputs the video feature 401d, the audio feature 402d, and the text feature 403d into the multi-dimensional fusion network layer 40e respectively, and performs feature fusion on the video feature 401d, the audio feature 402d, and the text feature 403d through the multi-dimensional fusion network layer 40e to obtain the multi-dimensional fusion feature 401e corresponding to the video clip A1. The service server inputs the multi-dimensional fusion feature 401e into the first fully connected network layer 40f, and performs feature transformation on the multi-dimensional fusion feature 401e through the first fully connected network layer 40f to obtain the first sharing quality corresponding to the video clip A1. For the specific process by which the service server determines the candidate video clips from the at least two video clips according to the first sharing qualities, please refer to the description of Figure 2 above, which is not repeated here.
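As a rough illustration of this structure, the following PyTorch-style sketch composes the per-modality features and maps them to a first sharing quality; the feature dimensions, mean pooling, and sigmoid output are assumptions, and the concrete backbones are left open as described above.

```python
import torch
import torch.nn as nn

# Rough sketch of the first video recognition sub-model's forward pass.
class FirstVideoRecognitionSubModel(nn.Module):
    def __init__(self, video_dim=512, audio_dim=128, text_dim=256, fused_dim=256):
        super().__init__()
        self.fusion = nn.Linear(video_dim + audio_dim + text_dim, fused_dim)  # multi-dimensional fusion layer
        self.fc = nn.Linear(fused_dim, 1)                                     # first fully connected layer

    def forward(self, frame_feats, audio_feats, text_feat):
        # frame_feats: (K, video_dim); audio_feats: (K, audio_dim); text_feat: (text_dim,)
        video_feat = frame_feats.mean(dim=0)   # fuse the K to-be-fused video features
        audio_feat = audio_feats.mean(dim=0)   # fuse the K to-be-fused audio features
        fused = torch.relu(self.fusion(torch.cat([video_feat, audio_feat, text_feat])))
        return torch.sigmoid(self.fc(fused)), fused  # first sharing quality, multi-dimensional fusion feature
```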
Step S102: obtain an object tag text sequence associated with the video, determine, according to the object tag text sequence and the candidate video clips, the second sharing quality corresponding to each candidate video clip, and select, according to the second sharing quality corresponding to each candidate video clip, at least one candidate video clip from the candidate video clips as a candidate shared video clip.
Specifically, the second sharing quality is used to characterize the correlation between a candidate video clip and the object tag text of the sharing object. The object tag text sequence includes the object tag text of the browsing object that shares the video and the object tag text of the sharing object that receives the share; the object tag text of the browsing object is used to characterize the interests of the browsing object, and the object tag text of the sharing object is used to characterize the interests of the sharing object. The object tag text of the browsing object associated with the video is obtained, and the object tag text of the sharing object associated with the browsing object is obtained; the object tag text sequence is generated according to the object tag text of the browsing object and the object tag text of the sharing object. The video recognition model is obtained, and the object tag text sequence and the candidate video clips are respectively input into the video recognition model, where the video recognition model includes a second video recognition sub-model and the second video recognition sub-model includes a first text encoding network layer. Text encoding is performed on each object tag text in the object tag text sequence through the first text encoding network layer to obtain the first object tag feature corresponding to the object tag text sequence; the multi-dimensional fusion feature corresponding to each candidate video clip is obtained, and the second sharing quality corresponding to each candidate video clip is respectively determined according to the first object tag feature and the multi-dimensional fusion feature corresponding to each candidate video clip.
The second video recognition sub-model further includes a first splicing network layer and a second fully connected network layer. The specific process of determining the second sharing quality corresponding to a candidate video clip may include: for each candidate video clip, inputting the first object tag feature and the multi-dimensional fusion feature corresponding to the candidate video clip into the first splicing network layer respectively; performing feature splicing on the first object tag feature and the multi-dimensional fusion feature corresponding to the candidate video clip through the first splicing network layer to obtain the first multi-dimensional splicing feature corresponding to the candidate video clip; and inputting the first multi-dimensional splicing feature into the second fully connected network layer, and performing feature transformation on the first multi-dimensional splicing feature through the second fully connected network layer to obtain the second sharing quality corresponding to the candidate video clip.
The number of candidate video clips is at least two. The specific process of selecting at least one candidate video clip from the candidate video clips as a candidate shared video clip may include: determining, among the at least two candidate video clips, the candidate video clips whose second sharing quality is greater than the second sharing quality threshold as the candidate shared video clips.
Step S101 constructs candidate video clips with a high interaction rate and high sharing value; this step further constrains the candidate video clips by their relevance to the objects' interests, so that the constructed candidate video clips fit the objects' interests better, which can further improve the playback conversion of video sharing. The service server obtains the object tag text of the browsing object (abbreviated as the browsing object tag text), which can characterize the interests of the browsing object; for example, the tag text (cat, anime, pet) indicates that the browsing object is interested in videos of the cat, anime, and pet types. Similarly, the service server obtains the object tag text of the sharing object (abbreviated as the sharing object tag text), which can characterize the interests of the sharing object; for example, the tag text (cat, cartoon, children) indicates that the sharing object is interested in videos of the cat, cartoon, and children types. Further, by combining the browsing object tag text and the sharing object tag text, the service server obtains the object tag text sequence; for example, combining the tag text (cat, anime, pet) with the tag text (cat, cartoon, children) yields the tag text sequence (cat, anime, pet, cartoon, children). If, when constructing the object tag text sequence, the object tag text of only one object (for example, the browsing object or the sharing object) can be obtained, the object tag text sequence is generated from the object tag text that was obtained.
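A minimal sketch of this sequence construction is given below; keeping the first occurrence of each tag is an assumption consistent with the example above, since the embodiment only describes combining the two tag lists.

```python
# Sketch of object tag text sequence construction from the two tag lists.
def build_tag_sequence(browsing_tags=None, sharing_tags=None):
    combined = (browsing_tags or []) + (sharing_tags or [])
    seen, sequence = set(), []
    for tag in combined:            # keep the first occurrence of each tag
        if tag not in seen:
            seen.add(tag)
            sequence.append(tag)
    return sequence

build_tag_sequence(["cat", "anime", "pet"], ["cat", "cartoon", "children"])
# -> ["cat", "anime", "pet", "cartoon", "children"]
```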
This embodiment of the present application can provide two ways of obtaining the multi-dimensional fusion features corresponding to the candidate video clips. In the first way, step S101 has already produced the multi-dimensional fusion features respectively corresponding to the at least two video clips (including the multi-dimensional fusion feature 401e in Figure 4), and the candidate video clips belong to the at least two video clips, so the service server can obtain the multi-dimensional fusion features corresponding to the candidate video clips from the multi-dimensional fusion features, respectively corresponding to the at least two video clips, output by the first video recognition sub-model. Referring again to Figure 2, the service server can obtain, through the first video recognition sub-model 20e, the multi-dimensional fusion features respectively corresponding to the video clip 201d, the video clip 202d, the video clip 203d, the video clip 204d, the video clip 205d, and the video clip 206d. Since the service server determines the video clip 201d, the video clip 202d, the video clip 203d, and the video clip 206d as the candidate video clips, it can directly determine the multi-dimensional fusion features output by the first video recognition sub-model 20e for the video clip 201d, the video clip 202d, the video clip 203d, and the video clip 206d as the multi-dimensional fusion features corresponding to the candidate video clips. This first way of obtaining the multi-dimensional fusion features corresponding to the candidate video clips can reduce the computation time and computation cost of the video recognition model.
In order to improve the accuracy of the multi-dimensional fusion features corresponding to the candidate video clips, the service server can adopt the second way. Please also refer to Figure 5, which is a schematic diagram of the model structure of a second video recognition sub-model provided by an embodiment of the present application. The model structure in the dashed region shown in Figure 5 is the same as the model structure in the first video recognition sub-model of Figure 4, but the model parameters of the two are not identical, because when training the second video recognition sub-model, the service server uses the model parameters respectively corresponding to the video fusion network layer 40a, the audio fusion network layer 40b, the text fusion network layer 40c, and the multi-dimensional fusion network layer 40e in the trained first video recognition sub-model as the initialized model parameters in the dashed region of Figure 5, and fine-tunes the initialized model parameters based on a second training sample set (including multiple sample videos, object tag sample text sequences, and the second quality label corresponding to each sample video). It can be understood that the process by which the service server obtains the multi-dimensional fusion feature 402e corresponding to a candidate video clip through the dashed region of Figure 5 is the same as the process of obtaining the multi-dimensional fusion features respectively corresponding to the at least two video clips through the first video recognition sub-model, so please refer to the description of step S101 above, which is not repeated here. Since the model parameters in the dashed region of Figure 5 are better than the model parameters in Figure 4, the multi-dimensional fusion feature 402e is better than the at least two multi-dimensional fusion features in step S101.
Through Figure 5, this embodiment of the present application jointly models the personalized interests of the objects and the content of the video clip at the same time. As shown in Figure 5, the second video recognition sub-model may include a first text encoding network layer 40g, a first splicing network layer 40h, and a second fully connected network layer 40i. Through the first text encoding network layer 40g, the service server performs text encoding on each object tag text in the object tag text sequence to obtain the first object tag feature 401g corresponding to the object tag text sequence. The service server inputs the first object tag feature 401g and the multi-dimensional fusion feature corresponding to the candidate video clip (for example, the multi-dimensional fusion feature 402e in Figure 5) into the first splicing network layer 40h respectively, and performs feature splicing on the first object tag feature 401g and the multi-dimensional fusion feature 402e through the first splicing network layer 40h to obtain the first multi-dimensional splicing feature 401h corresponding to the candidate video clip. Further, the service server inputs the first multi-dimensional splicing feature 401h into the second fully connected network layer 40i, and through the second fully connected network layer 40i, can obtain the second sharing quality corresponding to the candidate video clip.
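As a rough illustration, the splicing and scoring head of this sub-model could look like the following sketch; the feature dimensions and sigmoid output are assumptions, and the tag encoder is abstracted away.

```python
import torch
import torch.nn as nn

# Sketch of the second video recognition sub-model's scoring head: splice the first
# object tag feature with a clip's multi-dimensional fusion feature, then transform
# the spliced feature into a second sharing quality. Dimensions are illustrative.
class SecondSharingQualityHead(nn.Module):
    def __init__(self, tag_dim=128, fusion_dim=256):
        super().__init__()
        self.fc = nn.Linear(tag_dim + fusion_dim, 1)                 # second fully connected layer

    def forward(self, tag_feature, fusion_feature):
        spliced = torch.cat([tag_feature, fusion_feature], dim=-1)   # first splicing layer
        return torch.sigmoid(self.fc(spliced))                       # second sharing quality
```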
This embodiment of the present application does not limit the network type of the first text encoding network layer 40g, which may be any natural language processing network.
For the process by which the service server selects, according to the second sharing quality corresponding to each candidate video clip, at least one candidate video clip from the candidate video clips as a candidate shared video clip, please refer to the description of Figure 2 above, which is not repeated here.
Step S103: determine, according to the object tag text sequence and the candidate shared video clips, the third sharing quality corresponding to each candidate shared video clip and the auxiliary description information corresponding to each candidate shared video clip.
Specifically, the auxiliary description information refers to description information used to assist the video clip, and includes but is not limited to one of the following kinds of modal information or a combination of several of them: copywriting of the video clip (text modality), a cover (image modality), a voice introduction (audio modality), and so on, which can be set according to the actual application scenario. The third sharing quality is used to characterize the degree of matching between the auxiliary description information, on the one hand, and the video clip and the object tag text of the sharing object, on the other.
The service server determines the third sharing quality corresponding to the candidate shared video clips through the third video recognition sub-model in the video recognition model, and then determines the auxiliary description information; for this process, please refer to the description of Figure 2 above. If the auxiliary description information includes a cover, the above third video recognition sub-model includes a fourth video recognition sub-model; if the auxiliary description information includes copywriting, the above third video recognition sub-model includes a fifth video recognition sub-model; if the auxiliary description information includes both copywriting and a cover, the third video recognition sub-model may include both the fourth video recognition sub-model and the fifth video recognition sub-model. For the relevant description of the fourth video recognition sub-model and the fifth video recognition sub-model, please refer to the description in the embodiment corresponding to Figure 6 below, which is not expanded upon here.
Step S104: determine, according to the first sharing quality, the second sharing quality, and the third sharing quality corresponding to each candidate shared video clip, a shared video clip from the candidate shared video clips, and determine the shared video clip and the auxiliary description information corresponding to the shared video clip as the shared data to be sent to the sharing object.
Specifically, for each candidate shared video clip, a weighted summation is performed on the first sharing quality, the second sharing quality, and the third sharing quality corresponding to the candidate shared video clip to obtain the total sharing quality corresponding to the candidate shared video clip; among the candidate shared video clips, the candidate shared video clip with the largest total sharing quality is determined as the shared video clip; and the auxiliary description information corresponding to the shared video clip is obtained from the auxiliary description information respectively corresponding to the at least two candidate shared video clips.
This embodiment of the present application proposes a method for implementing intelligent video sharing. By deeply understanding the video content in multiple dimensions and combining interaction data such as bullet-screen comments, the method can automatically mine, from a video, multiple video clips with high sharing value; based on the mining of the objects' interests, it selects high-value shared clips that better match the objects' personalized interests, and can generate the corresponding personalized shared cover image and shared copywriting, making video sharing more intelligent. While more intuitively presenting the more valuable highlights of the video, the method better matches the objects' personalization, and can therefore further improve the video sharing effect. On the premise of improving the video sharing effect, only a video clip rather than the entire video is shared, which saves network transmission resources and the processing resources of the device that receives the shared data.
Please refer to Figure 6, which is another schematic flowchart of a data processing method provided by an embodiment of the present application. The method may be executed by a service server (for example, the service server 100 shown in Figure 1 above), by a terminal device (for example, the terminal device 200a shown in Figure 1 above), or by the service server and the terminal device interacting with each other. For ease of understanding, this embodiment of the present application is described by taking the method being executed by the service server as an example. As shown in Figure 6, the method may include at least the following steps.
Step S201: obtain at least two video clips in a video, determine the first sharing quality corresponding to each of the at least two video clips, and select, according to the first sharing qualities, at least one video clip from the at least two video clips as a candidate video clip.
Step S202: obtain an object tag text sequence associated with the video, determine, according to the object tag text sequence and the candidate video clips, the second sharing quality corresponding to each candidate video clip, and select, according to the second sharing quality corresponding to each candidate video clip, at least one candidate video clip from the candidate video clips as a candidate shared video clip.
For the specific implementation process of step S201 and step S202, please refer to step S101 and step S102 in the embodiment corresponding to Figure 3 above, which are not repeated here.
In some embodiments, the auxiliary description information corresponding to a candidate shared video clip includes a description image corresponding to the candidate shared video clip and a description text corresponding to the candidate shared video clip; the third sharing quality corresponding to the candidate shared video clip includes the image sharing quality corresponding to the description image and the text sharing quality corresponding to the description text.
For each candidate shared video clip determined in step S202, the following steps S203 to S206 are performed to determine the third sharing quality and the auxiliary description information of each candidate shared video clip.
Step S203: obtain at least two video frames in the candidate shared video clip, and determine the image sharing quality corresponding to each of the at least two video frames.
Specifically, image sampling is performed on the candidate shared video clip according to an image sampling period to obtain at least two video frames in the candidate shared video clip. For each video frame, the video frame is input into the video recognition model, and feature extraction is performed on the video frame through the image recognition network layer of the video recognition model to obtain the shared image feature corresponding to the video frame. The video recognition model includes a fourth video recognition sub-model, and the fourth video recognition sub-model includes the image recognition network layer and a second splicing network layer.
The service server can obtain at least two video frames from the candidate shared video clip according to an image sampling period (for example, sampling one picture per second), and each of the at least two video frames serves as a candidate description image. The service server needs to determine the image sharing quality respectively corresponding to the at least two video frames, and then determine the image sharing quality corresponding to the candidate shared video clip. Please also refer to Figure 7, which is a schematic diagram of the model structure of a fourth video recognition sub-model provided by an embodiment of the present application. It can be understood that the process by which the service server obtains the image sharing quality corresponding to each video frame through the fourth video recognition sub-model is the same, so this embodiment of the present application is described by taking the obtaining of the image sharing quality corresponding to the video frame F1 as an example; for the processing of the remaining video frames among the at least two video frames, please refer to the following description.
The service server inputs the video frame F1 into the image recognition network layer 70a of the fourth video recognition sub-model, and performs feature extraction on the video frame F1 through the image recognition network layer 70a to obtain the shared image feature 701a corresponding to the video frame F1.
The multi-dimensional fusion feature corresponding to the candidate shared video clip is obtained, and the second object tag feature corresponding to the object tag text sequence is obtained. The shared image feature 701a corresponding to the video frame F1, the multi-dimensional fusion feature corresponding to the candidate shared video clip, and the second object tag feature are respectively input into the second splicing network layer; through the second splicing network layer, feature splicing is performed on the shared image feature corresponding to the video frame F1, the multi-dimensional fusion feature corresponding to the candidate shared video clip, and the second object tag feature to obtain the second multi-dimensional splicing feature corresponding to the video frame F1; and the image sharing quality corresponding to the video frame F1 is determined according to the second multi-dimensional splicing feature corresponding to the video frame F1.
The fourth video recognition sub-model further includes a third fully connected network layer. For each video frame, the second multi-dimensional splicing feature corresponding to the video frame is input into the third fully connected network layer, and feature transformation is performed on the second multi-dimensional splicing feature corresponding to the video frame through the third fully connected network layer to obtain the image sharing quality corresponding to the video frame.
Step S204: determine, according to the image sharing quality corresponding to each video frame, the image sharing quality corresponding to the candidate shared video clip, and select one video frame from the at least two video frames as the description image corresponding to the candidate shared video clip.
Specifically, the largest image sharing quality is obtained from the image sharing qualities respectively corresponding to the at least two video frames, and the largest image sharing quality is determined as the image sharing quality corresponding to the candidate shared video clip; among the at least two video frames, the video frame corresponding to the largest image sharing quality is determined as the description image corresponding to the candidate shared video clip.
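The following sketch outlines steps S203 and S204 together: each sampled frame is scored by splicing its shared image feature with the clip's multi-dimensional fusion feature and the second object tag feature, and the frame with the largest image sharing quality becomes the description image. The dimensions, the untrained linear layer standing in for the third fully connected layer, and the sigmoid output are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch of per-frame image sharing quality and description-image selection.
image_dim, fusion_dim, tag_dim = 512, 256, 128
third_fc = nn.Linear(image_dim + fusion_dim + tag_dim, 1)  # stands in for the third fully connected layer

def pick_description_image(frame_features, fusion_feature, tag_feature):
    qualities = []
    for image_feature in frame_features:  # one shared image feature per sampled frame
        spliced = torch.cat([image_feature, fusion_feature, tag_feature])  # second splicing layer
        qualities.append(torch.sigmoid(third_fc(spliced)).item())
    best = max(range(len(qualities)), key=qualities.__getitem__)
    return best, qualities[best]  # index of the description image, image sharing quality of the clip
```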
This embodiment of the present application can provide three different ways of obtaining the multi-dimensional fusion feature corresponding to a candidate shared video clip. For the first way, reference may be made to the description, in step S102 of the embodiment corresponding to Figure 3 above, of obtaining the multi-dimensional fusion features corresponding to the candidate video clips; the two follow the same principle. The second way is similar to the first way: step S102 of Figure 3 has already produced the multi-dimensional fusion features corresponding to the candidate video clips (including the multi-dimensional fusion feature 402e in Figure 5), and the candidate shared video clip belongs to the candidate video clips, so the service server can obtain the multi-dimensional fusion feature corresponding to the candidate shared video clip from the multi-dimensional fusion features, corresponding to the candidate video clips, output by the second video recognition sub-model. Both of the above ways can reduce the computation time and computation cost of the video recognition model.
In order to improve the accuracy of the multi-dimensional fusion feature corresponding to the candidate shared video clip, the service server can adopt the third way; please refer again to Figure 7, the schematic diagram of the model structure of the fourth video recognition sub-model provided by an embodiment of the present application. The model structure in the dashed region shown in Figure 7 is the same as the model structure in the second video recognition sub-model of Figure 5, but the model parameters of the two are not identical, because when training the fourth video recognition sub-model, the service server uses the model parameters of the trained second video recognition sub-model as the initialized model parameters in the dashed region of Figure 7, and fine-tunes the initialized model parameters based on a third training sample set (including multiple sample videos, object tag sample text sequences, the sample description image corresponding to each sample video, and the description image quality label corresponding to each sample video). It can be understood that the process by which the service server obtains the multi-dimensional fusion feature corresponding to the candidate shared video clip through the dashed region of Figure 7 is the same as the process of obtaining the multi-dimensional fusion feature 402e through the second video recognition sub-model, so please refer to the description of step S101 above, which is not repeated here. Since the model parameters in the dashed region of Figure 7 are better than the model parameters in Figure 5, the multi-dimensional fusion feature corresponding to the candidate shared video clip output in Figure 7 is better than the multi-dimensional fusion feature 402e in Figure 5.
Based on the same principle, this embodiment of the present application can provide two ways of obtaining the second object tag feature. In the first way, the first object tag feature 401g output in Figure 5 is determined as the second object tag feature. In the second way, as shown in Figure 7, the object tag text sequence is input into the fourth video recognition sub-model; the process by which the service server obtains the second object tag feature through the dashed region of Figure 7 is the same as the process of obtaining the first object tag feature 401g through the first text encoding network layer 40g in Figure 5, so please refer to the description of step S102 above, which is not repeated here.
Referring again to Figure 7, the service server inputs the shared image feature 701a corresponding to the video frame F1, the multi-dimensional fusion feature corresponding to the candidate shared video clip, and the second object tag feature into the second splicing network layer 70b respectively; through the second splicing network layer 70b, feature splicing can be performed on the shared image feature 701a, the multi-dimensional fusion feature corresponding to the candidate shared video clip, and the second object tag feature, so that the second multi-dimensional splicing feature 701b corresponding to the video frame F1 can be obtained. Further, the service server inputs the second multi-dimensional splicing feature 701b into the third fully connected network layer 70c, and through the third fully connected network layer 70c, feature transformation can be performed on the second multi-dimensional splicing feature 701b, so that the image sharing quality corresponding to the video frame F1 can be obtained. Following the above description, the service server can obtain the image sharing quality respectively corresponding to the at least two video frames.
Step S205: determine, according to the object tag text sequence and the content text corresponding to the candidate shared video clip, the text sharing quality corresponding to the candidate shared video clip and the description text corresponding to the candidate shared video clip.
Specifically, the description text is composed of N shared words. The video recognition model is obtained; the video recognition model includes a fifth video recognition sub-model, and the fifth video recognition sub-model includes a second text encoding network layer, a third text encoding network layer, an attention network layer, and a text decoding network layer. The content text corresponding to the candidate shared video clip is input into the second text encoding network layer, and text encoding is performed on the content text corresponding to the candidate shared video clip through the second text encoding network layer to obtain the content text feature. The object tag text sequence is input into the third text encoding network layer, and text encoding is performed on the object tag text sequence through the third text encoding network layer to obtain the third object tag feature. The content text feature, the to-be-decoded text feature Si corresponding to the candidate shared video clip, and the third object tag feature are respectively input into the attention network layer, and feature fusion is performed on the content text feature, the to-be-decoded text feature Si, and the third object tag feature through the attention network layer to obtain the attention weight corresponding to the content text feature, where i is a non-negative integer less than N. The to-be-decoded text feature Si+1 corresponding to the candidate shared video clip is determined according to the attention weight corresponding to the content text feature; the shared word indicated by the to-be-decoded text feature Si is the shared word immediately preceding the shared word indicated by the to-be-decoded text feature Si+1. When i+1 equals N, the N to-be-decoded text features are respectively input into the text decoding network layer, the shared words respectively indicated by the N to-be-decoded text features are generated through the text decoding network layer, and the N shared words are combined into the description text corresponding to the candidate shared video clip; the text sharing quality corresponding to the candidate shared video clip is generated according to the N to-be-decoded text features.
For the definition of the content text corresponding to the candidate shared video clip, please refer to the definition of the content text E1 in the embodiment corresponding to Figure 3 above; for the definitions of the second text encoding network layer and the third text encoding network layer, please refer to the definition of the first text encoding network layer in the embodiment corresponding to Figure 3 above; the attention network layer is an Attention network.
Please also refer to Figure 8, which is a schematic diagram of the model structure of a fifth video recognition sub-model provided by an embodiment of the present application. As shown in Figure 8, the service server performs basic processing on the content text corresponding to the candidate shared video clip, including word segmentation and tokenization, and queries, through a vocabulary (for example, a lookup table), the initial word vector respectively corresponding to each word (word 1, word 2, ..., word n as shown in Figure 8). Each initial word vector is used as an input to the second text encoding network layer, so as to understand the content text corresponding to the candidate shared video clip and obtain the content text features, that is, the word vector respectively corresponding to each word, such as the word 1 representation, word 2 representation, ..., word n representation shown in the figure. For the process by which the service server obtains the third object tag feature (that is, the object representation in Figure 8), reference may be made to the generation process of the second object tag feature above, which is not repeated here.
Further, the service server takes the content text features (the word 1 representation, word 2 representation, ..., word n representation), the third object tag feature (the object representation), and the shared word representation generated in the previous step as inputs to the attention network layer, and generates the shared copywriting (that is, the description text) corresponding to the candidate shared video clip step by step. When generating the shared word at each step, it is determined, based on the Attention mechanism, whether to copy a word from the content text or to select a word from the vocabulary for generation. Finally, the service server multiplies together the maximum probabilities from each generation step and uses the product as the text sharing quality of the description text generated for the candidate shared video clip. The symbol "<S>" in Figure 8 marks the start of generation.
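A much simplified sketch of this decoding loop is given below. It assumes a hypothetical `decoder_step` callable standing in for the attention network layer plus the text decoding network layer and returning a probability distribution over the vocabulary; the copy-versus-generate decision and the end-of-sequence token are simplifications, and the highest-probability word is kept greedily at each step while the product of those probabilities accumulates into the text sharing quality.

```python
import numpy as np

# Simplified sketch of description-text generation and its text sharing quality.
def generate_description(decoder_step, content_feats, tag_feat, vocab, max_len=20, start="<S>"):
    words, quality, prev = [], 1.0, start
    for _ in range(max_len):
        probs = decoder_step(content_feats, tag_feat, prev)  # distribution over the vocabulary
        idx = int(np.argmax(probs))
        words.append(vocab[idx])
        quality *= float(probs[idx])       # product of per-step maximum probabilities
        prev = vocab[idx]
        if vocab[idx] == "</S>":           # assumed end-of-sequence token
            break
    return " ".join(words), quality        # description text, text sharing quality
```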
Step S206: determine, according to the image sharing quality corresponding to the candidate shared video clip and the text sharing quality corresponding to the candidate shared video clip, the third sharing quality corresponding to the candidate shared video clip; and determine, according to the description image corresponding to the candidate shared video clip and the description text corresponding to the candidate shared video clip, the auxiliary description information corresponding to the candidate shared video clip.
在一些实施例中,可以将候选共享视频片段的图像共享质量和文本共享质量,作为所述候选共享视频片段的第三共享质量。In some embodiments, the image sharing quality and text sharing quality of the candidate shared video segment may be used as the third sharing quality of the candidate shared video segment.
The description image can be used as the video cover of the candidate shared video clip, and the description text can be used as the video copy of the candidate shared video clip. The embodiments of this application are described by taking the case where the auxiliary description information includes both a description image and a description text as an example. In some embodiments, the auxiliary description information includes only a description text, or includes only a description image, or includes audio content and the like. The embodiments of this application do not limit the content of the auxiliary description information, which can be set according to the actual application scenario.
Step S207: determine a shared video clip from the candidate shared video clips according to the first sharing quality, the second sharing quality and the third sharing quality corresponding to the candidate shared video clips, and determine the shared video clip and the auxiliary description information corresponding to the shared video clip as the shared data to be sent to the shared object.
Specifically, the first sharing quality, the second sharing quality, the image sharing quality and the text sharing quality corresponding to each candidate shared video clip are weighted and summed to obtain the total sharing quality corresponding to that candidate shared video clip. For the subsequent process, refer to the description of step S104 in the embodiment corresponding to Figure 3 above, which is not repeated here.
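As a purely illustrative sketch of the weighted summation mentioned in this step, the total sharing quality of a candidate shared video clip could be computed as follows; the weight values and the example numbers are assumptions and are not fixed by this application.

```python
def total_sharing_quality(q_first, q_second, q_image, q_text,
                          weights=(0.4, 0.3, 0.15, 0.15)):
    """Weighted sum of the first sharing quality, the second sharing quality,
    the image sharing quality and the text sharing quality (illustrative weights)."""
    w1, w2, w3, w4 = weights
    return w1 * q_first + w2 * q_second + w3 * q_image + w4 * q_text

# Example: choose the candidate shared video clip with the largest total sharing quality.
candidates = {
    "clip_a": (0.8, 0.6, 0.7, 0.5),
    "clip_b": (0.6, 0.9, 0.8, 0.7),
}
shared_clip = max(candidates, key=lambda name: total_sharing_quality(*candidates[name]))
```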
The embodiments of this application provide a method for implementing intelligent video sharing. Through in-depth mining of video content and video interaction data, multiple candidate shared video clips with high sharing value are automatically constructed; a shared video clip that matches the shared object is selected for sharing based on object interests (that is, the object tag text sequence); and a description image (which can be used as the cover of the shared video clip) and a description text (which can be used as the copy of the shared video clip) that are personalized for the shared object are constructed. This can attract the shared object to watch the shared video clip, thereby improving the sharing conversion of the video platform and the overall playback of the video platform.
Please refer to Figure 9, which is another schematic flowchart of a data processing method provided by an embodiment of this application. The method may be executed by a business server (for example, the business server 100 shown in Figure 1 above), by a terminal device (for example, the terminal device 200a shown in Figure 1 above), or by the business server and the terminal device interacting with each other. For ease of understanding, the embodiments of this application are described by taking the method being executed by the business server as an example. As shown in Figure 9, the method may include at least the following steps.
Step S301: obtain a training sample set; the training sample set includes multiple sample videos, an object label sample text sequence of the browsing sample object associated with each sample video, and a first quality label, a second quality label and a third quality label corresponding to each sample video.
在一些实施例中,针对所述多个样本视频中的每个样本视频,执行以下操作,以获取该样本视频对应的第一质量标签:In some embodiments, for each sample video in the plurality of sample videos, the following operations are performed to obtain the first quality label corresponding to the sample video:
对该样本视频对应的播放次数、时长以及平均播放完成度进行乘积运算,得到该样本视频对应的第一样本参数;Perform a product operation on the number of plays, duration and average play completion corresponding to the sample video to obtain the first sample parameter corresponding to the sample video;
对该样本视频对应的对象评论文本数量以及对象评论文本互动数量进行求和运算,得到该样本视频对应的第二样本参数;Perform a summation operation on the number of object comment texts corresponding to the sample video and the number of object comment text interactions to obtain the second sample parameter corresponding to the sample video;
确定该样本视频对应的第一样本参数以及第一样本参数最大值之间的第一比例,确定该样本视频对应的第二样本参数以及第二样本参数最大值之间的第二比例;Determine a first ratio between the first sample parameter corresponding to the sample video and the maximum value of the first sample parameter, and determine a second ratio between the second sample parameter corresponding to the sample video and the maximum value of the second sample parameter;
对第一比例以及第二比例进行加权求和,得到该样本视频对应的候选第一质量标签;Perform a weighted sum of the first ratio and the second ratio to obtain the candidate first quality label corresponding to the sample video;
If the candidate first quality label corresponding to the sample video is less than the first quality label threshold, the candidate first quality label corresponding to the sample video is determined as the first quality label corresponding to the sample video; if the candidate first quality label corresponding to the sample video is equal to or greater than the first quality label threshold, the first quality label threshold is determined as the first quality label corresponding to the sample video.
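For illustration only, the label construction described in the preceding paragraphs could be written as the following Python sketch; the weights, the threshold and the field names of the sample video are hypothetical and can be set according to the actual application scenario.

```python
def first_quality_label(video, all_videos, threshold=1.0, w1=0.5, w2=0.5):
    """Candidate first quality label of one sample video, clipped at the
    first quality label threshold (all constants are illustrative)."""
    def param1(v):   # first sample parameter: plays x duration x average completion
        return v["plays"] * v["duration"] * v["avg_completion"]
    def param2(v):   # second sample parameter: comment count + comment interactions
        return v["comments"] + v["comment_interactions"]
    ratio1 = param1(video) / max(param1(v) for v in all_videos)
    ratio2 = param2(video) / max(param2(v) for v in all_videos)
    candidate = w1 * ratio1 + w2 * ratio2
    # Candidate labels at or above the threshold are replaced by the threshold itself.
    return candidate if candidate < threshold else threshold
```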
In some embodiments, for each sample video, the following operations are performed to obtain the second quality label corresponding to the sample video: obtain the first playback completion degree of the browsing sample object for the sample video; if the first playback completion degree is greater than the first playback completion degree threshold, determine that there is a first positive correlation between the object label sample text and the sample video, and determine the first positive correlation as the second quality label of the sample video; if the first playback completion degree is less than or equal to the first playback completion degree threshold, determine that there is a first reverse correlation between the object label sample text and the sample video, and determine the first reverse correlation as the second quality label of the sample video.
In some embodiments, the training sample set further includes a sample description image corresponding to each sample video, and the third quality label includes a description image quality label. For each sample video: obtain the second playback completion degree of the browsing sample object for the sample video; if the second playback completion degree is greater than the second playback completion degree threshold, determine that there is a second positive correlation among the sample description image, the object label sample text and the sample video, and determine the second positive correlation as the description image quality label of the sample video; if the second playback completion degree is less than or equal to the second playback completion degree threshold, determine that there is a second reverse correlation among the sample description image, the object label sample text and the sample video, and determine the second reverse correlation as the description image quality label of the sample video.
In some embodiments, the third quality label includes a description text quality label, and the method further includes the following operations for each sample video:
Obtain the third playback completion degree of the browsing sample object for the sample video; if the third playback completion degree is greater than the third playback completion degree threshold, obtain the sample content text corresponding to the sample video, and add the sample content text to the training sample set; determine that there is a third positive correlation between the object label sample text sequence and the sample content text, and determine the third positive correlation as the description text quality label of the sample video.
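A minimal sketch of the completion-degree based labelling rules above is given below; the threshold values and the numeric encoding of the positive and reverse correlations (1 and 0) are assumptions made only for this illustration.

```python
def completion_label(completion_degree, threshold):
    """1 denotes a positive correlation (completion above the threshold),
    0 denotes a reverse correlation."""
    return 1 if completion_degree > threshold else 0

first_completion, second_completion, third_completion = 0.85, 0.40, 0.92

second_quality_label = completion_label(first_completion, 0.7)              # -> 1, positive
description_image_quality_label = completion_label(second_completion, 0.7)  # -> 0, reverse
if third_completion > 0.7:
    # Only in this case is the sample content text added to the training
    # sample set, together with a positive description text quality label.
    description_text_quality_label = 1
```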
The training sample set may include a first training sample set for training the first video recognition sub-model, a second training sample set for training the second video recognition sub-model, and a third training sample set for training the third video recognition sub-model. When the auxiliary description information includes only a description image, the third video recognition sub-model includes a fourth video recognition sub-model, and the third training sample set is a fourth training sample set; when the auxiliary description information includes only a description text, the third video recognition sub-model includes a fifth video recognition sub-model, and the third training sample set is a fifth training sample set; when the auxiliary description information includes both a description image and a description text, the third video recognition sub-model includes the fourth video recognition sub-model and the fifth video recognition sub-model, and the third training sample set includes the fourth training sample set and the fifth training sample set. The first training sample set includes multiple sample videos and the first quality label corresponding to each sample video; the fifth training sample set includes multiple sample videos, the object label sample text sequence of the browsing sample object associated with each sample video, and the description text quality label corresponding to each sample video.
It can be understood that the sample videos included in the above five training sample sets may be the same or different; the main differences lie in their labels and their uses. It can also be understood that a video platform has a large number of short videos, so short videos can be determined as the sample videos. Compared with the duration of the video in the embodiment corresponding to Figure 3, the duration of a short video is shorter; for example, the duration of a short video is equal to the duration of a video clip.
It can be understood that the first quality label threshold, the first playback completion degree threshold, the second playback completion degree threshold and the third playback completion degree threshold can all be adjusted according to the actual application scenario, and the embodiments of this application do not limit these four thresholds.
步骤S302,将训练样本集输入至视频识别模型,通过视频识别模型,分别确定各样本视频对应的第一预测质量。Step S302: Input the training sample set to the video recognition model, and determine the first prediction quality corresponding to each sample video through the video recognition model.
Specifically, the business server may input the first training sample set in step S301 to the first video recognition sub-model in the video recognition model. The process in which the business server obtains the first prediction quality corresponding to each sample video through the first video recognition sub-model is consistent with the process of obtaining the first sharing quality corresponding to a video clip through the first video recognition sub-model; therefore, refer to the description of step S101 in the embodiment corresponding to Figure 3 above, which is not repeated here.
步骤S303,根据对象标签样本文本序列以及各样本视频,分别确定各样本视频对应的第二预测质量以及第三预测质量。 Step S303: Determine the second prediction quality and the third prediction quality corresponding to each sample video according to the object label sample text sequence and each sample video.
Specifically, the business server may input the second training sample set in step S301 to the second video recognition sub-model in the video recognition model. The process in which the business server obtains the second prediction quality corresponding to each sample video through the second video recognition sub-model is consistent with the process of obtaining the second sharing quality corresponding to a video clip through the second video recognition sub-model; therefore, refer to the description of step S102 in the embodiment corresponding to Figure 3 above, which is not repeated here.
The business server may input the third training sample set in step S301 to the third video recognition sub-model in the video recognition model. The process in which the business server obtains the third prediction quality corresponding to each sample video through the third video recognition sub-model is consistent with the process of obtaining the third sharing quality corresponding to a video clip through the third video recognition sub-model; therefore, refer to the description of step S103 in the embodiment corresponding to Figure 3 above, which is not repeated here.
Step S304: adjust the parameters in the video recognition model according to the first quality label, the second quality label, the third quality label, the first prediction quality, the second prediction quality and the third prediction quality, to obtain a trained video recognition model; the trained video recognition model is used to determine shared data of a video, and the shared data includes a shared video clip in the video and auxiliary description information corresponding to the shared video clip.
Specifically, the video recognition model includes a first video recognition sub-model used to determine the first prediction quality, a second video recognition sub-model used to determine the second prediction quality, and a third video recognition sub-model used to determine the third prediction quality; the parameters in the video recognition model include parameters in the first video recognition sub-model, parameters in the second video recognition sub-model, and parameters in the third video recognition sub-model. A first quality loss value between the first quality label and the first prediction quality is determined, and the parameters in the first video recognition sub-model are adjusted according to the first quality loss value to obtain a trained first video recognition sub-model; a second quality loss value between the second quality label and the second prediction quality is determined, and the parameters in the second video recognition sub-model are adjusted according to the second quality loss value to obtain a trained second video recognition sub-model; a third quality loss value between the third quality label and the third prediction quality is determined, and the parameters in the third video recognition sub-model are adjusted according to the third quality loss value to obtain a trained third video recognition sub-model. When the first video recognition sub-model, the second video recognition sub-model and the third video recognition sub-model all satisfy the model convergence condition, a trained video recognition model including the trained first video recognition sub-model, the trained second video recognition sub-model and the trained third video recognition sub-model is generated.
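For readability, the following plain-Python sketch shows how three quality losses could drive the adjustment of three sub-models until a convergence condition is met. The tiny linear scorer, the squared-error loss, the learning rate and the convergence threshold are stand-ins chosen for this illustration and do not represent the actual structure of the video recognition model.

```python
def predict(params, features):
    """Stand-in for a sub-model: a tiny linear scorer."""
    return sum(p * f for p, f in zip(params, features))

def sgd_step(params, features, label, lr=0.05):
    """One gradient step on the squared-error quality loss; returns the
    updated parameters and the loss value before the update."""
    error = predict(params, features) - label
    new_params = [p - lr * 2.0 * error * f for p, f in zip(params, features)]
    return new_params, error ** 2

# Three sub-models with hypothetical parameters, input features and quality labels.
submodels = {
    "first":  {"params": [0.1, 0.1], "features": [0.8, 0.5], "label": 0.9},
    "second": {"params": [0.2, 0.0], "features": [0.6, 0.7], "label": 1.0},
    "third":  {"params": [0.0, 0.3], "features": [0.4, 0.9], "label": 1.0},
}

for epoch in range(200):
    losses = {}
    for name, m in submodels.items():
        m["params"], losses[name] = sgd_step(m["params"], m["features"], m["label"])
    if all(loss < 1e-4 for loss in losses.values()):
        break   # every sub-model satisfies its convergence condition
```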
In the embodiments of this application, the first video recognition sub-model is deeply modeled through the first training sample set, so that the first video recognition sub-model can determine, among multiple video clips, candidate video clips with high sharing value; the second video recognition sub-model is deeply modeled through the second training sample set, so that the second video recognition sub-model can determine, among the candidate video clips, candidate shared video clips with high sharing value; and the third video recognition sub-model is deeply modeled through the third training sample set, so that the third video recognition sub-model can determine the third sharing quality and the auxiliary description information corresponding to a candidate shared video clip. The shared video clip and its corresponding auxiliary description information can then be determined through sharing qualities of different dimensions, and shared data can be generated. Since the shared data is associated not only with the video content of the shared video clip itself but also with the object tag text sequence, sharing the data can improve the sharing efficiency and the sharing effect of the video.
进一步地,请参见图10,图10是本申请实施例提供的一种数据处理装置的结构示意图一。上述数据处理装置1可以用于执行本申请实施例提供的方法中的相应步骤。如图10所示,该数据处理装置1可以包括:第一获取模块110、第二获取模块120、第一确定模块130以及第二确定模块140。Further, please refer to FIG. 10 , which is a schematic structural diagram of a data processing device provided by an embodiment of the present application. The above-mentioned data processing device 1 can be used to execute corresponding steps in the method provided by the embodiments of the present application. As shown in FIG. 10 , the data processing device 1 may include: a first acquisition module 110 , a second acquisition module 120 , a first determination module 130 and a second determination module 140 .
第一获取模块110,用于获取视频中的至少两个视频片段,确定至少两个视频片段分别对应的第一共享质量,根据所述第一共享质量,从至少两个视频片段中选择至少一个视频片段作为候选视频片段;The first acquisition module 110 is configured to acquire at least two video clips in the video, determine the first sharing quality corresponding to the at least two video clips, and select at least one video clip from the at least two video clips according to the first sharing quality. Video clips as candidate video clips;
The second acquisition module 120 is configured to: obtain an object tag text sequence associated with the video, where the object tag text sequence includes the object tag text of the browsing object that shares the video and the object tag text of the shared object that receives the share, the object tag text of the browsing object is used to characterize the interest of the browsing object, and the object tag text of the shared object is used to characterize the interest of the shared object; and determine, according to the object tag text sequence and the candidate video clips, the second sharing quality corresponding to each candidate video clip, and select, according to the second sharing quality corresponding to each candidate video clip, at least one candidate video clip from the candidate video clips as a candidate shared video clip, where the second sharing quality is used to characterize the relevance between a candidate video clip and the object tag text of the shared object;
The first determination module 130 is configured to determine, according to the object tag text sequence and the candidate shared video clips, the third sharing quality corresponding to each candidate shared video clip and the auxiliary description information corresponding to each candidate shared video clip, where the third sharing quality is used to characterize the degree of matching between the auxiliary description information and both the candidate shared video clip and the object tag text of the shared object;
The second determination module 140 is configured to determine a shared video clip from the candidate shared video clips according to the first sharing quality, the second sharing quality and the third sharing quality corresponding to each candidate shared video clip, and determine the shared video clip and the auxiliary description information corresponding to the shared video clip as the shared data to be sent to the shared object.
For the specific functional implementations of the first acquisition module 110, the second acquisition module 120, the first determination module 130 and the second determination module 140, refer to steps S101 to S104 in the embodiment corresponding to Figure 3 above, which are not repeated here. The description of the beneficial effects of using the same method is likewise not repeated.
进一步地,请参见图11,图11是本申请实施例提供的一种数据处理装置的另一结构示意图。上述数据处理装置2可以用于执行本申请实施例提供的方法中的相应步骤。如图11所示,该数据处理装置2可以包括:第一获取模块11、第二获取模块12、第一确定模 块13以及第二确定模块14。Further, please refer to FIG. 11 , which is another schematic structural diagram of a data processing device provided by an embodiment of the present application. The above-mentioned data processing device 2 can be used to execute corresponding steps in the method provided by the embodiments of the present application. As shown in Figure 11, the data processing device 2 may include: a first acquisition module 11, a second acquisition module 12, a first determination module Block 13 and the second determination module 14.
需要说明的是,图11中的第一获取模块11具有图10中的第一获取模块110的全部或部分功能,图11中的第二获取模块12具有图10中的第二获取模块120的全部或部分功能,图11中的第一确定模块13具有图10中的第一确定模块130的全部或部分功能,图11中的第二确定模块14具有图10中的第二确定模块140的全部或部分功能。It should be noted that the first acquisition module 11 in Figure 11 has all or part of the functions of the first acquisition module 110 in Figure 10 , and the second acquisition module 12 in Figure 11 has the functions of the second acquisition module 120 in Figure 10 All or part of the functions, the first determination module 13 in Figure 11 has all or part of the functions of the first determination module 130 in Figure 10 , and the second determination module 14 in Figure 11 has the functions of the second determination module 140 in Figure 10 All or part of the functionality.
再请参见图11,第一获取模块11可以包括:第一处理单元111、第一获取单元112。Referring again to FIG. 11 , the first acquisition module 11 may include: a first processing unit 111 and a first acquisition unit 112 .
第一处理单元111,用于获取视频,根据时间窗口对视频进行切分处理,得到视频对应的至少两个视频片段;The first processing unit 111 is used to obtain the video, segment the video according to the time window, and obtain at least two video segments corresponding to the video;
第一获取单元112,用于针对所述至少两个视频片段中的每个视频片段,执行以下操作,以确定该视频片段对应的第一共享质量:The first acquisition unit 112 is configured to perform the following operations for each video segment in the at least two video segments to determine the first sharing quality corresponding to the video segment:
Obtain K video frames from the video clip and the audio frames respectively corresponding to the K video frames, where K is a positive integer; fuse the video features respectively corresponding to the K video frames to obtain the video feature of the video clip; fuse the audio features respectively corresponding to the K audio frames to obtain the audio feature of the video clip; obtain the text feature corresponding to the video clip according to the audio recognition text, the video description text and the object comment text of the video clip; fuse the video feature, the audio feature and the text feature of the video clip to obtain the multi-dimensional fusion feature corresponding to the video clip; and determine, according to the multi-dimensional fusion feature, the first sharing quality corresponding to the video clip;
根据各视频片段对应的多维度融合特征,分别确定各视频片段分别的第一共享质量。According to the multi-dimensional fusion features corresponding to each video clip, the first sharing quality of each video clip is determined respectively.
其中,第一处理单元111和第一获取单元112的具体功能实现方式可以参见上述图3对应实施例中的步骤S101,这里不再进行赘述。For the specific functional implementation of the first processing unit 111 and the first acquisition unit 112, please refer to step S101 in the corresponding embodiment of FIG. 3, which will not be described again here.
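The multi-dimensional fusion handled by the first processing unit 111 and the first acquisition unit 112 can be pictured with the following non-normative Python sketch, in which per-frame features are fused by averaging, the three modality features are fused by concatenation, and a simple weighted score stands in for the first sharing quality; all of these concrete choices are assumptions of the illustration.

```python
def mean_pool(frame_features):
    """Fuse K per-frame feature vectors (video or audio) by averaging."""
    k = len(frame_features)
    return [sum(column) / k for column in zip(*frame_features)]

def fuse(video_feature, audio_feature, text_feature):
    """Concatenate the three modality features into a multi-dimensional fusion
    feature; a real model could instead use a learned fusion network."""
    return video_feature + audio_feature + text_feature

def first_sharing_quality(fusion_feature, weights=None):
    """Stand-in scorer mapping the fusion feature to a quality in [0, 1]."""
    weights = weights or [1.0 / len(fusion_feature)] * len(fusion_feature)
    score = sum(w * x for w, x in zip(weights, fusion_feature))
    return max(0.0, min(1.0, score))

video_frames = [[0.2, 0.8], [0.4, 0.6]]   # features of K sampled video frames
audio_frames = [[0.1, 0.3], [0.2, 0.5]]   # features of the K corresponding audio frames
text_feature = [0.7, 0.9]                 # from ASR text, description text and comments

fusion = fuse(mean_pool(video_frames), mean_pool(audio_frames), text_feature)
quality = first_sharing_quality(fusion)
```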
再请参见图11,第二获取模块12可以包括:第二获取单元121、生成单元122。Referring again to FIG. 11 , the second acquisition module 12 may include: a second acquisition unit 121 and a generation unit 122 .
第二获取单元121,用于获取与所述视频相关联的浏览对象的对象标签文本,获取与所述浏览对象相关联的所述共享对象的对象标签文本;The second obtaining unit 121 is used to obtain the object tag text of the browsing object associated with the video, and obtain the object tag text of the shared object associated with the browsing object;
根据所述浏览对象的对象标签文本以及所述共享对象的对象标签文本,生成所述对象标签文本序列。The object tag text sequence is generated according to the object tag text of the browse object and the object tag text of the shared object.
生成单元122,用于针对每个候选视频片段,执行以下操作,以确定该候选视频片段对应的第二共享质量:The generation unit 122 is configured to perform the following operations for each candidate video segment to determine the second sharing quality corresponding to the candidate video segment:
将所述对象标签文本序列以及所述候选视频片段分别输入至视频识别模型;所述视频识别模型包括第二视频识别子模型;所述第二视频识别子模型包括第一文本编码网络层;The object label text sequence and the candidate video segment are respectively input to a video recognition model; the video recognition model includes a second video recognition sub-model; the second video recognition sub-model includes a first text encoding network layer;
通过所述第一文本编码网络层,对所述对象标签文本序列中的每个对象标签文本进行文本编码,得到所述对象标签文本序列对应的第一对象标签特征; Through the first text encoding network layer, text encoding is performed on each object label text in the object label text sequence to obtain the first object label feature corresponding to the object label text sequence;
获取所述候选视频片段对应的多维度融合特征,根据所述第一对象标签特征以及所述候选视频片段对应的多维度融合特征,确定所述候选视频片段对应的第二共享质量。Multi-dimensional fusion features corresponding to the candidate video segments are obtained, and second sharing quality corresponding to the candidate video segments is determined based on the first object label features and the multi-dimensional fusion features corresponding to the candidate video segments.
其中,第二获取单元121和生成单元122的具体功能实现方式可以参见上述图3对应实施例中的步骤S102,这里不再进行赘述。For the specific functional implementation of the second acquisition unit 121 and the generation unit 122, please refer to step S102 in the corresponding embodiment of FIG. 3, which will not be described again here.
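The relevance computation performed by the generation unit 122 can be illustrated, in a simplified and non-normative way, by the following sketch, where a hashing bag-of-features encoder stands in for the first text encoding network layer and a cosine similarity stands in for the second sharing quality; these substitutions are assumptions of the illustration, not the disclosed network.

```python
import math

def encode_tags(tag_texts, dim):
    """Stand-in for the first text encoding network layer: hashes each object
    tag text into a fixed-size bag-of-features vector."""
    vector = [0.0] * dim
    for tag in tag_texts:
        vector[hash(tag) % dim] += 1.0
    return vector

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def second_sharing_quality(object_tag_texts, fusion_feature):
    """Relevance between the object tag text sequence and a candidate video clip,
    approximated here by a cosine similarity between the two features."""
    tag_feature = encode_tags(object_tag_texts, dim=len(fusion_feature))
    return cosine(tag_feature, fusion_feature)

quality = second_sharing_quality(["animation", "suspense", "food"], [0.3, 0.1, 0.8, 0.4])
```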
Referring again to Figure 11, the auxiliary description information corresponding to a candidate shared video clip includes a description image corresponding to the candidate shared video clip and a description text corresponding to the candidate shared video clip; the third sharing quality corresponding to the candidate shared video clip includes the image sharing quality corresponding to the description image and the text sharing quality corresponding to the description text.
第一确定模块13可以包括:第三获取单元131、第二确定单元132以及第三确定单元133。The first determination module 13 may include: a third acquisition unit 131, a second determination unit 132, and a third determination unit 133.
针对每个候选共享视频片段:Share video clips for each candidate:
第三获取单元131,用于获取候选共享视频片段中的至少两个视频帧;The third obtaining unit 131 is used to obtain at least two video frames in the candidate shared video clips;
The second determination unit 132 is configured to determine the image sharing quality corresponding to each of the at least two video frames, determine the image sharing quality of the candidate shared video clip according to the image sharing quality corresponding to each video frame, and select one video frame from the at least two video frames as the description image corresponding to the candidate shared video clip;
第三确定单元133,用于根据对象标签文本序列以及候选共享视频片段对应的内容文本,确定候选共享视频片段对应的文本共享质量,以及候选共享视频片段对应的描述文本。The third determination unit 133 is configured to determine the text sharing quality corresponding to the candidate shared video clips and the description text corresponding to the candidate shared video clips based on the object tag text sequence and the content text corresponding to the candidate shared video clips.
其中,第三获取单元131、第二确定单元132以及第三确定单元133的具体功能实现方式可以参见上述图6对应实施例中的步骤S203-步骤S206,这里不再进行赘述。For the specific functional implementation of the third obtaining unit 131, the second determining unit 132 and the third determining unit 133, please refer to steps S203 to S206 in the corresponding embodiment of FIG. 6, which will not be described again here.
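As a simple illustration of how the second determination unit 132 could pick the description image, the sketch below scores every frame, uses the largest frame score as the image sharing quality of the clip (one possible aggregation) and returns that frame as the cover; the frame names and the toy scoring function are hypothetical.

```python
def select_description_image(frames, frame_quality):
    """frames: candidate video frames of one candidate shared video clip;
    frame_quality: callable returning the image sharing quality of one frame
    (stand-in for the image-quality part of the video recognition model)."""
    qualities = [frame_quality(frame) for frame in frames]
    clip_image_quality = max(qualities)              # one possible aggregation
    cover = frames[qualities.index(clip_image_quality)]
    return cover, clip_image_quality

frames = ["frame_012.jpg", "frame_048.jpg", "frame_096.jpg"]
cover, image_quality = select_description_image(frames,
                                                frame_quality=lambda f: (len(f) % 7) / 7)
```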
再请参见图11,第二确定模块14可以包括:质量求和单元141、第四确定单元142。Referring again to FIG. 11 , the second determination module 14 may include: a quality summation unit 141 and a fourth determination unit 142 .
质量求和单元141,用于对每个候选共享视频片段对应的第一共享质量、第二共享质量,以及第三共享质量分别进行加权求和,得到各候选共享视频片段对应的总共享质量;The quality summation unit 141 is configured to perform a weighted sum of the first shared quality, the second shared quality, and the third shared quality corresponding to each candidate shared video segment, respectively, to obtain the total shared quality corresponding to each candidate shared video segment;
第四确定单元142,用于将至少两个候选共享视频片段中,总共享质量最大的候选共享视频片段确定为共享视频片段;The fourth determination unit 142 is configured to determine the candidate shared video segment with the largest total sharing quality among the at least two candidate shared video segments as the shared video segment;
在至少两个候选共享视频片段分别对应的辅助描述信息中,获取共享视频片段对应的辅助描述信息。From the auxiliary description information corresponding to at least two candidate shared video clips, the auxiliary description information corresponding to the shared video clip is obtained.
其中,质量求和单元141和第四确定单元142的具体功能实现方式可以参见上述图3对应实施例中的步骤S104,这里不再进行赘述。The specific functional implementation of the quality summation unit 141 and the fourth determination unit 142 can be referred to step S104 in the corresponding embodiment of FIG. 3 above, and will not be described again here.
The shared data in this application is determined based on sharing qualities of different dimensions and is associated not only with the video content of the shared video clip itself but also with the object tag text sequence; therefore, the shared data can improve the sharing efficiency and the sharing effect of the video.
Further, please refer to Figure 12, which is another schematic structural diagram of a data processing device provided by an embodiment of this application. The data processing device 3 can be used to execute the corresponding steps in the method provided by the embodiments of this application. As shown in Figure 12, the data processing device 3 may include: a first acquisition module 210, a first determination module 220, a second determination module 230 and a parameter adjustment module 240.
第一获取模块210,用于获取训练样本集;训练样本集包括多个样本视频、与各样本视频相关联的浏览样本对象的对象标签样本文本序列、各样本视频对应的第一质量标签、第二质量标签以及第三质量标签;The first acquisition module 210 is used to acquire a training sample set; the training sample set includes a plurality of sample videos, a sample text sequence of object tags of browsing sample objects associated with each sample video, a first quality label corresponding to each sample video, a third Second quality label and third quality label;
第一确定模块220,用于将训练样本集输入至视频识别模型,通过视频识别模型,确定各样本视频对应的第一预测质量;The first determination module 220 is used to input the training sample set to the video recognition model, and determine the first prediction quality corresponding to each sample video through the video recognition model;
第二确定模块230,用于根据对象标签样本文本序列以及所述多个样本视频,分别确定各样本视频对应的第二预测质量以及第三预测质量;The second determination module 230 is configured to determine the second prediction quality and the third prediction quality corresponding to each sample video according to the object label sample text sequence and the plurality of sample videos;
参数调整模块240,用于根据第一质量标签、第二质量标签、第三质量标签、第一预测质量、第二预测质量以及第三预测质量,对视频识别模型中的参数进行调整,得到训练后的视频识别模型;训练后的视频识别模型用于确定视频的共享数据;共享数据包括视频中的共享视频片段以及共享视频片段对应的辅助描述信息。The parameter adjustment module 240 is used to adjust the parameters in the video recognition model according to the first quality label, the second quality label, the third quality label, the first prediction quality, the second prediction quality and the third prediction quality to obtain training. The video recognition model after training is used to determine the shared data of the video; the shared data includes the shared video clips in the video and the auxiliary description information corresponding to the shared video clips.
其中,第一获取模块210、第一确定模块220、第二确定模块230以及参数调整模块240的具体功能实现方式可以参见上述图9对应实施例中的步骤S301-步骤S304,这里不再进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。Among them, the specific functional implementation of the first acquisition module 210, the first determination module 220, the second determination module 230 and the parameter adjustment module 240 can be referred to steps S301 to S304 in the corresponding embodiment of Figure 9 above, and will not be described again here. . In addition, the description of the beneficial effects of using the same method will not be described again.
进一步地,请参见图13,图13是本申请实施例提供的一种数据处理装置的另一结构示意图。上述数据处理装置4可以用于执行本申请实施例提供的方法中的相应步骤。如图13所示,该数据处理装置4可以包括:第一获取模块21、第一确定模块22、第二确定模块23以及参数调整模块24。Further, please refer to FIG. 13 , which is another schematic structural diagram of a data processing device provided by an embodiment of the present application. The above-mentioned data processing device 4 can be used to execute corresponding steps in the method provided by the embodiments of the present application. As shown in FIG. 13 , the data processing device 4 may include: a first acquisition module 21 , a first determination module 22 , a second determination module 23 and a parameter adjustment module 24 .
需要说明的是,图13中的第一获取模块21具有图12中的第一获取模块210的全部或部分功能,图13中的第一确定模块22具有图12中的第一确定模块220的全部或部分功能,图13中的第二确定模块23具有图12中的第二确定模块230的全部或部分功能,图13中的参数调整模块24具有图12中的参数调整模块240的全部或部分功能。It should be noted that the first acquisition module 21 in Figure 13 has all or part of the functions of the first acquisition module 210 in Figure 12 , and the first determination module 22 in Figure 13 has the functions of the first determination module 220 in Figure 12 All or part of the functions, the second determination module 23 in Figure 13 has all or part of the functions of the second determination module 230 in Figure 12 , and the parameter adjustment module 24 in Figure 13 has all or part of the parameter adjustment module 240 in Figure 12 Some functions.
请再参见图13,数据处理装置4还可以包括:第一运算模块25、第二运算模块26、第二获取模块27、第三确定模块28、比例求和模块29、第一对比模块30以及第四确定模块31。Please refer to Figure 13 again, the data processing device 4 may also include: a first operation module 25, a second operation module 26, a second acquisition module 27, a third determination module 28, a proportion summation module 29, a first comparison module 30 and The fourth determination module 31.
第一运算模块25,用于针对每个样本视频,对该样本视频对应的播放次数、时长以及平均播放完成度进行乘积运算,得到该样本视频对应的第一样本参数;The first operation module 25 is configured to perform a product operation for each sample video on the number of plays, duration and average play completion corresponding to the sample video to obtain the first sample parameter corresponding to the sample video;
The second operation module 26 is configured to perform, for each sample video, a summation operation on the number of object comment texts and the number of object comment text interactions corresponding to the sample video, to obtain the second sample parameter corresponding to the sample video;
The second acquisition module 27 is configured to obtain the maximum value of the first sample parameter among the first sample parameters respectively corresponding to the at least two sample videos, and obtain the maximum value of the second sample parameter among the second sample parameters respectively corresponding to the at least two sample videos;
The third determination module 28 is configured to determine the first ratio between the first sample parameter corresponding to each sample video and the maximum value of the first sample parameter, and determine the second ratio between the second sample parameter corresponding to each sample video and the maximum value of the second sample parameter;
比例求和模块29,用于对各样本视频的第一比例以及第二比例分别进行加权求和,得到各样本视频对应的候选第一质量标签;The proportion summation module 29 is used to perform a weighted sum of the first proportion and the second proportion of each sample video to obtain the candidate first quality label corresponding to each sample video;
第一对比模块30,用于将各样本视频对应的候选第一质量标签与第一质量标签阈值分别进行对比;The first comparison module 30 is used to compare the first quality label candidate corresponding to each sample video with the first quality label threshold respectively;
The fourth determination module 31 is configured to, for each sample video, determine the candidate first quality label corresponding to the sample video as the first quality label corresponding to the sample video if the candidate first quality label corresponding to the sample video is less than the first quality label threshold;
The fourth determination module 31 is further configured to determine the first quality label threshold as the first quality label corresponding to the sample video if the candidate first quality label corresponding to the sample video is equal to or greater than the first quality label threshold.
其中,第一运算模块25、第二运算模块26、第二获取模块27、第三确定模块28、比例求和模块29、第一对比模块30以及第四确定模块31的具体功能实现方式可以参见上述图9对应实施例中的步骤S301,这里不再进行赘述。Among them, the specific functional implementation of the first operation module 25, the second operation module 26, the second acquisition module 27, the third determination module 28, the proportion summation module 29, the first comparison module 30 and the fourth determination module 31 can be found in The above-mentioned FIG. 9 corresponds to step S301 in the embodiment, and will not be described again here.
再请参见图13,数据处理装置4还可以包括:第二对比模块32以及第五确定模块33。Referring again to FIG. 13 , the data processing device 4 may further include: a second comparison module 32 and a fifth determination module 33 .
第二对比模块32,用于获取浏览样本对象针对各样本视频的第一播放完成度,将各样本视频的第一播放完成度与第一播放完成度阈值分别进行对比;The second comparison module 32 is used to obtain the first playback completion degree of the browse sample object for each sample video, and compare the first playback completion degree of each sample video with the first playback completion degree threshold respectively;
The fifth determination module 33 is configured to, for each sample video, determine that there is a first positive correlation between the object label sample text and the sample video if the first playback completion degree of the sample video is greater than the first playback completion degree threshold, and determine the first positive correlation as the second quality label of the sample video;
The fifth determination module 33 is further configured to determine that there is a first reverse correlation between the object label sample text and the sample video if the first playback completion degree of the sample video is less than or equal to the first playback completion degree threshold, and determine the first reverse correlation as the second quality label of the sample video.
其中,第二对比模块32以及第五确定模块33的具体功能实现方式可以参见上述图9对应实施例中的步骤S301,这里不再进行赘述。The specific functional implementation of the second comparison module 32 and the fifth determination module 33 can be referred to step S301 in the corresponding embodiment of FIG. 9 , and will not be described again here.
Referring again to Figure 13, the training sample set further includes a sample description image corresponding to each sample video, and the third quality label includes a description image quality label.
数据处理装置4还可以包括:第三对比模块34以及第六确定模块35。The data processing device 4 may also include: a third comparison module 34 and a sixth determination module 35 .
第三对比模块34,用于获取浏览样本对象针对各样本视频的第二播放完成度,将各样本视频的第二播放完成度与第二播放完成度阈值分别进行对比;The third comparison module 34 is used to obtain the second playback completion degree of the browse sample object for each sample video, and compare the second playback completion degree of each sample video with the second playback completion degree threshold respectively;
The sixth determination module 35 is configured to, for each sample video, determine that there is a second positive correlation among the sample description image corresponding to the sample video, the object label sample text and the sample video if the second playback completion degree of the sample video is greater than the second playback completion degree threshold, and determine the second positive correlation as the description image quality label of the sample video;
The sixth determination module 35 is further configured to determine that there is a second reverse correlation among the sample description image corresponding to the sample video, the object label sample text and the sample video if the second playback completion degree of the sample video is less than or equal to the second playback completion degree threshold, and determine the second reverse correlation as the description image quality label of the sample video.
其中,第三对比模块34以及第六确定模块35的具体功能实现方式可以参见上述图9对应实施例中的步骤S301,这里不再进行赘述。The specific functional implementation of the third comparison module 34 and the sixth determination module 35 can be referred to step S301 in the corresponding embodiment of FIG. 9 , and will not be described again here.
再请参见图13,第三质量标签包括描述文本质量标签;Referring again to Figure 13, the third quality label includes a description text quality label;
数据处理装置4还可以包括:第三获取模块36、第四获取模块37以及第七确定模块38。The data processing device 4 may also include: a third acquisition module 36 , a fourth acquisition module 37 and a seventh determination module 38 .
第三获取模块36,用于获取浏览样本对象针对各样本视频的第三播放完成度;The third acquisition module 36 is used to obtain the third playback completion degree of the browsed sample object for each sample video;
第四获取模块37,用于针对每个样本视频,若该样本视频的第三播放完成度大于第三播放完成度阈值,则获取该样本视频对应的样本内容文本,将样本内容文本添加至训练样本集;The fourth acquisition module 37 is used for each sample video, if the third playback completion degree of the sample video is greater than the third playback completion degree threshold, obtain the sample content text corresponding to the sample video, and add the sample content text to the training sample set;
第七确定模块38,用于确定对象标签样本文本序列以及该样本视频的样本内容文本之间存在第三正向关联关系,将第三正向关联关系确定为该样本视频的描述文本质量标签。The seventh determination module 38 is used to determine that there is a third positive correlation relationship between the object label sample text sequence and the sample content text of the sample video, and determine the third positive correlation relationship as the description text quality label of the sample video.
其中,第三获取模块36、第四获取模块37以及第七确定模块38的具体功能实现方式可以参见上述图9对应实施例中的步骤S301,这里不再进行赘述。For the specific functional implementation of the third acquisition module 36, the fourth acquisition module 37 and the seventh determination module 38, please refer to step S301 in the corresponding embodiment of FIG. 9, and will not be described again here.
Referring again to Figure 13, the video recognition model includes a first video recognition sub-model used to determine the first prediction quality, a second video recognition sub-model used to determine the second prediction quality, and a third video recognition sub-model used to determine the third prediction quality; the parameters in the video recognition model include parameters in the first video recognition sub-model, parameters in the second video recognition sub-model, and parameters in the third video recognition sub-model.
参数调整模块24可以包括:第一调整单元241、第二调整单元242、第三调整单元243以及模型生成单元244。The parameter adjustment module 24 may include: a first adjustment unit 241, a second adjustment unit 242, a third adjustment unit 243, and a model generation unit 244.
第一调整单元241,用于确定第一质量标签以及第一预测质量之间的第一质量损失值, 根据第一质量损失值,对第一视频识别子模型中的参数进行调整,得到训练后的第一视频识别子模型;The first adjustment unit 241 is used to determine the first quality loss value between the first quality label and the first predicted quality, Adjust the parameters in the first video recognition sub-model according to the first quality loss value to obtain the trained first video recognition sub-model;
第二调整单元242,用于确定第二质量标签以及第二预测质量之间的第二质量损失值,根据第二质量损失值,对第二视频识别子模型中的参数进行调整,得到训练后的第二视频识别子模型;The second adjustment unit 242 is used to determine the second quality loss value between the second quality label and the second prediction quality, and adjust the parameters in the second video recognition sub-model according to the second quality loss value to obtain the trained The second video recognition sub-model;
第三调整单元243,用于确定第三质量标签以及第三预测质量之间的第三质量损失值,根据第三质量损失值,对第三视频识别子模型中的参数进行调整,得到训练后的第三视频识别子模型;The third adjustment unit 243 is used to determine the third quality loss value between the third quality label and the third prediction quality, and adjust the parameters in the third video recognition sub-model according to the third quality loss value to obtain the trained The third video recognition sub-model;
The model generation unit 244 is configured to generate, when the first video recognition sub-model, the second video recognition sub-model and the third video recognition sub-model all satisfy the model convergence condition, a trained video recognition model including the trained first video recognition sub-model, the trained second video recognition sub-model and the trained third video recognition sub-model.
其中,第一调整单元241、第二调整单元242、第三调整单元243以及模型生成单元244的具体功能实现方式可以参见上述图9对应实施例中的步骤S304,这里不再进行赘述。For the specific functional implementation of the first adjustment unit 241, the second adjustment unit 242, the third adjustment unit 243 and the model generation unit 244, please refer to step S304 in the corresponding embodiment of FIG. 9, which will not be described again here.
In the embodiments of this application, the first video recognition sub-model is deeply modeled through the first training sample set, so that the first video recognition sub-model can determine, among multiple video clips, candidate video clips with high sharing value; the second video recognition sub-model is deeply modeled through the second training sample set, so that the second video recognition sub-model can determine, among the candidate video clips, candidate shared video clips with high sharing value; and the third video recognition sub-model is deeply modeled through the third training sample set, so that the third video recognition sub-model can determine the third sharing quality and the auxiliary description information corresponding to a candidate shared video clip. The shared video clip and its corresponding auxiliary description information can then be determined through sharing qualities of different dimensions, and shared data can be generated. Since the shared data is associated not only with the video content of the shared video clip itself but also with the object tag text sequence, sharing the data can improve the sharing efficiency and the sharing effect of the video.
进一步地,请参见图14,图14是本申请实施例提供的一种计算机设备的结构示意图。如图14所示,该计算机设备1000可以包括:至少一个处理器1001,例如CPU,至少一个网络接口1004,用户接口1003,存储器1005,至少一个通信总线1002。其中,通信总线1002用于实现这些组件之间的连接通信。其中,在一些实施例中,用户接口1003可以包括显示屏(Display)、键盘(Keyboard),网络接口1004可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器1005可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器。存储器1005还可以是至少一个位于远离前述处理器1001的存储装置。如图14所示,作为一种计算机存储介质的存储器1005可以包括操作系统、网络通信模块、用户接口模块以及设备控制应用程序。 Further, please refer to FIG. 14 , which is a schematic structural diagram of a computer device provided by an embodiment of the present application. As shown in Figure 14, the computer device 1000 may include: at least one processor 1001, such as a CPU, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. Among them, the communication bus 1002 is used to realize connection communication between these components. In some embodiments, the user interface 1003 may include a display and a keyboard, and the network interface 1004 may include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may also be at least one storage device located remotely from the aforementioned processor 1001. As shown in Figure 14, memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in Figure 14, the network interface 1004 can provide a network communication function, the user interface 1003 is mainly used to provide an input interface for the user, and the processor 1001 can be used to invoke the device control application program stored in the memory 1005, so as to implement the video processing method described in the above embodiments.
应当理解,本申请实施例中所描述的计算机设备1000可执行前文各实施例中对数据处理方法或装置的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。It should be understood that the computer device 1000 described in the embodiments of the present application can execute the data processing methods or devices described in the previous embodiments, which will not be described again here. In addition, the description of the beneficial effects of using the same method will not be described again.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the data processing method or device described in the foregoing embodiments is implemented, which is not repeated here. The description of the beneficial effects of using the same method is likewise not repeated.
上述计算机可读存储介质可以是前述任一实施例提供的数据处理装置或者上述计算机设备的内部存储单元,例如计算机设备的硬盘或内存。该计算机可读存储介质也可以是该计算机设备的外部存储设备,例如该计算机设备上配备的插接式硬盘,智能存储卡(smart media card,SMC),安全数字(secure digital,SD)卡,闪存卡(flash card)等。进一步地,该计算机可读存储介质还可以既包括该计算机设备的内部存储单元也包括外部存储设备。该计算机可读存储介质用于存储该计算机程序以及该计算机设备所需的其他程序和数据。该计算机可读存储介质还可以用于暂时地存储已经输出或者将要输出的数据。The above-mentioned computer-readable storage medium may be the data processing apparatus provided in any of the foregoing embodiments or the internal storage unit of the above-mentioned computer equipment, such as the hard disk or memory of the computer equipment. The computer-readable storage medium can also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card equipped on the computer device, Flash card, etc. Further, the computer-readable storage medium may also include both an internal storage unit of the computer device and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device. The computer-readable storage medium can also be used to temporarily store data that has been output or is to be output.
本申请实施例还提供了一种计算机程序产品,该计算机程序产品包括计算机程序,该计算机程序存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机程序,处理器执行该计算机程序,使得该计算机设备可执行前文各实施例中对数据处理方法或装置的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。An embodiment of the present application also provides a computer program product. The computer program product includes a computer program, and the computer program is stored in a computer-readable storage medium. The processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program, so that the computer device can execute the description of the data processing method or device in the previous embodiments, which will not be described again here. In addition, the description of the beneficial effects of using the same method will not be described again.
The terms "first", "second", and the like in the description, claims, and drawings of the embodiments of this application are used to distinguish different objects rather than to describe a specific order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product, or device that includes a series of steps or units is not limited to the listed steps or modules, but may further include steps or modules that are not listed, or may further include other steps or units inherent to the process, method, apparatus, product, or device.
A person of ordinary skill in the art can appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
What is disclosed above is merely a preferred embodiment of this application and certainly cannot be used to limit the scope of the claims of this application. Therefore, equivalent changes made according to the claims of this application still fall within the scope covered by this application.

Claims (20)

  1. A data processing method, performed by a computer device, comprising:
    obtaining at least two video segments in a video, determining a first sharing quality corresponding to each of the at least two video segments, and selecting, according to the first sharing quality, at least one video segment from the at least two video segments as a candidate video segment;
    obtaining an object tag text sequence associated with the video, the object tag text sequence comprising an object tag text of a browsing object that shares the video and an object tag text of a shared object that receives the share; the object tag text of the browsing object being used to characterize an interest of the browsing object, and the object tag text of the shared object being used to characterize an interest of the shared object;
    determining, according to the object tag text sequence and the candidate video segments, a second sharing quality corresponding to each candidate video segment, and selecting, according to the second sharing quality corresponding to each candidate video segment, at least one candidate video segment from the candidate video segments as a candidate shared video segment; the second sharing quality being used to characterize a correlation between the candidate video segment and the object tag text of the shared object;
    determining, according to the object tag text sequence and the candidate shared video segments, a third sharing quality corresponding to each candidate shared video segment and auxiliary description information corresponding to each candidate shared video segment; the third sharing quality being used to characterize a degree of matching between the auxiliary description information and both the candidate shared video segment and the object tag text of the shared object; and
    determining, according to the first sharing quality, the second sharing quality, and the third sharing quality corresponding to each candidate shared video segment, a shared video segment from the candidate shared video segments, and determining the shared video segment and the auxiliary description information corresponding to the shared video segment as shared data to be sent to the shared object.
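The following is a minimal, illustrative Python sketch of the three-stage selection pipeline of claim 1. The scoring callables, the two thresholds, and the weights are hypothetical placeholders, not part of the claimed method; they stand in for the model components detailed in the dependent claims.

```python
# Illustrative sketch only; the scoring callables, thresholds and weights are hypothetical stand-ins.
def select_shared_data(segments, tag_sequence, first_quality, second_quality,
                       third_quality_and_description, q1_thresh=0.5, q2_thresh=0.5,
                       weights=(0.4, 0.3, 0.3)):
    # Stage 1: first sharing quality per segment; keep segments at or above the threshold.
    candidates = []
    for seg in segments:
        q1 = first_quality(seg)
        if q1 >= q1_thresh:
            candidates.append((seg, q1))

    # Stage 2: second sharing quality (relevance to the shared object's tag texts).
    candidate_shared = []
    for seg, q1 in candidates:
        q2 = second_quality(seg, tag_sequence)
        if q2 > q2_thresh:
            candidate_shared.append((seg, q1, q2))

    # Stage 3: third sharing quality plus auxiliary description, then a weighted total.
    best, best_total = None, float("-inf")
    for seg, q1, q2 in candidate_shared:
        q3, description = third_quality_and_description(seg, tag_sequence)
        total = weights[0] * q1 + weights[1] * q2 + weights[2] * q3
        if total > best_total:
            best, best_total = (seg, description), total
    return best  # (shared video segment, auxiliary description) = shared data for the shared object
```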
  2. The method according to claim 1, wherein determining the first sharing quality corresponding to each of the at least two video segments comprises:
    performing, for each of the at least two video segments, the following operations to determine the first sharing quality corresponding to the video segment:
    obtaining K video frames and audio frames corresponding to the K video frames from the video segment, K being a positive integer;
    fusing video features corresponding to the K video frames to obtain a video feature of the video segment;
    fusing audio features corresponding to the K audio frames to obtain an audio feature of the video segment;
    obtaining, according to an audio recognition text, a video description text, and an object comment text of the video segment, a text feature corresponding to the video segment;
    fusing the video feature, the audio feature, and the text feature of the video segment to obtain a multi-dimensional fusion feature corresponding to the video segment; and
    determining, according to the multi-dimensional fusion feature, the first sharing quality corresponding to the video segment.
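As a rough illustration of the feature fusion in claim 2, the PyTorch sketch below averages per-frame video and audio features, projects the content-text feature, fuses the three modalities, and maps the result to a scalar quality. All layer sizes, the mean-pooling choice, and the activation functions are assumptions for illustration; the claim only requires fusion of the three modalities followed by a quality score.

```python
# Illustrative sketch; layer sizes and pooling strategy are assumptions, not the claimed model.
import torch
import torch.nn as nn

class FirstQualityModel(nn.Module):
    def __init__(self, frame_dim=512, audio_dim=128, text_dim=256, fused_dim=256):
        super().__init__()
        self.video_fuse = nn.Linear(frame_dim, fused_dim)   # fuses K frame features
        self.audio_fuse = nn.Linear(audio_dim, fused_dim)   # fuses K audio-frame features
        self.text_fuse = nn.Linear(text_dim, fused_dim)     # projects the content-text feature
        self.multi_fuse = nn.Linear(3 * fused_dim, fused_dim)
        self.score = nn.Linear(fused_dim, 1)                # fully connected scoring layer

    def forward(self, frame_feats, audio_feats, text_feat):
        # frame_feats: (K, frame_dim), audio_feats: (K, audio_dim), text_feat: (text_dim,)
        v = self.video_fuse(frame_feats.mean(dim=0))        # simple mean fusion over K frames
        a = self.audio_fuse(audio_feats.mean(dim=0))
        t = self.text_fuse(text_feat)
        fused = torch.relu(self.multi_fuse(torch.cat([v, a, t], dim=-1)))
        return torch.sigmoid(self.score(fused)), fused      # (first sharing quality, fusion feature)
```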
  3. The method according to claim 2, wherein:
    fusing the video features corresponding to the K video frames to obtain the video feature of the video segment comprises: inputting the K video frames into a video recognition model, performing feature extraction on the K video frames through a video fusion network layer of the video recognition model to obtain to-be-fused video features corresponding to the K video frames, and performing feature fusion on the K to-be-fused video features to obtain the video feature corresponding to the video segment; the video recognition model comprising a first video recognition sub-model, and the first video recognition sub-model comprising the video fusion network layer, an audio fusion network layer, a text fusion network layer, and a multi-dimensional fusion network layer;
    fusing the audio features corresponding to the K audio frames to obtain the audio feature of the video segment comprises: inputting the K audio frames into the audio fusion network layer, performing feature extraction on the K audio frames through the audio fusion network layer to obtain to-be-fused audio features corresponding to the K audio frames, and performing feature fusion on the K to-be-fused audio features to obtain the audio feature corresponding to the video segment;
    obtaining the text feature corresponding to the video segment according to the audio recognition text, the video description text, and the object comment text comprises: determining the audio recognition text, the video description text, and the object comment text as a content text corresponding to the video segment, inputting the content text into the text fusion network layer, extracting a key text from the content text through the text fusion network layer, and performing feature extraction on the key text to obtain a text feature corresponding to the key text; and
    fusing the video feature, the audio feature, and the text feature of the video segment to obtain the multi-dimensional fusion feature corresponding to the video segment comprises: inputting the video feature, the audio feature, and the text feature into the multi-dimensional fusion network layer, and performing feature fusion on the video feature, the audio feature, and the text feature through the multi-dimensional fusion network layer to obtain the multi-dimensional fusion feature corresponding to the video segment.
  4. The method according to claim 2, wherein determining the first sharing quality of the video segment according to the multi-dimensional fusion feature comprises:
    inputting the multi-dimensional fusion feature corresponding to the video segment into a video recognition model, and performing feature transformation on the multi-dimensional fusion feature corresponding to the video segment through a first fully connected network layer of the video recognition model to obtain the first sharing quality corresponding to the video segment; the video recognition model comprising a first video recognition sub-model, and the first video recognition sub-model comprising the first fully connected network layer; and
    selecting, according to the first sharing quality, at least one video segment from the at least two video segments as a candidate video segment comprises:
    determining, among the at least two video segments, a video segment whose first sharing quality is equal to or greater than a first sharing quality threshold as the candidate video segment.
  5. The method according to claim 1, wherein obtaining the object tag text sequence associated with the video comprises:
    obtaining an object tag text of a browsing object associated with the video, and obtaining an object tag text of the shared object associated with the browsing object; and
    generating the object tag text sequence according to the object tag text of the browsing object and the object tag text of the shared object;
    determining, according to the object tag text sequence and the candidate video segments, the second sharing quality corresponding to each candidate video segment comprises:
    performing, for each candidate video segment, the following operations to determine the second sharing quality corresponding to the candidate video segment:
    inputting the object tag text sequence and the candidate video segment into a video recognition model; the video recognition model comprising a second video recognition sub-model, and the second video recognition sub-model comprising a first text encoding network layer;
    performing, through the first text encoding network layer, text encoding on each object tag text in the object tag text sequence to obtain a first object tag feature corresponding to the object tag text sequence; and
    obtaining the multi-dimensional fusion feature corresponding to the candidate video segment, and determining, according to the first object tag feature and the multi-dimensional fusion feature corresponding to the candidate video segment, the second sharing quality corresponding to the candidate video segment.
  6. The method according to claim 5, wherein the second video recognition sub-model further comprises a first splicing network layer and a second fully connected network layer;
    determining, according to the first object tag feature and the multi-dimensional fusion feature corresponding to the candidate video segment, the second sharing quality corresponding to the candidate video segment comprises:
    inputting the first object tag feature and the multi-dimensional fusion feature corresponding to the candidate video segment into the first splicing network layer;
    performing, through the first splicing network layer, feature splicing on the first object tag feature and the multi-dimensional fusion feature corresponding to the candidate video segment to obtain a first multi-dimensional splicing feature corresponding to the candidate video segment; and
    inputting the first multi-dimensional splicing feature into the second fully connected network layer, and performing feature transformation on the first multi-dimensional splicing feature through the second fully connected network layer to obtain the second sharing quality corresponding to the candidate video segment;
    wherein the number of candidate video segments is at least two; and
    selecting, according to the second sharing quality corresponding to each candidate video segment, at least one candidate video segment from the candidate video segments as a candidate shared video segment comprises:
    determining, among the at least two candidate video segments, a candidate video segment whose second sharing quality is greater than a second sharing quality threshold as a candidate shared video segment.
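A condensed sketch of the tag-aware scoring described in claims 5 and 6 follows. The mean-pooled tag embedding and the particular dimensions are illustrative assumptions; the claims only require a text encoding layer, a splicing (concatenation) layer, and a fully connected scoring layer.

```python
# Illustrative sketch; tag pooling and dimensions are assumptions.
import torch
import torch.nn as nn

class SecondQualityModel(nn.Module):
    def __init__(self, vocab_size=30000, tag_dim=128, fused_dim=256):
        super().__init__()
        self.tag_encoder = nn.EmbeddingBag(vocab_size, tag_dim, mode="mean")  # text encoding layer
        self.score = nn.Linear(tag_dim + fused_dim, 1)                        # fully connected layer

    def forward(self, tag_token_ids, fusion_feature):
        # tag_token_ids: (num_tag_tokens,) LongTensor for the browsing/shared objects' tag texts
        tag_feature = self.tag_encoder(tag_token_ids.unsqueeze(0)).squeeze(0)
        spliced = torch.cat([tag_feature, fusion_feature], dim=-1)            # splicing layer
        return torch.sigmoid(self.score(spliced))                             # second sharing quality
```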
  7. The method according to claim 1, wherein the auxiliary description information corresponding to the candidate shared video segment comprises a description image corresponding to the candidate shared video segment and a description text corresponding to the candidate shared video segment; the third sharing quality corresponding to the candidate shared video segment comprises an image sharing quality corresponding to the description image and a text sharing quality corresponding to the description text;
    determining, according to the object tag text sequence and the candidate shared video segments, the third sharing quality corresponding to each candidate shared video segment and the auxiliary description information corresponding to each candidate shared video segment comprises:
    for each candidate shared video segment:
    obtaining at least two video frames in the candidate shared video segment, determining an image sharing quality corresponding to each of the at least two video frames, determining, according to the image sharing quality corresponding to each video frame, the image sharing quality corresponding to the candidate shared video segment, and selecting one video frame from the at least two video frames as the description image corresponding to the candidate shared video segment; and
    determining, according to the object tag text sequence and a content text corresponding to the candidate shared video segment, the text sharing quality corresponding to the candidate shared video segment and the description text corresponding to the candidate shared video segment.
  8. The method according to claim 7, wherein obtaining the at least two video frames in the candidate shared video segment and determining the image sharing quality corresponding to each of the at least two video frames comprises:
    performing image sampling on the candidate shared video segment according to an image sampling period to obtain the at least two video frames in the candidate shared video segment; and
    for each of the at least two video frames:
    inputting the video frame into a third video recognition sub-model, and performing feature extraction on the video frame through an image recognition network layer of the third video recognition sub-model to obtain a shared image feature corresponding to the video frame; the third video recognition sub-model comprising a fourth video recognition sub-model, and the fourth video recognition sub-model comprising the image recognition network layer and a second splicing network layer;
    obtaining the multi-dimensional fusion feature corresponding to the candidate shared video segment, and obtaining a second object tag feature corresponding to the object tag text sequence, the second object tag feature being obtained by performing text encoding on the object tag text sequence;
    inputting the shared image feature corresponding to the video frame, the multi-dimensional fusion feature corresponding to the candidate shared video segment, and the second object tag feature into the second splicing network layer;
    performing, through the second splicing network layer, feature splicing on the shared image feature corresponding to the video frame, the multi-dimensional fusion feature corresponding to the candidate shared video segment, and the second object tag feature to obtain a second multi-dimensional splicing feature corresponding to the video frame; and
    determining, according to the second multi-dimensional splicing feature corresponding to the video frame, the image sharing quality corresponding to the video frame.
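To make the frame-level scoring of claims 7 and 8 concrete, a small sketch follows that samples frames at a fixed period, scores each sampled frame, and keeps the highest-scoring frame as the description image. The frame scorer passed in as `score_frame` and the averaging of frame scores into a segment-level image sharing quality are illustrative assumptions.

```python
# Illustrative sketch; score_frame is a hypothetical stand-in for the fourth sub-model.
def choose_description_image(segment_frames, fusion_feature, tag_feature, score_frame,
                             sampling_period=10):
    sampled = segment_frames[::sampling_period]            # image sampling by period
    scores = [score_frame(f, fusion_feature, tag_feature)  # per-frame image sharing quality
              for f in sampled]
    segment_image_quality = sum(scores) / len(scores)      # assumed aggregation over frames
    best_frame = sampled[scores.index(max(scores))]        # description image for the share card
    return best_frame, segment_image_quality
```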
  9. The method according to claim 7, wherein the description text is composed of N shared words;
    determining, according to the object tag text sequence and the content text corresponding to the candidate shared video segment, the text sharing quality corresponding to the candidate shared video segment and the description text corresponding to the candidate shared video segment comprises:
    inputting the content text corresponding to the candidate shared video segment into a third video recognition sub-model, and performing text encoding on the content text corresponding to the candidate shared video segment through a second text encoding network layer of the third video recognition sub-model to obtain a content text feature; the third video recognition sub-model comprising a fifth video recognition sub-model, and the fifth video recognition sub-model comprising the second text encoding network layer, a third text encoding network layer, an attention network layer, and a text decoding network layer;
    inputting the object tag text sequence into the third text encoding network layer, and performing text encoding on the object tag text sequence through the third text encoding network layer to obtain a third object tag feature;
    inputting the content text feature, a to-be-decoded text feature Si corresponding to the candidate shared video segment, and the third object tag feature into the attention network layer, and performing feature fusion on the content text feature, the to-be-decoded text feature Si, and the third object tag feature through the attention network layer to obtain an attention weight corresponding to the content text feature, i being a non-negative integer less than N;
    determining, according to the attention weight corresponding to the content text feature, a to-be-decoded text feature Si+1 corresponding to the candidate shared video segment; the shared word indicated by the to-be-decoded text feature Si being the shared word preceding the shared word indicated by the to-be-decoded text feature Si+1;
    when i+1 is equal to N, inputting the N to-be-decoded text features into the text decoding network layer, generating, through the text decoding network layer, the shared words respectively indicated by the N to-be-decoded text features, and composing the N shared words into the description text corresponding to the candidate shared video segment; and
    generating, according to the N to-be-decoded text features, the text sharing quality corresponding to the candidate shared video segment.
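The word-by-word generation of claim 9 is essentially an attention-based sequence decoder conditioned on the content text and the object tags. The sketch below shows only the loop structure; the attention layer, decoder step, word decoder, and the way text sharing quality is derived from the decoded features (here, a mean of per-step confidences) are illustrative assumptions passed in as callables.

```python
# Illustrative decoding loop; attention, decoder_step, decode_word and the quality aggregation are assumptions.
def generate_description(content_feature, tag_feature, start_feature,
                         attention, decoder_step, decode_word, n_words):
    words, confidences = [], []
    s_i = start_feature                                               # to-be-decoded text feature S0
    for _ in range(n_words):
        attn_weight = attention(content_feature, s_i, tag_feature)    # attention network layer
        s_next = decoder_step(attn_weight, content_feature, s_i)      # to-be-decoded feature S(i+1)
        word, conf = decode_word(s_next)                              # text decoding network layer
        words.append(word)
        confidences.append(conf)
        s_i = s_next
    description_text = " ".join(words)                                # N shared words form the description text
    text_sharing_quality = sum(confidences) / n_words                 # assumed aggregation
    return description_text, text_sharing_quality
```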
  10. The method according to claim 1, wherein
    determining, according to the first sharing quality, the second sharing quality, and the third sharing quality corresponding to the candidate shared video segments, the shared video segment from the candidate shared video segments comprises:
    for each candidate shared video segment, performing weighted summation on the first sharing quality, the second sharing quality, and the third sharing quality corresponding to the candidate shared video segment to obtain a total sharing quality corresponding to the candidate shared video segment; and
    determining, among the candidate shared video segments, the candidate shared video segment with the largest total sharing quality as the shared video segment.
  11. The method according to claim 1, further comprising:
    obtaining a training sample set, the training sample set comprising a plurality of sample videos, an object tag sample text sequence of a browsing sample object associated with each sample video, and a first quality label, a second quality label, and a third quality label corresponding to each sample video;
    inputting the training sample set into a video recognition model, and determining, through the video recognition model, a first prediction quality corresponding to each sample video;
    determining, according to the object tag sample text sequence and the plurality of sample videos, a second prediction quality and a third prediction quality corresponding to each sample video; and
    adjusting parameters in the video recognition model according to the first quality label, the second quality label, the third quality label, the first prediction quality, the second prediction quality, and the third prediction quality to obtain a trained video recognition model, the trained video recognition model being used to determine the shared data of the video.
  12. The method according to claim 11, further comprising:
    performing, for each sample video in the plurality of sample videos, the following operations to determine the first quality label corresponding to the sample video:
    performing a product operation on the number of plays, the duration, and the average play completion degree corresponding to the sample video to obtain a first sample parameter corresponding to the sample video;
    performing a summation operation on the number of object comment texts and the number of object comment text interactions corresponding to the sample video to obtain a second sample parameter corresponding to the sample video;
    determining a first ratio between the first sample parameter corresponding to the sample video and a maximum first sample parameter, and determining a second ratio between the second sample parameter corresponding to the sample video and a maximum second sample parameter; the maximum first sample parameter being the largest of the first sample parameters corresponding to the plurality of sample videos, and the maximum second sample parameter being the largest of the second sample parameters corresponding to the plurality of sample videos;
    performing weighted summation on the first ratio and the second ratio to obtain a candidate first quality label corresponding to the sample video;
    if the candidate first quality label corresponding to the sample video is less than a first quality label threshold, determining the candidate first quality label corresponding to the sample video as the first quality label corresponding to the sample video; and
    if the candidate first quality label corresponding to the sample video is equal to or greater than the first quality label threshold, determining the first quality label threshold as the first quality label corresponding to the sample video.
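The label construction in claim 12 is plain arithmetic over engagement statistics, so a direct sketch is possible. The 0.7/0.3 weights and the 0.9 threshold below are assumed values; the claim only fixes the product, the sum, the max-normalization, the weighted sum, and the clipping at the threshold.

```python
# Illustrative sketch; the 0.7/0.3 weights and the 0.9 threshold are assumed values.
def first_quality_labels(samples, w1=0.7, w2=0.3, label_threshold=0.9):
    # samples: list of dicts with play_count, duration, avg_completion,
    #          comment_count, comment_interaction_count
    p1 = [s["play_count"] * s["duration"] * s["avg_completion"] for s in samples]
    p2 = [s["comment_count"] + s["comment_interaction_count"] for s in samples]
    p1_max, p2_max = max(p1), max(p2)
    labels = []
    for a, b in zip(p1, p2):
        candidate = w1 * (a / p1_max) + w2 * (b / p2_max)   # weighted sum of the two ratios
        labels.append(min(candidate, label_threshold))       # clip at the first quality label threshold
    return labels
```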
  13. The method according to claim 11, further comprising:
    for each sample video:
    obtaining a first play completion degree of the browsing sample object for the sample video;
    if the first play completion degree is greater than a first play completion threshold, determining that a first positive association exists between the object tag sample text and the sample video, and determining the first positive association as the second quality label of the sample video; and
    if the first play completion degree is less than or equal to the first play completion threshold, determining that a first reverse association exists between the object tag sample text and the sample video, and determining the first reverse association as the second quality label of the sample video.
  14. The method according to claim 11, wherein the training sample set further comprises a sample description image corresponding to each sample video, and the third quality label comprises a description image quality label;
    the method further comprising:
    for each sample video:
    obtaining a second play completion degree of the browsing sample object for the sample video;
    if the second play completion degree is greater than a second play completion threshold, determining that a second positive association exists among the sample description image corresponding to the sample video, the object tag sample text, and the sample video, and determining the second positive association as the description image quality label of the sample video; and
    if the second play completion degree is less than or equal to the second play completion threshold, determining that a second reverse association exists among the sample description image corresponding to the sample video, the object tag sample text, and the sample video, and determining the second reverse association as the description image quality label of the sample video.
  15. The method according to claim 11, wherein the third quality label comprises a description text quality label;
    the method further comprising:
    for each sample video:
    obtaining a third play completion degree of the browsing sample object for the sample video;
    if the third play completion degree is greater than a third play completion threshold, obtaining a sample content text corresponding to the sample video, and adding the sample content text to the training sample set; and
    determining that a third positive association exists between the object tag sample text sequence and the sample content text, and determining the third positive association as the description text quality label of the sample video.
  16. The method according to claim 11, wherein the video recognition model comprises a first video recognition sub-model for determining the first prediction quality, a second video recognition sub-model for determining the second prediction quality, and a third video recognition sub-model for determining the third prediction quality; the parameters in the video recognition model comprising parameters in the first video recognition sub-model, parameters in the second video recognition sub-model, and parameters in the third video recognition sub-model;
    adjusting the parameters in the video recognition model according to the first quality label, the second quality label, the third quality label, the first prediction quality, the second prediction quality, and the third prediction quality to obtain the trained video recognition model comprises:
    determining a first quality loss value between the first quality label and the first prediction quality, and adjusting the parameters in the first video recognition sub-model according to the first quality loss value to obtain a trained first video recognition sub-model;
    determining a second quality loss value between the second quality label and the second prediction quality, and adjusting the parameters in the second video recognition sub-model according to the second quality loss value to obtain a trained second video recognition sub-model;
    determining a third quality loss value between the third quality label and the third prediction quality, and adjusting the parameters in the third video recognition sub-model according to the third quality loss value to obtain a trained third video recognition sub-model; and
    when the first video recognition sub-model, the second video recognition sub-model, and the third video recognition sub-model all satisfy a model convergence condition, generating a trained video recognition model comprising the trained first video recognition sub-model, the trained second video recognition sub-model, and the trained third video recognition sub-model.
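Claim 16 amounts to training the three sub-models with three separate losses until each converges. The PyTorch-style sketch below uses mean squared error for all three losses and a fixed epoch count as the convergence condition; both are illustrative assumptions, since the claim does not fix the loss functions or the convergence test.

```python
# Illustrative training sketch; MSE losses and the epoch-count stopping rule are assumptions.
import torch.nn as nn

def train_sub_models(sub_models, optimizers, dataloader, epochs=10):
    # sub_models / optimizers: (first, second, third) video recognition sub-models and their optimizers
    criterion = nn.MSELoss()
    for _ in range(epochs):                                  # stand-in for the convergence condition
        for batch in dataloader:
            labels = (batch["q1_label"], batch["q2_label"], batch["q3_label"])
            for model, optimizer, label in zip(sub_models, optimizers, labels):
                prediction = model(batch)                    # first/second/third prediction quality
                loss = criterion(prediction, label)          # first/second/third quality loss value
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    return sub_models                                        # together they form the trained model
```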
  17. A data processing apparatus, comprising:
    a first obtaining module, configured to obtain at least two video segments in a video, determine a first sharing quality corresponding to each of the at least two video segments, and select, according to the first sharing quality, at least one video segment from the at least two video segments as a candidate video segment;
    a second obtaining module, configured to obtain an object tag text sequence associated with the video, the object tag text sequence comprising an object tag text of a browsing object that shares the video and an object tag text of a shared object that receives the share, the object tag text of the browsing object being used to characterize an interest of the browsing object and the object tag text of the shared object being used to characterize an interest of the shared object; and configured to determine, according to the object tag text sequence and the candidate video segments, a second sharing quality corresponding to each candidate video segment, and select, according to the second sharing quality corresponding to each candidate video segment, at least one candidate video segment from the candidate video segments as a candidate shared video segment, the second sharing quality being used to characterize a correlation between the candidate video segment and the object tag text of the shared object;
    a first determining module, configured to determine, according to the object tag text sequence and the candidate shared video segments, a third sharing quality corresponding to each candidate shared video segment, and determine, according to the third sharing quality corresponding to each candidate shared video segment, auxiliary description information corresponding to each candidate shared video segment, the third sharing quality being used to characterize a degree of matching between the auxiliary description information and both the candidate shared video segment and the object tag text of the shared object; and
    a second determining module, configured to determine, according to the first sharing quality, the second sharing quality, and the third sharing quality corresponding to each candidate shared video segment, a shared video segment from the candidate shared video segments, and determine the shared video segment and the auxiliary description information corresponding to the shared video segment as shared data to be sent to the shared object.
  18. A computer device, comprising a processor, a memory, and a network interface;
    the processor being connected to the memory and the network interface, the network interface being configured to provide data communication functions, the memory being configured to store a computer program, and the processor being configured to invoke the computer program, so that the computer device performs the method according to any one of claims 1 to 16.
  19. A computer-readable storage medium, storing a computer program, the computer program being adapted to be loaded and executed by a processor, so that a computer device having the processor performs the method according to any one of claims 1 to 16.
  20. A computer program product, comprising computer instructions, the computer instructions being stored in a computer-readable storage medium, wherein when the computer instructions are executed, the method according to any one of claims 1 to 16 is implemented.
PCT/CN2023/074763 2022-04-01 2023-02-07 Data processing method, and device and computer-readable storage medium WO2023185257A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210336414.6A CN114419527B (en) 2022-04-01 2022-04-01 Data processing method, equipment and computer readable storage medium
CN202210336414.6 2022-04-01

Publications (1)

Publication Number Publication Date
WO2023185257A1 true WO2023185257A1 (en) 2023-10-05

Family

ID=81263299

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/074763 WO2023185257A1 (en) 2022-04-01 2023-02-07 Data processing method, and device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN114419527B (en)
WO (1) WO2023185257A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419527B (en) * 2022-04-01 2022-06-14 腾讯科技(深圳)有限公司 Data processing method, equipment and computer readable storage medium
CN116777914B (en) * 2023-08-22 2023-11-07 腾讯科技(深圳)有限公司 Data processing method, device, equipment and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140341026A1 (en) * 2013-05-16 2014-11-20 Cisco Technology, Inc. Enhancing performance of rapid channel changes and other playback positioning changes in adaptive streaming
CN109862397A (en) * 2019-02-02 2019-06-07 广州虎牙信息科技有限公司 A kind of video analysis method, apparatus, equipment and storage medium
CN111581510A (en) * 2020-05-07 2020-08-25 腾讯科技(深圳)有限公司 Shared content processing method and device, computer equipment and storage medium
CN111866607A (en) * 2020-07-30 2020-10-30 腾讯科技(深圳)有限公司 Video clip positioning method and device, computer equipment and storage medium
CN114419527A (en) * 2022-04-01 2022-04-29 腾讯科技(深圳)有限公司 Data processing method, data processing equipment and computer readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10509825B2 (en) * 2017-07-21 2019-12-17 Fuji Xerox Co., Ltd. Systems and methods for topic guidance in video content using sequence mining
CN107888988A (en) * 2017-11-17 2018-04-06 广东小天才科技有限公司 A kind of video clipping method and electronic equipment
CN110888854A (en) * 2019-11-29 2020-03-17 维沃移动通信有限公司 Content sharing method and electronic equipment
CN113515997B (en) * 2020-12-28 2024-01-19 腾讯科技(深圳)有限公司 Video data processing method and device and readable storage medium
CN113766299B (en) * 2021-05-06 2024-04-19 腾讯科技(深圳)有限公司 Video data playing method, device, equipment and medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140341026A1 (en) * 2013-05-16 2014-11-20 Cisco Technology, Inc. Enhancing performance of rapid channel changes and other playback positioning changes in adaptive streaming
CN109862397A (en) * 2019-02-02 2019-06-07 广州虎牙信息科技有限公司 A kind of video analysis method, apparatus, equipment and storage medium
CN111581510A (en) * 2020-05-07 2020-08-25 腾讯科技(深圳)有限公司 Shared content processing method and device, computer equipment and storage medium
CN111866607A (en) * 2020-07-30 2020-10-30 腾讯科技(深圳)有限公司 Video clip positioning method and device, computer equipment and storage medium
CN114419527A (en) * 2022-04-01 2022-04-29 腾讯科技(深圳)有限公司 Data processing method, data processing equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN114419527A (en) 2022-04-29
CN114419527B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
WO2023185257A1 (en) Data processing method, and device and computer-readable storage medium
CN110781347B (en) Video processing method, device and equipment and readable storage medium
CN104735468B (en) A kind of method and system that image is synthesized to new video based on semantic analysis
US20210365749A1 (en) Image data processing method and apparatus, electronic device, and storage medium
CN113766299B (en) Video data playing method, device, equipment and medium
CN109871736B (en) Method and device for generating natural language description information
CN112929253B (en) Virtual image interaction method and device
US20180143741A1 (en) Intelligent graphical feature generation for user content
CN111428025A (en) Text summarization method and device, electronic equipment and storage medium
WO2024046189A1 (en) Text generation method and apparatus
CN106937127B (en) Display method and system for intelligent search preparation
CN116977457A (en) Data processing method, device and computer readable storage medium
CN117009577A (en) Video data processing method, device, equipment and readable storage medium
CN116775815A (en) Dialogue data processing method and device, electronic equipment and storage medium
CN113821677A (en) Method, device and equipment for generating cover image and storage medium
CN116740540B (en) Data processing method, device, equipment and computer readable storage medium
WO2023207463A1 (en) Voting information generation method and apparatus, and voting information display method and apparatus
CN114782590B (en) Multi-object content combined image generation method and system
US20230359832A1 (en) Context sharing between physical and digital worlds
CN112434677B (en) Contract auditing method, device, equipment and storage medium
US20230328012A1 (en) Virtual-figure-based data processing method and apparatus, computer device, and storage medium
CN116974439A (en) Data processing method, device, equipment and computer readable storage medium
CN115424266A (en) Expression symbol prediction method, device, equipment and storage medium
CN116975330A (en) Content display method and device, electronic equipment and storage medium
CN116700546A (en) Electronic resource package processing method and related products

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23777636

Country of ref document: EP

Kind code of ref document: A1