CN117743636A - Video analysis method, related device, equipment and storage medium - Google Patents


Publication number
CN117743636A
Authority
CN
China
Prior art keywords
video
marketing
target
analyzed
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311378717.5A
Other languages
Chinese (zh)
Inventor
熊世富
刘杰
王勃
张奇
张坤
孙兆艳
葛涛
丁宁
王庆然
戚婷
孔常青
高建清
刘聪
胡国平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202311378717.5A priority Critical patent/CN117743636A/en
Publication of CN117743636A publication Critical patent/CN117743636A/en
Pending legal-status Critical Current

Abstract

The application discloses a video analysis method and a related apparatus, device, and storage medium. The video analysis method includes: retrieving candidate videos about the marketing of a target to be analyzed based on keywords characterizing the target to be analyzed, where the target to be analyzed includes at least one of a target product and a target brand; in response to a selection instruction for the candidate videos, determining the selected candidate video as the target video and determining the video segment to be analyzed in the target video; and analyzing the video segment to obtain a marketing summary of the target to be analyzed. This scheme improves the degree of automation of marketing-summary generation while preserving, as far as possible, the relevance of the marketing summary, thereby improving its generation efficiency.

Description

Video analysis method, related device, equipment and storage medium
Technical Field
The present disclosure relates to the field of data analysis technologies, and in particular, to a video analysis method, and related apparatus, device, and storage medium.
Background
In product marketing, sales personnel often rely on marketing techniques tailored to the products being sold in order to increase their sales volume.
In the prior art, for different products to be sold and different sales scenarios, sales staff must separately devise highly targeted marketing summaries, such as marketing scripts, which consumes a great deal of manpower and reduces the efficiency of generating those summaries. In view of this, how to improve the degree of automation of marketing-summary generation while preserving, as far as possible, the relevance of the summaries, so as to improve their generation efficiency, is a problem to be solved.
Disclosure of Invention
The technical problem mainly solved by this application is to provide a video analysis method and a related apparatus, device, and storage medium that improve the degree of automation of marketing-summary generation while preserving, as far as possible, the relevance of the marketing summary, thereby improving its generation efficiency.
To solve the above technical problem, a first aspect of the present application provides a video analysis method, including: retrieving candidate videos about the marketing of a target to be analyzed based on keywords characterizing the target to be analyzed, where the target to be analyzed includes at least one of a target product and a target brand; in response to a selection instruction for the candidate videos, determining the selected candidate video as the target video and determining the video segment to be analyzed in the target video; and analyzing the video segment to obtain a marketing summary of the target to be analyzed.
To solve the above technical problem, a second aspect of the present application provides a video analysis apparatus, including a retrieval module, a determination module, and an analysis module. The retrieval module is configured to retrieve candidate videos about the marketing of a target to be analyzed based on keywords characterizing the target to be analyzed, where the target to be analyzed includes at least one of a target product and a target brand; the determination module is configured to, in response to a selection instruction for the candidate videos, determine the selected candidate video as the target video and determine the video segment to be analyzed in the target video; and the analysis module is configured to analyze the video segment to obtain a marketing summary of the target to be analyzed.
In order to solve the above technical problem, a third aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, where the memory stores program instructions, and the processor is configured to execute the program instructions to implement the video analysis method in the first aspect.
In order to solve the above technical problem, a fourth aspect of the present application provides a computer readable storage medium storing program instructions executable by a processor, where the program instructions are configured to implement the video analysis method of the first aspect.
In the above scheme, keywords for at least one of the target product and the target brand to be analyzed are obtained, and candidate videos about the target to be analyzed are retrieved based on those keywords, so that the screened candidate videos satisfy the user's marketing expectations as far as possible. In response to a selection instruction for the candidate videos, the selected candidate video is determined as the target video, and the video segment to be analyzed in the target video is determined, yielding a more targeted marketing segment for analysis. The video segment is then analyzed to obtain a marketing summary of the target to be analyzed. In this way, the degree of automation of marketing-summary generation can be improved while preserving, as far as possible, the relevance of the marketing summary, thereby improving its generation efficiency.
Drawings
FIG. 1 is a flow chart of an embodiment of a video analysis method of the present application;
FIG. 2 is a schematic illustration of an embodiment of a third page of the video analysis method of the present application;
FIG. 3 is a schematic illustration of an embodiment of a first page of the video analysis method of the present application;
FIG. 4 is a schematic illustration of an embodiment of a second page of the video analysis method of the present application;
FIG. 5 is a schematic diagram of a frame of an embodiment of a video analysis device of the present application;
FIG. 6 is a schematic diagram of a framework of an embodiment of the electronic device of the present application;
FIG. 7 is a schematic diagram of a framework of one embodiment of a computer readable storage medium of the present application.
Detailed Description
The following describes the embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects. Further, "a plurality" herein means two or more.
If the technical scheme of this application involves personal information, a product applying this scheme clearly states the personal-information processing rules before processing personal information and obtains the individual's independent consent. If the scheme involves sensitive personal information, a product applying it obtains the individual's consent before processing such information and additionally satisfies the requirement of "explicit consent". For example, a clear and prominent sign may be placed at a personal-information collection device such as a camera, giving notice that one is entering the collection range and that personal information will be collected; if an individual voluntarily enters the collection range, consent to collection is deemed given. Alternatively, on a device that processes personal information, under the condition that the processing rules are made known through obvious signs or notices, authorization may be obtained via a pop-up message or by asking the individual to upload their personal information. The personal-information processing rules may include the personal-information processor, the processing purpose, the processing method, and the types of personal information processed.
Referring to fig. 1, fig. 1 is a flow chart illustrating an embodiment of a video analysis method of the present application.
Specifically, the method may include the steps of:
step S10: searching to obtain candidate videos about marketing analysis targets based on keywords representing targets to be analyzed; wherein the target to be analyzed comprises at least one of a target product and a target brand.
In the embodiment of the disclosure, the keywords for representing the target to be analyzed can be obtained directly based on text data entered by a user, can be obtained through conversion based on voice data input by the user, or can be obtained through extraction based on picture data input by the user. It should be noted that the foregoing examples are only a few possible examples of obtaining keywords, and the specific content of how to obtain keywords is not limited in this application.
In one implementation scenario, the target to be analyzed includes a target product; that is, the keywords characterizing the target to be analyzed include the category name of the target product, for example "snack", "home appliance", "clothing", etc. In some specific implementation scenarios, to improve retrieval accuracy, the user may be prompted to input a more detailed product category, for example "nuts", "refrigerator", "dress", etc.
In another implementation scenario, the object to be analyzed includes a target brand, i.e., the keyword characterizing the object to be analyzed includes the name of the target brand, such as "a household brand", "a snack brand", "a clothing brand", etc.
In yet another implementation scenario, the object to be analyzed includes a target product and a target brand, i.e., the keyword characterizing the object to be analyzed includes both the target product and the target brand, such as "a nut gift bag of a certain brand", "a double door refrigerator of a certain brand", "a mountain climbing shoe of a certain brand", and so on.
In a specific implementation scenario, the keyword characterizing the target to be analyzed may be the name of a video publisher that is publishing or has published marketing videos for some product; for example, with the keyword "a certain anchor", that anchor's live stream currently marketing products and historical published videos containing marketing content can be obtained as candidate videos.
In one specific implementation scenario, the keywords characterizing the target to be analyzed may be descriptions of a marketing video's marketing content, for example text descriptions such as "new-year mega sale" or "buy-one-get-one snack gift" used to characterize the marketing content.
In one implementation scenario, all videos on video platforms authorized for retrieval are traversed based on the keywords, and videos authorized for playback that concern the marketing of the target to be analyzed are obtained as candidate videos. The number of candidate videos retrieved is not limited by this application, and a candidate video may be either a real-time video or a non-real-time video; specifically, a candidate video may be an e-commerce live stream, a marketing video, or the like.
In one specific implementation scenario, videos may be retrieved using a classification tree over the keywords. For example, given the keyword "a nut gift pack of a certain brand" for the target to be analyzed, videos marketing non-snack categories may first be removed, then videos not marketing "a certain brand", and finally the videos marketing "nut gift packs" among the remainder are retrieved as candidate videos. In this way, retrieval efficiency is improved while the accuracy of candidate-video retrieval is maintained.
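The coarse-to-fine filtering described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the video records, field names, and matching-by-title rule are all assumptions.

```python
# Sketch of classification-tree retrieval: filter the candidate pool by
# product category first, then by brand, then by product name, mirroring the
# "remove non-snack, remove other brands, match the product" order above.

def retrieve_candidates(videos, category, brand, product):
    """Apply the classification-tree filters from coarse to fine."""
    pool = [v for v in videos if v["category"] == category]  # drop other categories
    pool = [v for v in pool if v["brand"] == brand]          # drop other brands
    return [v for v in pool if product in v["title"]]        # match the product

videos = [
    {"category": "snack", "brand": "BrandA", "title": "BrandA nut gift pack live sale"},
    {"category": "snack", "brand": "BrandB", "title": "BrandB dried fruit"},
    {"category": "appliance", "brand": "BrandA", "title": "BrandA refrigerator"},
]
candidates = retrieve_candidates(videos, "snack", "BrandA", "nut gift pack")
```

Because each filter shrinks the pool before the next runs, the cheap categorical checks discard most videos before the finer product match, which is the efficiency gain the scenario describes.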
In one implementation scenario, a third page is provided for displaying candidate videos, after candidate videos about marketing targets to be analyzed are retrieved based on keywords characterizing the targets to be analyzed, recommendation degrees of the candidate videos are obtained, and the candidate videos are sequentially displayed in the third page based on the recommendation degrees of the candidate videos. By the method, the candidate videos are sequentially displayed based on the recommendation degree of the candidate videos, and convenience in interaction with users is improved.
In a specific implementation scenario, the degree of correlation between each candidate video and the keywords is obtained, the propagation popularity of each candidate video is obtained, and the recommendation degree of each candidate video is derived from its correlation degree and propagation popularity. In this way, the recommendation degree is determined jointly by the correlation with the keywords and the propagation popularity, improving the ranking accuracy of the candidate videos to meet the user's selection needs.
In a specific implementation scenario, the degree of correlation between a candidate video and the keywords may be determined from their similarity. For example, the keyword characterizing the target to be analyzed is "nut gift", and videos A, B, and C are retrieved: video A markets "cashews" and has a similarity of 0.3 to the keyword; video B markets "daily nuts" and has a similarity of 0.7; video C markets "a certain brand's nut gift" and has a similarity of 0.9. Without considering other influencing factors, the videos are displayed in the third page in the order video C, video B, video A.
In a specific implementation scenario, a number threshold is set, the number of candidate videos displayed on the third page is not greater than the number threshold, for example, the number threshold is set to 20, 30 candidate videos having an association relationship with the keywords are retrieved, the 30 videos are ranked according to the degree of correlation, and the candidate videos with the top 20 ranks are displayed on the third page.
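The count-threshold behaviour in the scenario above (30 hits retrieved, at most 20 shown) can be sketched as below; the record shape and relevance values are illustrative assumptions.

```python
# Rank retrieved videos by relevance and show at most `limit` of them on the
# third page, as in the embodiment's 30-retrieved / 20-displayed example.

def videos_for_third_page(scored_videos, limit=20):
    ranked = sorted(scored_videos, key=lambda v: v["relevance"], reverse=True)
    return ranked[:limit]

retrieved = [{"id": i, "relevance": i / 30} for i in range(30)]  # 30 hits
page = videos_for_third_page(retrieved, limit=20)                # top 20 only
```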
In a specific implementation scenario, the user may delete or mark as not-interested a candidate video on the third page; in response to the corresponding control being triggered, that candidate video is removed from the third page. When a number threshold is set for the candidates displayed on the third page, additional relevant candidate videos may be added for display as displayed candidates are removed.
In a specific implementation scenario, when a candidate video is a real-time video, its propagation popularity relates to parameters such as its number of online viewers, number of likes, and product sales; when a candidate video is a non-real-time video, its propagation popularity relates to parameters such as its historical view count, number of likes, number of favorites, number of shares, and product sales.
In one particular implementation scenario, the order in which candidate videos are presented is related to the marketing price of the object to be analyzed in each candidate video.
In one specific implementation scenario, corresponding weights are set for the degree of correlation between the candidate video and the keywords and for the propagation popularity of the candidate video. For example, when the range of products covered by the keywords is broad, the weight of the correlation degree is set larger and the weight of the propagation popularity smaller; specifically, the correlation weight is set to 0.65 and the popularity weight to 0.35. Conversely, when the keywords identify the product more precisely, the weight of the correlation degree is set smaller and the weight of the propagation popularity larger; specifically, the correlation weight is set to 0.3 and the popularity weight to 0.7.
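The weighted combination can be sketched as a simple linear score. The weight pairs come from the scenario above; the assumption that both inputs are already normalised to [0, 1], and the example input values, are mine.

```python
# Recommendation score = w_rel * relevance + w_heat * popularity, with the
# weight pair chosen by how precise the keyword is (broad keyword: relevance
# dominates; precise keyword: popularity dominates).

def recommendation(relevance, popularity, keyword_is_precise):
    w_rel, w_heat = (0.3, 0.7) if keyword_is_precise else (0.65, 0.35)
    return w_rel * relevance + w_heat * popularity

# Broad keyword such as "snack": correlation carries more weight.
broad = recommendation(relevance=0.9, popularity=0.5, keyword_is_precise=False)
# Precise keyword such as "BrandA nut gift pack": popularity carries more weight.
precise = recommendation(relevance=0.9, popularity=0.5, keyword_is_precise=True)
```

With the same inputs, the broad-keyword weighting yields 0.76 while the precise-keyword weighting yields 0.62, showing how the weight choice shifts the ranking.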
Referring to fig. 2, fig. 2 is a schematic diagram illustrating an embodiment of a third page of the video analysis method of the present application. In a specific implementation scenario, the third page is used for displaying preview pictures of each candidate video, and a playing control is arranged at a corresponding position of each candidate video, and video contents of the candidate video are played in response to confirmation trigger of the user on the playing control. It should be noted that the configuration shown in fig. 2 is only one possible interface configuration, and is not limited to other possible configurations, such as landscape display, portrait display, etc.
In one particular implementation, a pop-up window may be displayed on a third page based on the video format of the candidate video, and video content of the candidate video may be displayed within the pop-up window in accordance with the video format of the candidate video.
Step S20: in response to a selection instruction for the candidate videos, determine the selected candidate video as the target video, and determine the video segment to be analyzed in the target video.
In one implementation scenario, a selection control exists for each candidate video, and responsive to a user confirmation of selection of the selection control, the corresponding candidate video is determined as the target video.
In one particular implementation scenario, each target video or a preview of each target video is displayed on a fourth page in response to at least one candidate video being selected to be determined as the target video. Specifically, the corresponding target videos may be sequentially displayed in the fourth page according to the time sequence of the candidate videos determined by the user selection as the target videos.
In one implementation scenario, when the selected candidate video is a non-real-time video, that is, when the target video is a non-real-time video, a progress bar of the target video may be dragged; the analysis start node and analysis end node selected by the user for the target video are obtained, and the video segment to be analyzed is determined from the analysis start node and analysis end node. In this way, the user can select the marketing segments of greatest interest in the target video as the segments to be analyzed, realizing intelligent marketing analysis based on marketing video while respecting the user's marketing needs.
In another implementation scenario, when the selected candidate video is a real-time video, that is, when the target video is a real-time video, configuration parameters of the target video are obtained, wherein the configuration parameters include at least one of analysis starting time and analysis duration, and a video clip to be analyzed in the target video is determined based on the configuration parameters. By the method, for the real-time video, the video clips to be analyzed can be selected according to the configuration parameters of the user, and on the premise of referring to the marketing requirements of the user, the intellectualization of marketing analysis is realized based on the marketing video.
In a specific implementation scenario, when the configuration parameters include an analysis duration, the moment at which the analysis-duration confirmation control is triggered serves as the analysis start moment, and video covering that duration is acquired. For example, the target video is a live room, the analysis duration is 15 minutes, and the user triggers the analysis-duration confirmation control at 14:16:00 on October 20, 2023; the video segment to be analyzed is then the live-room segment from 14:16:00 to 14:31:00 on October 20, 2023.
In a specific implementation scenario, the target video is a live room, the analysis duration is 15 minutes, and the user triggers the analysis-duration confirmation control at 14:16:00 on October 20, 2023, but the live room ends its broadcast before 14:31:00; the video segment to be analyzed then runs from 14:16:00 to the closing time of the live room. In this case, a pop-up window may be displayed to remind the user that the live room has closed.
In a specific implementation scenario, when the configuration parameter is the analysis start time, the video segment between the analysis start time and the moment the real-time video finishes playing is taken as the segment to be analyzed. For example, the target video is a live room and the user-configured analysis start time is 14:16:00 on October 20, 2023; the video segment to be analyzed then runs from 14:16:00 to the closing time of the live room. Note that, owing to the nature of real-time video, the configured analysis start time must be later than the current time; for example, if the current time is 14:16:00 on October 20, 2023, the configured start time must be later than 14:16:00, and if it is earlier, a pop-up window is displayed to report the error and prompt the user to reconfigure.
In a specific implementation scenario, a duration threshold may be set for the video segment to be analyzed; when the segment's analysis time exceeds the threshold, acquisition of the segment ends early. For example, the target video is a live room, the user-configured analysis start time is 14:16:00 on October 20, 2023, the duration threshold is one hour, and the live room has not closed by 15:16:00; the segment from 14:16:00 to 15:16:00 on October 20, 2023 is then taken as the video segment to be analyzed. The duration threshold may be set in advance according to historical information.
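The interplay of analysis start time, optional analysis duration, live-room closing time, and duration threshold in the scenarios above can be sketched as one clamping function. The function name, field order, and one-hour default are assumptions for illustration.

```python
# Derive the clip to analyse from the configuration parameters: the end is
# the configured duration if given (else the room's closing time), clamped by
# both the closing time and the duration threshold.
from datetime import datetime, timedelta

def clip_bounds(start, duration, room_closes, threshold=timedelta(hours=1)):
    """Return (clip_start, clip_end) for the segment to analyse."""
    end = start + duration if duration is not None else room_closes
    end = min(end, room_closes)        # the live room may close early
    end = min(end, start + threshold)  # never exceed the duration threshold
    return start, end

start = datetime(2023, 10, 20, 14, 16, 0)
closes = datetime(2023, 10, 20, 16, 0, 0)
# 15-minute configured duration: the clip ends at 14:31:00.
_, end1 = clip_bounds(start, timedelta(minutes=15), closes)
# No duration configured: capped by the one-hour threshold at 15:16:00.
_, end2 = clip_bounds(start, None, closes)
```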
In yet another specific implementation scenario, the configuration parameters include the analysis start time and the analysis duration: in response to an analysis instruction for the target video, a prompt message is output asking the user to configure the analysis duration; the entered duration is taken as the analysis duration, and upon a confirmation instruction for the entered duration, the current moment of the real-time video is taken as the analysis start time.
In one specific implementation scenario, the prompt for entering the configuration parameters is in the form of a pop-up window.
Step S30: analyze the video segment to obtain the marketing summary of the target to be analyzed.
In one implementation scenario, text recognition is performed on the video segment to obtain the corresponding transcribed text, and analysis is performed on the transcribed text to obtain the marketing summary. By obtaining a more targeted marketing segment for analysis and analyzing it to produce a marketing summary of the target to be analyzed, intelligent marketing analysis can be realized based on marketing video while respecting the user's marketing needs.
In a specific implementation scenario, the video segment includes subtitles corresponding to the voice information; the voice data and subtitle data in the segment can be combined, with the subtitles serving as auxiliary information for transcription, to generate a more accurate transcribed text.
In one specific implementation scenario, the voice information may be classified according to voice timbre in the video segment. For example, when the segment is a multi-host live sales stream, the voice data are classified by each host's timbre, and the transcribed text is generated from the classified voice data, improving the logical coherence of the subsequently generated marketing summary.
It should be noted that the foregoing examples are only several possible examples of obtaining the transcribed text, and the specific content of how to obtain the transcribed text is not limited in this application.
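The per-host classification described above can be sketched downstream of speech recognition. This assumes an upstream ASR/speaker-classification step (not specified by the patent) that already yields `(speaker_id, text)` utterances; the sketch merely groups them into one transcript per host so later summary steps stay logically separated.

```python
# Group diarized utterances into one transcript per host, preserving each
# host's utterance order, as a downstream step after voice classification.
from collections import defaultdict

def transcripts_by_speaker(utterances):
    grouped = defaultdict(list)
    for speaker, text in utterances:
        grouped[speaker].append(text)
    return {speaker: " ".join(parts) for speaker, parts in grouped.items()}

utterances = [
    ("host_a", "Welcome to the live room!"),
    ("host_b", "Follow us and join the fan group."),
    ("host_a", "Today we bring a nut gift pack."),
]
result = transcripts_by_speaker(utterances)
```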
In one implementation scenario, the marketing summary includes a speech framework: the transcribed text is disassembled and reassembled into a plurality of sub-texts corresponding to different marketing stages; speech extraction is performed on each sub-text to obtain the marketing speech of each stage; and the speech framework is generated by combining the marketing speech of the different stages. In this way, a speech framework is obtained from the transcribed text and can be used to realize intelligent marketing analysis.
In one specific implementation scenario, the disassembly and reassembly of the transcribed text, the speech extraction from the sub-texts, and their combination into the speech framework may all be performed by a pre-trained network model: the transcribed text is input to the model, and the model's output is taken as the speech framework. Specifically, the network model may be a large language model, a network model with an encoder-decoder architecture, or the like.
It should be noted that the foregoing examples are only several possible examples of obtaining a speech frame, and the specific content of how to obtain the speech frame is not limited in this application.
In a specific implementation scenario, the marketing stages include an opening, an explanation, a promotion, and a closing. Specifically, the marketing speech of the opening stage describes the product name and the like; that of the explanation stage describes the product's efficacy, price, and the like; that of the promotion stage covers discounts, stock notices, and the like; and that of the closing stage covers the product's next sale time and the like.
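One way to drive the network model mentioned above is to build a staged prompt. This is a hedged sketch: the stage names follow the embodiment, but the prompt wording and the idea of passing the result to an external LLM call are assumptions, not the patent's implementation.

```python
# Build a prompt asking a large language model to split the transcript into
# marketing stages and extract the talk track for each, then combine them
# into a reusable speech framework.

STAGES = ["opening", "explanation", "promotion", "closing"]

def build_framework_prompt(transcript):
    stage_list = ", ".join(STAGES)
    return (
        "Split the following live-commerce transcript into these marketing "
        f"stages: {stage_list}. For each stage, extract the key marketing "
        "talk track, then combine them into a reusable speech framework.\n\n"
        f"Transcript:\n{transcript}"
    )

prompt = build_framework_prompt("Welcome! Today we sell a nut gift pack...")
# A real system would pass `prompt` to an LLM (e.g. a GPT- or T5-style model)
# and parse its output into the per-stage speech framework.
```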
In one implementation scenario, the marketing summary includes lecture logic: several marketing emphases and the logical relationships between them are analyzed from the transcribed text, and the emphases are combined according to those relationships to obtain the lecture logic. In this way, lecture logic is obtained from the transcribed text and can be used to realize intelligent marketing analysis.
In one specific implementation scenario, the marketing emphases, the logical relationships between them, and their combination into lecture logic may be derived from a large language model: the transcribed text is input into the large language model, and the model's output is taken as the lecture logic. Specifically, a large language model (LLM) is a deep learning model trained on large amounts of text data; it can generate natural language text and understand the meaning of language text, and is an artificial-intelligence model that fuses various linguistic knowledge and rules. A large language model can handle a variety of natural-language tasks and has the capacity to understand, generate, and process natural language.
The large language model used in this embodiment is not limited by this application; examples include T5, GPT-3, and the Spark large model.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating an embodiment of a first page of the video analysis method of the present application. In one implementation scenario, after analysis of the video segment yields the marketing summary of the target to be analyzed, the first page is displayed in response to a reference instruction for any target to be analyzed; the first page is provided with a first area for displaying the marketing summary and a second area for displaying the video segment. Specifically, based on review of the video segment obtained with the keyword "nut gift pack", a marketing summary including a speech framework and lecture logic is generated; for example, the following is displayed in a first sub-area as the speech framework for the video segment:
Open field operation: welcome a person to get to my live room-! Today I bring a particularly pleasing nut gift to everybody, absolutely you eat good nuts all over the world-! Look at the bar quickly-!
Remaining person's speech: roll-call vermicelli, please like and join vermicelli group, interact with the coming bar-! The points may then be followed by a blessing pocket and coupon-!
The product explains: the nut gift boxes are carefully selected, with each package of 100 grams of high-end imported nuts. There are sun-rich Vietnam big cashew, delicious np-grade pistachio in Barax California, fresh crisp North American imported pistachio, original turkey hazelnuts, pecan fruits in south Africa golden producing area, etc. Each pack is a genuine pack, which is an absolute good choice for providing nutrition to the elderly, children, pregnant women at home-! Apheresis: the nut gift box has more than 100 grams per bag, and the unit price is reduced by only 10 money. Now special price 139 can take 12 packets of pure nuts, and additional welfare bags and coupons-! Overtaking to get down the bar-! The activity has only one wave, inventory limited! End session: thank you for the viewing and support of-! The nut gift box is very practical, and the self-service of the gift box is very suitable for-! There are more wonderful live broadcasts etc. to see the next time! ".
In the second sub-area, the following lecture logic for the video clip is displayed:
This is a product promotion script from an e-commerce live-stream host, mainly used to promote a nut product. The host attracts the audience to purchase by describing the product's price, quality, taste, and suitable consumers.
The host first emphasizes the characteristics and price of the product, stating that it is a high-quality imported nut gift box, that each pack is a 100-gram large pack of pure nuts, and that the price works out to about 10 yuan per pack, making it very practical.
The host then explains why one should eat nuts, mentioning that various groups, such as the elderly, children, and pregnant women, need the nutrition that nuts provide. The host also mentions scenarios such as visiting parents or visiting patients, indicating that giving nuts is a courteous gesture on these occasions.
The host attracts the audience's interest by emphasizing the variety and flavors of the product, mentioning nuts in different flavors such as chocolate, milk, and caramel honey, letting the audience imagine a range of delicious tastes.
Finally, the host uses a limited-time offer, posting an activity link and indicating that only the first few buyers can enjoy the discounted purchase price. The host also heightens the desire to buy by indicating that this is a genuine welfare activity that offers the audience a big discount.
It should be noted that the configuration shown in fig. 3 is only one possible interface configuration and does not exclude other possible configurations.
In a specific implementation scenario, the video segment to be analyzed that is determined in a real-time video is recorded, and the recorded video is displayed to the user together on the first page. It should be noted that recording a real-time video requires the authorization of the video platform, and displaying the recorded video to the user requires the authorization of the video publisher.
In another specific implementation scenario, the video segment to be analyzed that is determined in a non-real-time video is extracted, and the extracted video is displayed to the user. It should be noted that extracting the non-real-time video requires the authorization of the video creator, and displaying the extracted video to the user likewise requires the authorization of the video creator.
In a specific implementation scenario, the audio emotion, tone, and the like in the video clip are obtained based on the video clip to be analyzed. For example, in a live sales room, the host may use a flat tone when introducing a product name, a more sincere tone when introducing a product's benefits, and a more excited tone when pushing for orders. The transcribed text is annotated with emotion labels corresponding to the different tones, which provides auxiliary information for the subsequent generation of the marketing summary.
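A minimal sketch of the emotion annotation step is given below, with keyword rules standing in for a real audio-emotion classifier (which would analyse acoustic features such as pitch and energy rather than text); the cue words and label names are invented for illustration.

```python
# Hedged sketch: keyword rules as a stand-in for an audio-emotion classifier.
# Attaches an emotion label to each transcribed segment, as the description
# suggests. All cue words below are invented illustrative assumptions.

EMOTION_CUES = {
    "excited": ("hurry", "limited", "special price"),   # order-pushing tone
    "sincere": ("nutrition", "quality", "carefully selected"),  # benefits tone
}

def label_segment(text: str) -> str:
    """Return an emotion label for one transcribed segment."""
    lowered = text.lower()
    for label, cues in EMOTION_CUES.items():
        if any(cue in lowered for cue in cues):
            return label
    return "neutral"  # flat tone, e.g. plainly naming the product

def label_transcript(segments):
    """Pair every transcribed segment with its emotion label."""
    return [(seg, label_segment(seg)) for seg in segments]
```

The labelled pairs can then accompany the transcribed text as the auxiliary information mentioned above.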
In a specific implementation scenario, subtitles are matched to the video clip based on the transcribed text, improving the convenience of viewing the video clip for the user, and the text corresponding to the transcribed content is displayed in the video clip together with the emotion labels annotated on the transcribed text. In this way, the displayed subtitles and emotion labels assist the user in understanding the marketing in the video clip and improve the accuracy of the user's understanding of the marketing summary.
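The subtitle-matching step could be sketched as follows, assuming each transcribed segment already carries start/end timestamps and the emotion label described in the text; the SRT output format is an assumption, chosen only because it is widely supported by players.

```python
# Illustrative sketch: turn timestamped, emotion-labelled transcript segments
# into SubRip (SRT) subtitle entries for display alongside the video clip.

def _fmt(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """segments: iterable of (start_sec, end_sec, text, emotion_label)."""
    blocks = []
    for i, (start, end, text, emotion) in enumerate(segments, 1):
        blocks.append(f"{i}\n{_fmt(start)} --> {_fmt(end)}\n[{emotion}] {text}\n")
    return "\n".join(blocks)
```

Prefixing each cue with its emotion label is one simple way to surface the annotation to the viewer; a production system might style the subtitle instead.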
In one specific implementation scenario, the second area of the first page displays a thumbnail of the video clip that has been analyzed, and the analyzed video clip is played in response to the user triggering a play control in that area.
In one specific implementation scenario, analyzed target videos are displayed separately from target videos still being analyzed; for example, a fourth page displays the target videos being analyzed, and a fifth page displays the target videos that have been analyzed. In response to a reference instruction for any analyzed video clip on the fifth page, the display jumps to the first page.
In one implementation scenario, a selection control is provided in the second area of the first page for circling at least part of the content in the marketing summary, and a preset control is displayed in the first page for the circled content. In response to a trigger instruction for the preset control, a marketing summary of the target to be marketed is generated with reference to the circled content and displayed on a second page. In this way, with the user's marketing needs taken into account, marketing analysis is made intelligent on the basis of the marketing video: the marketing summary of the target to be marketed is automatically generated from the marketing summary of the target to be analyzed, improving the convenience of marketing summary generation.
In a specific implementation scenario, the first page is further provided with an export control. In response to the user's confirmation trigger on the export control, the content displayed on the first page is exported in document form, with the marketing summary contained in the document stored as structured data so that the user can conveniently view it. The specific structure of the marketing summary may be determined based on the display structure of the first page, which is not limited in this application. The video clip may be added to the document in the form of a video link.
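A sketch of the export step follows, under the assumption that the structured document is JSON (the application leaves the concrete format open); the field names are illustrative, and the video clip is attached as a link, as the text describes.

```python
# Hedged sketch of exporting the first page's content as structured data.
# JSON and the field names are assumptions; the application fixes neither.
import json

def export_summary(target, speech_framework, lecture_logic, clip_url):
    """Serialise the marketing summary with the video clip attached as a link."""
    document = {
        "target": target,  # e.g. a product or brand name
        "marketing_summary": {
            "speech_framework": speech_framework,
            "lecture_logic": lecture_logic,
        },
        "video_clip": {"link": clip_url},  # clip added in link form, per the text
    }
    return json.dumps(document, ensure_ascii=False, indent=2)
```

The resulting string can be written to a file or handed to whatever document viewer the first page's export flow targets.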
Referring to fig. 4, fig. 4 is a schematic diagram of an embodiment of a second page of the video analysis method of the present application. In a specific implementation scenario, in response to a trigger instruction for the preset control, fig. 4 displays, as marketing summaries of targets to be marketed, an entry for "product: nut gift box" ("Opening remarks: Welcome everyone to my live room! Today I have brought everyone a particularly delightful nut gift …") and an entry for "product: intelligent massager" ("Target users: people who work long hours or study under high pressure, especially office workers and students … Today I bring everyone a wonderful little life assistant, the intelligent massager! It can give you a comfortable massage experience anytime, anywhere, even on a busy day"). It should be noted that the configuration shown in fig. 4 is only one possible interface configuration and does not exclude other possible configurations.
In one specific implementation scenario, in addition to circling content, the user may describe the target to be marketed by entering a marketing requirement in a dialog, and an adjusted marketing summary of the target to be marketed is generated based on that marketing requirement.
In one specific implementation scenario, a plurality of marketing summaries are generated for the target to be marketed, one for each of several different marketing scenarios.
It should be noted that the method for generating the marketing summary of the target to be marketed is not limited in this application; for example, a large language model may be used.
In one implementation scenario, after the marketing summaries of the target to be marketed are generated, in response to the user selecting the marketing summary for a certain scenario from among the plurality of generated marketing summaries, the selected marketing summary is input into a large language model to obtain a semantically expanded marketing text. In this way, multi-scenario marketing summaries are generated automatically based on this scheme, and in response to the user's selection of the marketing summary for a certain scenario, a richer marketing text is generated, reducing manual effort and improving the efficiency of acquiring the marketing summary.
In a specific implementation scenario, with reference to the emotion labels annotated in the transcribed text corresponding to the selected content, voice reference information such as audio emotion, audio tone, and audio pitch is provided, and synthesized audio of the marketing text carrying a certain spoken emotion is generated by speech synthesis based on the voice reference information and the marketing text; the speech synthesis may be implemented with a pre-trained Hidden Markov Model (HMM), a Deep Neural Network (DNN), or the like.
In one specific implementation scenario, a configuration control is provided that displays the dubbing attributes of the avatars available for video dubbing, for example naming each preset avatar and displaying a brief introduction to its dubbing attributes. In response to a user trigger, a certain avatar is determined, original audio of the marketing text is synthesized based on the audio parameters configured for that avatar, and, on the basis of the original audio, final audio with marketing emotion is generated based on the voice reference information.
In a specific implementation scenario, when the video segment is analyzed, the expression features, posture features, and the like of a person in the video segment may be obtained, for example the expressions and movements of a live sales host, and labels are annotated based on these expression and posture features. After the final audio with marketing emotion is generated based on the marketing text, an avatar is driven with the final audio, the dynamic display of the avatar is adjusted based on the labels of the expression and posture features, and a marketing video with marketing audio is synthesized. This reduces the complexity of manually learning marketing operations and improves the convenience of generating marketing videos.
According to the above scheme, a keyword for the target to be analyzed, the target comprising at least one of a target product and a target brand, is obtained, and candidate videos about the target to be analyzed are retrieved based on the keyword representing the target to be analyzed, so that the screened candidate videos meet the user's marketing expectations as far as possible. In response to a selection instruction for the candidate videos, the selected candidate video is determined as the target video, and the video segment to be analyzed in the target video is determined, yielding a more targeted marketing segment to analyze. The video segment to be analyzed is then analyzed to obtain the marketing summary of the target to be analyzed. In this way, the degree of automation of marketing summary generation can be improved while ensuring the pertinence of the marketing summary as far as possible, thereby improving the efficiency of marketing summary generation.
Referring to fig. 5, fig. 5 is a schematic diagram of an embodiment of a video analysis device 50 of the present application. The video analysis device 50 includes a retrieval module 51, a determining module 52, and an analysis module 53. The retrieval module 51 is used for retrieving candidate videos about the target to be analyzed based on a keyword representing the target to be analyzed, wherein the target to be analyzed comprises at least one of a target product and a target brand; the determining module 52 is configured to determine, in response to a selection instruction for the candidate videos, the selected candidate video as a target video and to determine a video clip to be analyzed in the target video; and the analysis module 53 is configured to perform analysis based on the video clip to obtain a marketing summary of the target to be analyzed.
According to the above scheme, the video analysis device 50 obtains a keyword for the target to be analyzed, comprising at least one of a target product and a target brand, and retrieves candidate videos about the target to be analyzed based on the keyword, so that the screened candidate videos meet the user's marketing expectations as far as possible. In response to a selection instruction for the candidate videos, the selected candidate video is determined as the target video and the video segment to be analyzed is determined, yielding a more targeted marketing segment; the segment is then analyzed to obtain the marketing summary of the target to be analyzed. Thus, the degree of automation of marketing summary generation can be improved while ensuring the pertinence of the marketing summary as far as possible, improving the efficiency of marketing summary generation.
In some disclosed embodiments, the target video comprises real-time video, and the determination module 52 further comprises a configuration module (not shown) for obtaining configuration parameters for the target video; wherein the configuration parameters include at least one of an analysis start time and an analysis duration; and determining the video clips to be analyzed in the target video based on the configuration parameters.
In some disclosed embodiments, the configuration parameters include an analysis start time and an analysis duration, and the configuration module further includes a configuration determination sub-module (not shown) for outputting prompt information in response to an analysis instruction for the target video; the prompt information is used for prompting configuration analysis duration; based on a confirmation instruction of the input duration, the input duration is selected as the analysis duration, and the current moment of the real-time video is selected as the analysis starting moment.
In some disclosed embodiments, the analysis module 53 further includes an identification analysis module (not shown) for performing recognition based on the video clip to obtain a transcribed text, and for performing analysis based on the transcribed text to obtain the marketing summary.
In some disclosed embodiments, the marketing summary includes a speech framework, and the recognition analysis module further includes a speech framework generation module (not shown) for disassembling and reorganizing the transcribed text to obtain a plurality of sub-texts respectively corresponding to different marketing stages; performing speech extraction on each sub-text to obtain marketing speech for the different marketing stages; and generating the speech framework based on the marketing speech of the different marketing stages.
In some disclosed embodiments, the marketing summary includes lecture logic, and the recognition analysis module further includes a lecture logic generation module (not shown) for analyzing, based on the transcribed text, several marketing key points and the logical relationships between them, and for combining the marketing key points based on those logical relationships to obtain the lecture logic.
In some disclosed embodiments, the video analysis device 50 further includes a first page display module (not shown) for displaying a first page in response to a reference instruction for any target to be analyzed after the marketing summary of the target to be analyzed is obtained based on the video clip; the first page is provided with a first area for displaying the marketing summary and a second area for displaying the video clip.
In some disclosed embodiments, the video analysis device 50 further includes a second page display module (not shown) for displaying, in the case where at least part of the content in the marketing summary is circled, a preset control for the circled content in the first page; and for generating, in response to a trigger instruction for the preset control, a marketing summary of the target to be marketed with reference to the circled content and displaying it on a second page.
In some disclosed embodiments, the retrieval module 51 further includes a sequential display module (not shown) for obtaining a recommendation degree for each candidate video, and for sequentially displaying the candidate videos in a third page based on the recommendation degrees of the candidate videos.
In some disclosed embodiments, the sequential display module further includes a recommendation degree acquisition module (not shown) for acquiring the correlation degree between each candidate video and the keyword and the propagation heat of the candidate video, and for obtaining the recommendation degree of the candidate video based on its correlation degree and propagation heat.
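How the correlation degree and the propagation heat combine into a recommendation degree is not fixed by the application; a weighted sum, sketched below with inputs assumed to be normalised to [0, 1], is one simple possibility.

```python
# Hedged sketch: a weighted sum as one possible recommendation-degree function.
# The weight and the normalisation of the inputs are assumptions.

def recommendation_degree(relevance: float, heat: float, w: float = 0.6) -> float:
    """Combine keyword relevance and propagation heat; w weights relevance."""
    return w * relevance + (1 - w) * heat

def rank_candidates(videos):
    """videos: iterable of (title, relevance, heat); returns titles, best first."""
    scored = sorted(videos, key=lambda v: recommendation_degree(v[1], v[2]),
                    reverse=True)
    return [title for title, _, _ in scored]
```

The ranked titles would then drive the sequential display on the third page; any monotone combining function could replace the weighted sum.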
Referring to fig. 6, fig. 6 is a schematic diagram of a frame of an embodiment of an electronic device 60 of the present application. The electronic device 60 comprises a memory 61 and a processor 62, the memory 61 having stored therein program instructions, the processor 62 being adapted to execute the program instructions to implement the steps of any of the video analysis method embodiments described above. Reference may be made specifically to the foregoing disclosed embodiments, and details are not repeated here. The electronic device 60 may specifically include, but is not limited to: servers, smartphones, notebook computers, tablet computers, kiosks, etc., are not limited herein.
In particular, the processor 62 is used to control itself and the memory 61 to implement the steps in any of the video analysis method embodiments described above. The processor 62 may also be referred to as a CPU (Central Processing Unit). The processor 62 may be an integrated circuit chip having signal processing capabilities. The processor 62 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 62 may be implemented jointly by integrated circuit chips.
According to the above scheme, the electronic device 60 obtains a keyword for the target to be analyzed, comprising at least one of a target product and a target brand, and retrieves candidate videos about the target to be analyzed based on the keyword, so that the screened candidate videos meet the user's marketing expectations as far as possible. In response to a selection instruction for the candidate videos, the selected candidate video is determined as the target video and the video segment to be analyzed is determined, yielding a more targeted marketing segment; the segment is then analyzed to obtain the marketing summary of the target to be analyzed. Thus, the degree of automation of marketing summary generation can be improved while ensuring the pertinence of the marketing summary as far as possible, improving the efficiency of marketing summary generation.
Referring to FIG. 7, FIG. 7 is a schematic diagram illustrating an embodiment of a computer-readable storage medium 70 of the present application. The computer readable storage medium 70 stores program instructions 71 executable by a processor, the program instructions 71 for implementing the steps in any of the video analysis method embodiments described above.
According to the above scheme, the computer-readable storage medium 70 obtains a keyword for the target to be analyzed, comprising at least one of a target product and a target brand, and retrieves candidate videos about the target to be analyzed based on the keyword, so that the screened candidate videos meet the user's marketing expectations as far as possible. In response to a selection instruction for the candidate videos, the selected candidate video is determined as the target video and the video segment to be analyzed is determined, yielding a more targeted marketing segment; the segment is then analyzed to obtain the marketing summary of the target to be analyzed. Thus, the degree of automation of marketing summary generation can be improved while ensuring the pertinence of the marketing summary as far as possible, improving the efficiency of marketing summary generation.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The foregoing description of various embodiments is intended to highlight differences between the various embodiments, which may be the same or similar to each other by reference, and is not repeated herein for the sake of brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical, or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (13)

1. A method of video analysis, comprising:
retrieving candidate videos about the target to be analyzed based on a keyword representing the target to be analyzed; wherein the target to be analyzed comprises at least one of a target product and a target brand;
responding to a selection instruction of the candidate video, determining the selected candidate video as a target video, and determining a video fragment to be analyzed in the target video;
and analyzing based on the video segments to obtain marketing summary of the target to be analyzed.
2. The method of claim 1, wherein the target video comprises real-time video, and wherein the determining the video segments to be analyzed in the target video comprises:
acquiring configuration parameters of the target video; wherein the configuration parameters comprise at least one of analysis starting time and analysis duration;
and determining the video segment to be analyzed in the target video based on the configuration parameters.
3. The method of claim 2, wherein the configuration parameters include the analysis start time and the analysis duration, and the obtaining configuration parameters for the target video includes:
responding to an analysis instruction for the target video, and outputting prompt information; wherein the prompt information is used for prompting configuration of the analysis duration;
and selecting the input duration as the analysis duration based on a confirmation instruction of the input duration, and selecting the current moment of the real-time video as the analysis starting moment.
4. The method of claim 1, wherein the analyzing based on the video clip to obtain a marketing summary of the target to be analyzed comprises:
identifying based on the video clip to obtain a transcribed text;
and analyzing based on the transcribed text to obtain the marketing summary.
5. The method of claim 4, wherein the marketing summary comprises a speech framework, and the analyzing based on the transcribed text to obtain the marketing summary comprises:
disassembling and reorganizing based on the transcribed text to obtain a plurality of sub-texts respectively corresponding to different marketing stages;
performing speech extraction based on each sub-text to obtain marketing speech at different marketing stages;
and generating the speech framework based on the marketing speech of the different marketing stages.
6. The method of claim 4, wherein the marketing summary comprises lecture logic, and the analyzing based on the transcribed text to obtain the marketing summary comprises:
analyzing, based on the transcribed text, to obtain a plurality of marketing key points and the logical relationships among the marketing key points;
and combining the marketing key points based on the logical relationships among the marketing key points to obtain the lecture logic.
7. The method of claim 1, wherein after the analyzing based on the video clip, the method further comprises:
responding to a reference instruction of any target to be analyzed, and displaying a first page;
the first page is provided with a first area for displaying the marketing summary and a second area for displaying the video clip.
8. The method of claim 7, wherein in the event that at least a portion of the content in the marketing summary is circled, the method further comprises:
displaying a preset control for the circled content in the first page;
and responding to a trigger instruction of the preset control, generating a marketing summary of the target to be marketed by referring to the selected content, and displaying the marketing summary of the target to be marketed on a second page.
9. The method according to claim 1, wherein after retrieving the candidate videos about the target to be analyzed based on the keyword representing the target to be analyzed, and before the determining the selected candidate video as the target video, the method further comprises:
acquiring recommendation degrees of the candidate videos;
and sequentially displaying the candidate videos in a third page based on the recommendation degree of the candidate videos.
10. The method of claim 9, wherein the obtaining the recommendation level for each of the candidate videos comprises:
acquiring the correlation degree between the candidate video and the keywords, and acquiring the propagation heat of the candidate video;
and obtaining the recommendation degree of the candidate video based on the corresponding correlation degree and the propagation heat degree of the candidate video.
11. A video analysis device, comprising:
the retrieval module is used for retrieving candidate videos about the target to be analyzed based on a keyword representing the target to be analyzed; wherein the target to be analyzed comprises at least one of a target product and a target brand;
the determining module is used for responding to the selection instruction of the candidate video, determining the selected candidate video as a target video and determining a video fragment to be analyzed in the target video;
And the analysis module is used for analyzing based on the video clips to obtain marketing summary of the target to be analyzed.
12. An electronic device comprising a memory and a processor coupled to each other, the memory having program instructions stored therein, the processor being configured to execute the program instructions to implement the video analysis method of any one of claims 1 to 10.
13. A computer readable storage medium, characterized in that program instructions executable by a processor for implementing the video analysis method of any one of claims 1 to 10 are stored.
CN202311378717.5A 2023-10-23 2023-10-23 Video analysis method, related device, equipment and storage medium Pending CN117743636A (en)

Priority Applications (1)
CN202311378717.5A, priority and filing date 2023-10-23: Video analysis method, related device, equipment and storage medium
Publications (1)
CN117743636A, published 2024-03-22
Family ID: 90281943
Country: CN (China)

CN110324702A (en) Information-pushing method and device in video display process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination