CN115037987B - Live video review method and system - Google Patents

Live video review method and system

Info

Publication number
CN115037987B
CN115037987B CN202210637931.7A
Authority
CN
China
Prior art keywords
live video
threshold value
feature quantity
similarity
review
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210637931.7A
Other languages
Chinese (zh)
Other versions
CN115037987A (en)
Inventor
邢东进
杨洪进
陈毅松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Chanyu Network Technology Co ltd
Original Assignee
Xiamen Chanyu Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Chanyu Network Technology Co ltd filed Critical Xiamen Chanyu Network Technology Co ltd
Priority to CN202210637931.7A priority Critical patent/CN115037987B/en
Publication of CN115037987A publication Critical patent/CN115037987A/en
Application granted granted Critical
Publication of CN115037987B publication Critical patent/CN115037987B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/47815Electronic shopping

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of live-commerce (livestream shopping) and discloses a review method for live-commerce video, comprising the following steps: inputting a viewer's review requirement for live-commerce video; analyzing the review requirement to obtain the anchor and the commodities, contained in the requirement, of the live-commerce video clips the viewer wants to watch; and automatically generating, based on that anchor and those commodities, the live-commerce video clips satisfying the review requirement.

Description

Live video review method and system
Technical Field
The invention belongs to the technical field of live-commerce (livestream shopping), and particularly relates to a live-commerce video review method and system.
Background
Live-commerce video is a novel sales format in which a webcast anchor introduces the appearance, efficacy and other attributes of commodities in a live-broadcast room so as to attract viewers to purchase those commodities. In practice, there is a demand for reviewing live-commerce video after the broadcast has ended, for example because a viewer missed the opportunity to watch it in real time. In existing review methods, however, a viewer selects a historical live-commerce video on the live platform and can locate the clip of interest only by dragging the video progress bar back and forth; the platform cannot automatically generate clips matching the viewer's viewing demand, which makes reviewing live-commerce video cumbersome.
Disclosure of Invention
In view of the above technical problem, the invention analyzes a viewer's review requirement for live-commerce video to obtain the anchor and the commodities in the live-commerce video clips the viewer wants to watch, and automatically generates the corresponding live-commerce video clips satisfying the review requirement, thereby solving the prior-art problem that reviewing live-commerce video is cumbersome to operate.
In order to achieve the above purpose, a review method for live-commerce video is provided, comprising the following steps:
inputting a viewer's review requirement for live-commerce video;
analyzing the review requirement to obtain the anchor and the commodities, contained in the requirement, of the live-commerce video clips the viewer wants to watch;
automatically generating, based on that anchor and those commodities, the corresponding live-commerce video clips satisfying the review requirement;
the process of automatically generating the corresponding live-commerce video clips satisfying the review requirement comprises the following steps:
judging whether face recognition is needed to detect, from the live-commerce video, clips satisfying the viewer's review requirement; if so, continuing to the next step, otherwise jumping to the third step;
extracting images from the live-commerce video, locating the face image in each image, generating the face feature quantity of the face image, calculating the similarity between the face feature quantity and a pre-stored standard face feature quantity, and continuing to the third step;
judging whether voice recognition is needed to detect, from the live-commerce video, clips satisfying the viewer's review requirement; if so, continuing to the next step, otherwise jumping to the fifth step;
extracting the sound data from the live-commerce video, generating the sound feature quantity of the sound data, comparing the sound feature quantity against a pre-stored standard sound feature quantity to obtain their similarity, and continuing to the fifth step;
when only face recognition is used, extracting from the live-commerce video the clips in which the similarity between the face feature quantity and the standard face feature quantity is greater than a first threshold; when only voice recognition is used, extracting the clips in which the similarity between the sound feature quantity and the standard sound feature quantity is greater than a second threshold; and when both face recognition and voice recognition are used, extracting the clips in which the face similarity is greater than the first threshold and the sound similarity is greater than the second threshold.
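The face and sound comparisons above both reduce to scoring a feature quantity against a stored standard and checking the score against a threshold. A minimal sketch, assuming feature quantities are fixed-length vectors and cosine similarity as the metric (the patent specifies neither):

```python
import math

def cosine_similarity(a, b):
    """Similarity between two feature vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exceeds_threshold(feature, standard_feature, threshold):
    """True when the feature matches the pre-stored standard closely enough."""
    return cosine_similarity(feature, standard_feature) > threshold
```

A face feature quantity would be checked against the standard face feature quantity with the first threshold, and a sound feature quantity against the standard sound feature quantity with the second threshold.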
Compared with the prior art, the invention has the following beneficial effects:
1. According to the review method for live-commerce video, a viewer's review requirement for live-commerce video is first input; the requirement is then analyzed to obtain the anchor and the commodities, contained in the requirement, of the live-commerce video clips the viewer wants to watch; finally, based on that anchor and those commodities, the corresponding live-commerce video clips satisfying the review requirement are automatically generated;
2. The method solves the prior-art problem that, when reviewing live-commerce video, a viewer can select the desired clip only by dragging the progress bar, and it automatically generates live-commerce video clips satisfying the viewer's review requirement.
Drawings
FIG. 1 is a flowchart of the steps of the live-commerce video review method according to the present invention;
FIG. 2 is a flowchart of the steps for automatically generating live-commerce video clips satisfying the review requirement according to the present invention;
FIG. 3 is a flowchart of the steps performed before adjusting the first threshold and the second threshold according to the present invention;
FIG. 4 is a flowchart of the steps for adjusting the first threshold and the second threshold according to the present invention;
FIG. 5 is a block diagram of the live-commerce video review system according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It will be understood that the terms "first," "second," and the like, as used herein, may be used to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another element. For example, a first xx script may be referred to as a second xx script, and similarly, a second xx script may be referred to as a first xx script, without departing from the scope of this disclosure.
Referring to FIG. 1, the invention provides a review method for live-commerce video, implemented mainly by executing the following steps:
Step one, inputting a viewer's review requirement for live-commerce video;
Step two, analyzing the review requirement to obtain the anchor and the commodities, contained in the requirement, of the live-commerce video clips the viewer wants to watch;
Step three, automatically generating, based on that anchor and those commodities, the corresponding live-commerce video clips satisfying the review requirement.
Specifically, the inventor observes that live-commerce is a sales format in which a webcast anchor introduces the appearance, efficacy and other attributes of commodities in a live-broadcast room so as to attract viewers to purchase them. In practice, when viewers review historical live-commerce video they generally do not need to review it in full; they select only the clips in which a specific anchor appears, or the clips in which a specific anchor sells a specific commodity. The method therefore lets viewers input a review requirement for the live-commerce video, determines from that requirement the anchor and the commodities of the clips the viewer wants to review, and automatically generates the corresponding clips, so that viewers can complete the review quickly and conveniently and their review requirements are satisfied.
Further, referring to FIG. 2, the process of automatically generating the corresponding live-commerce video clips satisfying the review requirement comprises the following steps:
Step one, judging whether face recognition is needed to detect, from the live-commerce video, clips satisfying the viewer's review requirement; if so, continuing to the next step, otherwise jumping to step three;
Step two, extracting images from the live-commerce video, locating the face image in each image, generating the face feature quantity of the face image, calculating the similarity between the face feature quantity and a pre-stored standard face feature quantity, and continuing to step three;
Step three, judging whether voice recognition is needed to detect, from the live-commerce video, clips satisfying the viewer's review requirement; if so, continuing to the next step, otherwise jumping to step five;
Step four, extracting the sound data from the live-commerce video, generating the sound feature quantity of the sound data, comparing the sound feature quantity against a pre-stored standard sound feature quantity to obtain their similarity, and continuing to step five;
Step five, when only face recognition is used, extracting from the live-commerce video the clips in which the similarity between the face feature quantity and the standard face feature quantity is greater than the first threshold; when only voice recognition is used, extracting the clips in which the similarity between the sound feature quantity and the standard sound feature quantity is greater than the second threshold; and when both are used, extracting the clips in which the face similarity is greater than the first threshold and the sound similarity is greater than the second threshold;
Furthermore, the viewer's review requirement for live-commerce video may be input by voice or by text;
further, face recognition is used to detect the anchor in the live-commerce video, and voice recognition is used to detect the commodities in the live-commerce video.
Specifically, after the viewer's review requirement for live-commerce video is obtained, it can generally be divided into three types. The first type requires face recognition to generate the clips satisfying the requirement, for example when the viewer wants to review the clips in which a specific popular anchor appears. The second type requires voice recognition, for example when the viewer wants to review the clips in which a specific commodity is sold in the live-broadcast room. The third type requires both face recognition and voice recognition, for example when the viewer wants to review the clips in which a specific popular anchor sells a specific commodity. Clips satisfying the viewer's review requirement are therefore extracted from the live-commerce video in different ways according to the type of the requirement;
For the first type, the images in the live-commerce video are extracted, the face image in each is located, the face feature quantity is generated, its similarity to the pre-stored standard face feature quantity is calculated, and the clips whose face similarity exceeds the first threshold are extracted. For the second type, the sound data in the live-commerce video is extracted, the sound feature quantity is generated, its similarity to the pre-stored standard sound feature quantity is computed, and the clips whose sound similarity exceeds the second threshold are extracted. For the third type, the clips whose face similarity exceeds the first threshold and whose sound similarity exceeds the second threshold are extracted. Face recognition rather than voice recognition is used to detect the anchor because face recognition generally detects the anchor more accurately, and voice recognition rather than image recognition is used to detect the commodities because voice recognition generally detects them faster;
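The three extraction cases can be sketched as a single selection routine over per-clip similarity scores (the clip records and flag names are hypothetical):

```python
def select_clips(clips, first_threshold, second_threshold,
                 use_face, use_voice):
    """Return the clips satisfying the active recognition condition(s).

    Each clip is a dict carrying precomputed 'face_sim' and 'voice_sim'
    similarity scores against the stored standard feature quantities.
    A disabled recognition mode imposes no constraint.
    """
    selected = []
    for clip in clips:
        face_ok = (not use_face) or clip["face_sim"] > first_threshold
        voice_ok = (not use_voice) or clip["voice_sim"] > second_threshold
        if face_ok and voice_ok:
            selected.append(clip)
    return selected
```

With both flags set, a clip must pass both thresholds, matching the third requirement type.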
In the above method, the clips satisfying the review requirement are generated by comparing the corresponding similarities against the first threshold and the second threshold. Building on this, the inventor observes that when the similarity between the face feature quantity and the standard face feature quantity is calculated and compared against the first threshold, the anchor's face feature quantity may change over time: because a pre-stored standard face feature quantity is used in the comparison, the similarity is larger at first and smaller later, and if the preset first threshold is kept unchanged, face recognition may eventually fail. A similar problem arises when the similarity between the sound feature quantity and the standard sound feature quantity is compared against the second threshold. To solve this problem, the inventor proposes a method for adjusting the first threshold and the second threshold.
Further, the first threshold and the second threshold can be dynamically adjusted according to the corresponding historical similarities;
Further, referring to FIG. 3, the following steps are performed before dynamically adjusting the first threshold and the second threshold:
Step one, when the time to adjust the first threshold and the second threshold arrives, acquiring the current time t0;
Step two, taking Δt as a fixed time interval, acquiring, over the n time intervals Δt preceding the current time t0, the historical similarities between the face feature quantity and the standard face feature quantity and between the sound feature quantity and the standard sound feature quantity, recorded whenever the face image or sound data to be recognized was successfully recognized;
Step three, for the historical similarities belonging to the same time interval Δt, calculating their average, thereby obtaining the average historical similarity of each of the n time intervals Δt;
Step four, taking the middle time of each time interval Δt as the abscissa and the average historical similarity of that interval as the ordinate, and generating a mathematical function that computes the ordinate from the abscissa;
Step five, substituting the current time t0 into the mathematical function, thereby calculating the predicted similarity at the current time t0;
Further, the timing for adjusting the first threshold and the second threshold includes: when the time elapsed since the last adjustment exceeds a predetermined time threshold; when the number of similarity comparisons between the face feature quantity and the standard face feature quantity, or between the sound feature quantity and the standard sound feature quantity, performed since the last adjustment exceeds a predetermined count threshold; and when the number of successful recognitions of the anchor's face image or of the commodities in the live-commerce video since the last adjustment reaches a predetermined count threshold.
Specifically, in the above method, the current time at which the threshold is to be adjusted is first determined; the historical similarities recorded on successful recognition within a period before that time are then obtained and divided into groups by the fixed time interval, and the average historical similarity of each group is calculated. Finally, taking the middle time of each group as the abscissa and its average historical similarity as the ordinate yields a set of coordinate points, from which a fitting algorithm produces a mathematical function that computes the ordinate from the abscissa. Inputting the current time into this function gives a prediction of the similarity, which is used in the subsequent steps to complete the dynamic adjustment of the first threshold and the second threshold.
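The preparatory steps above amount to binning the historical similarities into n intervals of width Δt, averaging each bin, fitting a function through the (interval midpoint, average similarity) points, and evaluating it at t0. A sketch assuming an ordinary least-squares straight-line fit (the patent does not fix the fitting algorithm):

```python
def predict_similarity(history, t0, dt, n):
    """Predict the similarity at time t0 from past successful recognitions.

    history: list of (timestamp, similarity) pairs recorded whenever
    recognition succeeded. The n intervals of width dt ending at t0 are
    averaged, and a straight line fitted through the (interval midpoint,
    average similarity) points is evaluated at t0.
    """
    xs, ys = [], []
    for k in range(n):
        lo, hi = t0 - (k + 1) * dt, t0 - k * dt
        sims = [s for t, s in history if lo <= t < hi]
        if sims:
            xs.append((lo + hi) / 2.0)         # middle time of the interval
            ys.append(sum(sims) / len(sims))   # average historical similarity
    # ordinary least-squares fit of y = a*x + b
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a * t0 + b
```

For a slowly drifting feature quantity, a linear fit captures the gradual decline in similarity that motivates the threshold adjustment; a higher-degree fit could be substituted without changing the surrounding steps.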
Further, referring to FIG. 4, the process of dynamically adjusting the first threshold and the second threshold specifically comprises the following steps:
Step one, calculating the difference between the first threshold (or the second threshold) at the current time and the corresponding predicted similarity;
Step two, judging whether the difference is greater than a preset difference threshold α; if so, the first threshold and the second threshold at the current time are relatively large and are reduced, otherwise continuing to the next step;
Step three, judging whether the difference is smaller than a preset difference threshold β; if so, the first threshold and the second threshold at the current time are relatively small and are increased, otherwise continuing to the next step;
Step four, judging that the first threshold and the second threshold at the current time do not need to be adjusted.
Specifically, in the above method, when a threshold needs to be adjusted, the difference between the first threshold (or the second threshold) and the obtained predicted similarity is first calculated, and this difference is compared with the preset difference threshold α. This embodiment does not limit the specific value of α, which can be set according to the actual situation. When the difference is greater than α, the first and second thresholds are already too large for the current situation and are reduced appropriately. When the difference is less than or equal to α, it is further compared with the preset difference threshold β, whose specific value is likewise not limited and can be set according to the actual situation. When the difference is less than β, the first and second thresholds are already too small for the current situation and are increased appropriately; otherwise the first and second thresholds do not need to be adjusted. In this way the first threshold and the second threshold are dynamically adjusted according to the historical similarities recorded on successful recognition, which helps maintain the success rates of face recognition and voice recognition.
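The adjustment logic above can be sketched per threshold as follows (the step size is an assumption, since the patent specifies only the direction of each change, and β is assumed to lie below α so the branches are mutually exclusive):

```python
def adjust_threshold(threshold, predicted_similarity, alpha, beta, step=0.02):
    """Nudge a recognition threshold according to the predicted similarity."""
    diff = threshold - predicted_similarity
    if diff > alpha:     # threshold is now relatively large: reduce it
        return threshold - step
    if diff < beta:      # threshold is now relatively small: increase it
        return threshold + step
    return threshold     # within [beta, alpha]: no adjustment needed
```

The same routine would be applied to the first threshold with the predicted face similarity and to the second threshold with the predicted sound similarity.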
Referring to FIG. 5, the present invention further provides a live-commerce video review system configured to implement the review method described above, comprising the following modules:
a terminal module for inputting, in voice or text form, the viewer's review requirement for live-commerce video, for analyzing it to obtain the anchor and the commodities, contained in the requirement, of the live-commerce video clips the viewer wants to watch, and for playing the live-commerce video clips satisfying the viewer's review requirement;
a server module for pre-storing the standard face feature quantities used to identify the anchor's face image and the standard sound feature quantities used to identify commodities from the sound data of the live-broadcast room, and for comparing, based on the anchor and commodities the viewer wants to watch, the face feature quantities of the face images in the live-commerce video against the standard face feature quantities and the sound feature quantities of the sound data against the standard sound feature quantities, so as to automatically generate the corresponding live-commerce video clips satisfying the review requirement;
a communication module for transferring information between the terminal module and the server module, including sending the review requirement from the terminal module to the server module and sending the automatically generated clips satisfying the review requirement from the server module to the terminal module.
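As a minimal in-process sketch of the three modules (all class names and the requirement format are hypothetical, and the similarity comparison is abstracted into clips pre-labelled with their recognized anchor and commodity):

```python
class TerminalModule:
    """Accepts the review requirement and parses out anchor and commodity."""
    def parse_requirement(self, text):
        # hypothetical text format: "anchor=<name> commodity=<name>"
        fields = dict(part.split("=", 1) for part in text.split())
        return fields.get("anchor"), fields.get("commodity")

class ServerModule:
    """Holds labelled clips and generates those matching the requirement."""
    def __init__(self, clips):
        self.clips = clips
    def generate_clips(self, anchor, commodity):
        # an unspecified anchor or commodity imposes no constraint
        return [c for c in self.clips
                if (anchor is None or c["anchor"] == anchor)
                and (commodity is None or c["commodity"] == commodity)]

class CommunicationModule:
    """Relays the requirement to the server and the clips back."""
    def __init__(self, terminal, server):
        self.terminal, self.server = terminal, server
    def review(self, text):
        anchor, commodity = self.terminal.parse_requirement(text)
        return self.server.generate_clips(anchor, commodity)
```

In a deployed system the three modules would run on separate devices, with the communication module carrying the requests and generated clips over the network.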
In summary, according to the review method for live-commerce video, a viewer's review requirement for live-commerce video is first input; the requirement is then analyzed to obtain the anchor and the commodities, contained in the requirement, of the live-commerce video clips the viewer wants to watch; finally, based on that anchor and those commodities, the corresponding live-commerce video clips satisfying the review requirement are automatically generated. The method solves the prior-art problem that, when reviewing live-commerce video, a viewer can select the desired clip only by dragging the progress bar, and it automatically generates live-commerce video clips satisfying the viewer's review requirement.
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the various embodiments may comprise multiple sub-steps or stages, which need not be completed at the same moment but may be executed at different moments, and need not be executed sequentially but may be executed in turn or alternately with at least part of the sub-steps or stages of other steps.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program, which may be stored on a non-transitory computer-readable storage medium and which, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The foregoing examples illustrate only a few embodiments of the invention in some detail and are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is determined by the appended claims.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (4)

1. A review method for live video with goods, characterized by comprising the following steps:
inputting a user's review requirement for the live video with goods;
analyzing the review requirement, so as to obtain the anchor and the goods, contained in the review requirement, that appear in the live video clips the user wants to watch;
automatically generating, based on the anchor and the goods in the live video clips the user wants to watch, the corresponding live video clips meeting the review requirement;
wherein the process of automatically generating the corresponding live video clips meeting the review requirement is as follows:
judging whether face recognition is needed to detect, from the live video, the live video clips meeting the user's review requirement, and if so, extracting images from the live video, determining the face images in the images, generating face feature quantities of the face images, and calculating the similarity between the face feature quantities and pre-stored standard face feature quantities;
after judging whether face recognition is needed, judging whether voice recognition is needed to detect, from the live video, the live video clips meeting the user's review requirement, and if so, extracting sound data from the live video, generating sound feature quantities of the sound data, and comparing the sound feature quantities with pre-stored standard sound feature quantities to obtain the similarity between the sound feature quantities and the standard sound feature quantities;
when only face recognition is used, extracting from the live video the live video clips whose face feature quantity has a similarity to the standard face feature quantity greater than a first threshold value; when only voice recognition is used, extracting from the live video the live video clips whose sound feature quantity has a similarity to the standard sound feature quantity greater than a second threshold value; and when face recognition and voice recognition are used simultaneously, extracting from the live video the live video clips whose face feature similarity is greater than the first threshold value and whose sound feature similarity is greater than the second threshold value;
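The three extraction cases above can be sketched as a single selection function. Representing each clip as a pair of precomputed (face, voice) similarities is an assumption made for brevity.

```python
def select_clips(clips, use_face, use_voice, first_threshold, second_threshold):
    """Keep the clips whose similarities exceed the thresholds of the active
    recognition modes; both thresholds must pass when both modes are on.
    clips: list of (face_similarity, voice_similarity) pairs."""
    selected = []
    for face_sim, voice_sim in clips:
        if use_face and use_voice:
            keep = face_sim > first_threshold and voice_sim > second_threshold
        elif use_face:
            keep = face_sim > first_threshold
        elif use_voice:
            keep = voice_sim > second_threshold
        else:
            keep = False      # neither mode active: nothing can be matched
        if keep:
            selected.append((face_sim, voice_sim))
    return selected

clips = [(0.95, 0.40), (0.60, 0.90), (0.92, 0.88)]
both = select_clips(clips, use_face=True, use_voice=True,
                    first_threshold=0.9, second_threshold=0.8)  # → [(0.92, 0.88)]
```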
wherein the first threshold value and the second threshold value can be dynamically adjusted according to the corresponding historical similarities; before the dynamic adjustment of the first threshold value and the second threshold value, the method comprises the following steps:
when the time for adjusting the first threshold value and the second threshold value arrives, acquiring the current time t0; taking Δt as a fixed time interval, acquiring, within the n time intervals Δt preceding the current time t0, the historical similarities between the face feature quantities and the standard face feature quantities and between the sound feature quantities and the standard sound feature quantities recorded whenever the face images and the sound data were successfully recognized;
for the historical similarities belonging to the same time interval Δt, calculating their average, so as to obtain the average historical similarity of each of the n time intervals Δt; taking the middle moment of each time interval Δt as the abscissa and the average historical similarity of that interval as the ordinate, and generating a mathematical function that computes the ordinate from the abscissa;
substituting the current time t0 into the mathematical function, so as to calculate the predicted similarity at the current time t0;
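The prediction steps above (bucket the history into n intervals of width Δt, average each bucket, fit a function through the (interval midpoint, average similarity) points, and evaluate it at t0) could be sketched as follows. The claim only says "a mathematical function"; a least-squares straight line is used here as one simple, illustrative choice.

```python
def predict_similarity(history, t0, dt, n):
    """history: (timestamp, similarity) pairs from past successful recognitions.
    Returns the predicted similarity at time t0."""
    xs, ys = [], []
    for k in range(n):
        lo, hi = t0 - (k + 1) * dt, t0 - k * dt          # k-th interval before t0
        bucket = [s for (t, s) in history if lo <= t < hi]
        if bucket:
            xs.append((lo + hi) / 2)                     # interval midpoint (abscissa)
            ys.append(sum(bucket) / len(bucket))         # average similarity (ordinate)
    if not xs:
        return 0.0                                       # no history to predict from
    # Fit a least-squares line through the (midpoint, average) points,
    # then evaluate it at t0.
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    denom = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / denom if denom else 0.0
    return my + slope * (t0 - mx)
```

With similarities drifting over time, the fitted line extrapolates that drift to the present moment; with a flat history it simply returns the historical average.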
wherein the process of dynamically adjusting the first threshold value and the second threshold value comprises the following steps:
calculating the difference between each of the first threshold value and the second threshold value at the current time and the corresponding predicted similarity; judging whether the difference is greater than a preset difference threshold value α, and if so, reducing the first threshold value and the second threshold value; otherwise, continuing to the next step;
further judging whether the difference is smaller than a preset difference threshold value β, and if so, enlarging the first threshold value and the second threshold value; otherwise, judging that the first threshold value and the second threshold value do not need to be adjusted at the current time;
wherein the time for adjusting the first threshold value and the second threshold value arrives when the time elapsed since their last adjustment exceeds a preset time threshold value, when the number of similarity comparisons between the face feature quantities and the standard face feature quantities and between the sound feature quantities and the standard sound feature quantities performed since their last adjustment exceeds a preset number-of-times threshold value, or when the number of times the anchor's face image and the goods of the live broadcast room have been successfully recognized since their last adjustment reaches a preset number-of-times threshold value.
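One round of the adjustment logic above, applied to a single threshold, could be sketched as follows. The fixed adjustment step of 0.05 is an illustrative assumption (the claim does not specify by how much to reduce or enlarge), and α > β is expected.

```python
def adjust_threshold(threshold, predicted_similarity, alpha, beta, step=0.05):
    """One adjustment round for one threshold:
    diff > alpha -> the threshold sits far above what recognition currently
                    achieves, so reduce it;
    diff < beta  -> the threshold sits too close to the typical similarity,
                    so enlarge it;
    otherwise    -> leave it unchanged."""
    diff = threshold - predicted_similarity
    if diff > alpha:
        return threshold - step
    if diff < beta:
        return threshold + step
    return threshold
```

The same function would be called twice per adjustment round, once with the first (face) threshold and its predicted face similarity and once with the second (sound) threshold and its predicted sound similarity.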
2. The review method for live video with goods according to claim 1, wherein inputting the review requirement for the live video with goods comprises inputting in voice form and inputting in text form.
3. The review method for live video with goods according to claim 2, wherein face recognition is used to detect the anchor of the live broadcast room in the live video, and voice recognition is used to detect the goods of the live broadcast room in the live video.
4. A review system for live video with goods, for implementing the method of claim 3, characterized by comprising the following modules:
a terminal module for inputting, in voice form and in text form, a user's review requirement for the live video with goods, analyzing the requirement to obtain the anchor and the goods in the live video clips the user wants to watch, and playing the live video clips meeting the user's review requirement;
a server module for pre-storing standard face feature quantities used to recognize the face images of the live broadcast room and standard sound feature quantities used to recognize the goods from the sound data of the live broadcast room, and for comparing, based on the anchor and the goods in the live video clips the user wants to watch, the face feature quantities of the face images in the live video with the standard face feature quantities, while comparing the sound feature quantities of the sound data in the live video with the standard sound feature quantities, so as to automatically generate the corresponding live video clips meeting the review requirement;
a communication module for transferring information between the terminal module and the server module, including sending the review requirement from the terminal module to the server module and sending the live video clips that are automatically generated by the server module and meet the review requirement to the terminal module.
CN202210637931.7A 2022-06-07 2022-06-07 Live video review method and system Active CN115037987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210637931.7A CN115037987B (en) 2022-06-07 2022-06-07 Live video review method and system


Publications (2)

Publication Number Publication Date
CN115037987A CN115037987A (en) 2022-09-09
CN115037987B (en) 2024-05-07

Family

ID=83122120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210637931.7A Active CN115037987B (en) 2022-06-07 2022-06-07 Live video review method and system

Country Status (1)

Country Link
CN (1) CN115037987B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103347220A (en) * 2013-06-18 2013-10-09 天脉聚源(北京)传媒科技有限公司 Method and device for watching back live-telecast files
CN111586474A (en) * 2020-05-21 2020-08-25 口碑(上海)信息技术有限公司 Live video processing method and device
CN112511854A (en) * 2020-11-27 2021-03-16 刘亚虹 Live video highlight generation method, device, medium and equipment
WO2021212659A1 (en) * 2020-04-24 2021-10-28 平安国际智慧城市科技股份有限公司 Video data processing method and apparatus, and computer device and storage medium
CN114390368A (en) * 2021-12-29 2022-04-22 腾讯科技(深圳)有限公司 Live video data processing method and device, equipment and readable medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105872717A (en) * 2015-10-26 2016-08-17 乐视移动智能信息技术(北京)有限公司 Video processing method and system, video player and cloud server
JP6648769B2 (en) * 2018-01-12 2020-02-14 日本電気株式会社 Face recognition device
CN112399258B (en) * 2019-08-13 2022-06-07 腾讯科技(深圳)有限公司 Live playback video generation playing method and device, storage medium and electronic equipment
US11342003B1 (en) * 2019-12-12 2022-05-24 Amazon Technologies, Inc. Segmenting and classifying video content using sounds
CN113852832B (en) * 2020-11-26 2022-09-20 阿里巴巴集团控股有限公司 Video processing method, device, equipment and storage medium
CN113011656B (en) * 2021-03-22 2022-08-02 内蒙古电力(集团)有限责任公司内蒙古电力科学研究院分公司 Power station auxiliary machine fault early warning method and system
CN113849682A (en) * 2021-09-29 2021-12-28 北京字跳网络技术有限公司 Video searching method, device, equipment and medium


Also Published As

Publication number Publication date
CN115037987A (en) 2022-09-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant