CN110688498B - Information processing method and device, electronic equipment and storage medium - Google Patents

Information processing method and device, electronic equipment and storage medium

Info

Publication number
CN110688498B
CN110688498B CN201910926211.0A
Authority
CN
China
Prior art keywords
event
video frame
video
key
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910926211.0A
Other languages
Chinese (zh)
Other versions
CN110688498A (en)
Inventor
邓桂林
刘旭升
李海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201910926211.0A
Publication of CN110688498A
Application granted
Publication of CN110688498B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74: Browsing; Visualisation therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40: Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43: Querying
    • G06F16/435: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The embodiments of the invention provide an information processing method, an information processing device, electronic equipment and a storage medium. The information processing method comprises the following steps: acquiring each video frame in the event video to be processed; performing event category identification on each video frame to obtain the event category information of the event video to be processed; and identifying whether each video frame is a key content video frame by adopting a content identification model corresponding to the event category information. In the process of playing the event video to be processed, if the played video frame is a key content video frame, target promotion information matched with the event category information of the event video where the key content video frame is located is obtained from the candidate promotion information, based on the categories of the candidate promotion information, and the target promotion information is output when the key content video frame is played. Therefore, the efficiency of selecting the insertion position of the promotion information can be improved, and the labor cost and time overhead are reduced.

Description

Information processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of multimedia technologies, and in particular, to an information processing method and apparatus, an electronic device, and a storage medium.
Background
When a merchant promotes information to be promoted, another medium is usually selected as the carrier of that information. For example, the merchant usually attaches an advertisement for a product to television programs or video-website content, and the program then serves as the carrier of the advertisement. Advertisements are currently inserted in various forms, such as inserting an advertisement video between two adjacent video frames, or superimposing an advertisement on a segment of video so that the advertisement and the video are displayed simultaneously, which is also called an overlay ("band-aid") advertisement. Whatever the form of insertion, an insertion position for the advertisement must first be determined in the video.
However, in the process of implementing the invention, the inventors found that the prior art has at least the following problem:
currently, the position for inserting promotion information is usually selected manually, so the efficiency of selecting an insertion position is very low, and a large amount of labor cost and time is wasted.
Disclosure of Invention
Embodiments of the present invention provide an information processing method and apparatus, an electronic device, and a storage medium, so as to improve efficiency of selecting an insertion position of promotion information and reduce labor cost and time overhead.
The specific technical scheme is as follows:
in one aspect of the embodiments of the present invention, an embodiment of the present invention provides an information processing method, including:
acquiring each video frame in the event video to be processed;
performing event type identification on each video frame to obtain event type information of an event video to be processed;
identifying whether each video frame is a key content video frame by adopting a content identification model corresponding to the event category information, wherein the key content video frame comprises: a highlight event video frame or an event key score video frame;
in the process of playing the event video to be processed, if the played video frame is the key content video frame, based on the category of the candidate promotion information, the target promotion information matched with the event category information of the event video where the key content video frame is located is obtained from the candidate promotion information, and when the key content video frame is played, the target promotion information is output.
In another aspect of the present invention, an embodiment of the present invention further provides an information processing apparatus, including:
the video frame acquisition module is used for acquiring each video frame in the event video to be processed;
the event type identification module is used for identifying the event type of each video frame to obtain the event type information of the event video to be processed;
a content identification module, configured to identify whether each video frame is a key content video frame by using a content identification model corresponding to the event category information, where the key content video frame includes: a highlight event video frame or an event key score video frame;
and the output module is used for acquiring target promotion information matched with the event category information of the event video where the key content video frame is located in the candidate promotion information based on the category of the candidate promotion information if the played video frame is the key content video frame in the process of playing the event video to be processed, and outputting the target promotion information when the key content video frame is played.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to execute any one of the above-described information processing methods.
In yet another aspect of the present invention, the present invention further provides a computer program product including instructions, which when run on a computer, cause the computer to execute any one of the information processing methods described above.
According to the information processing method, the information processing device, the electronic equipment and the storage medium of the embodiments, event category identification is first performed on each video frame to determine the event category information of the event video to be processed, and then the content identification model corresponding to that event category information is adopted to perform content identification on each video frame. Because different event category information corresponds to different, dedicated content identification models, identification accuracy can be improved.
When a video frame is identified as a key content video frame, the target promotion information matched with the event category information of the event video where the key content video frame is located can be obtained from the candidate promotion information, based on the categories of the candidate promotion information, and output when the key content video frame is played. Therefore, the insertion position of the information to be promoted no longer needs to be selected manually, so the efficiency of selecting the insertion position of the promotion information can be improved, and the labor cost and time overhead are reduced. Of course, practicing the invention does not require any single product or method to achieve all of the above advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a first implementation of an information processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a second implementation of an information processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a third implementation of an information processing method according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a fourth implementation of an information processing method according to an embodiment of the present invention;
FIG. 5 is a flowchart of a fifth implementation of an information processing method according to an embodiment of the present invention;
FIG. 6 is a flowchart of a sixth implementation of an information processing method according to an embodiment of the present invention;
FIG. 7 is a flowchart of a seventh implementation manner of an information processing method according to an embodiment of the present invention;
FIG. 8 is a flowchart of an eighth implementation of an information processing method according to an embodiment of the present invention;
FIG. 9 is a flowchart of a ninth implementation of an information processing method according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating a first exemplary embodiment of an information processing apparatus according to the present invention;
FIG. 11 is a diagram illustrating a second implementation of an information processing apparatus according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In order to solve the problems in the prior art, embodiments of the present invention provide an information processing method, an information processing apparatus, an electronic device, and a storage medium, so as to improve efficiency of selecting an insertion position of promotion information and reduce labor cost and time overhead.
In the following, an information processing method according to an embodiment of the present invention is first described, as shown in fig. 1, which is a flowchart of a first implementation manner of an information processing method according to an embodiment of the present invention, where the method may include:
s110, acquiring each video frame in the event video to be processed;
s120, performing event type identification on each video frame to obtain event type information of the event video to be processed;
s130, identifying whether each video frame is a key content video frame or not by adopting a content identification model corresponding to the event category information, wherein the key content video frame comprises: a highlight event video frame or an event key score video frame;
and S140, in the process of playing the event video to be processed, if the played video frame is the key content video frame, based on the category of the candidate promotion information, obtaining the target promotion information matched with the event category information of the event video where the key content video frame is located from the candidate promotion information, and outputting the target promotion information when the key content video frame is played.
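Steps S110 to S140 can be sketched as a minimal pipeline. This is a hypothetical illustration, not the patent's implementation: the function names are invented, and `classify_frame` and `is_key_content` stand in for the event-category recognition and the category-specific content identification model.

```python
from collections import Counter

def identify_event_category(frames, classify_frame):
    """S120: classify each frame; a majority vote decides the video's category."""
    return Counter(classify_frame(f) for f in frames).most_common(1)[0][0]

def find_key_frames(frames, is_key_content):
    """S130: a category-specific model flags key content frames (by index)."""
    return [i for i, f in enumerate(frames) if is_key_content(f)]

def select_target_promotions(event_category, candidates):
    """S140: keep only candidate promotions whose category matches the event."""
    return [c for c in candidates if c["category"] == event_category]
```

During playback (S140), the player would check the current frame index against `find_key_frames` and output one of the matching promotions.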
In some examples, the pending event video may be a live event video or a shot event video for replay.
In still other examples, the staff may input a video of an ongoing event captured by the front-end capturing device to the electronic device, or may input a video of an event stored in another electronic device for replay to the electronic device. Therefore, the electronic device applying the information processing method of the embodiment of the invention can acquire the event video to be processed. And then each video frame can be obtained from the obtained to-be-processed event video.
For example, after the event video stream to be processed is acquired, the event video to be processed may be decoded, so that the video frame to be identified may be obtained.
In still other examples, the pending event video may be a sports event video, a dance event video, or a singing event video, as desired.
In some examples, because the types of games are varied, the types of event videos captured from the games are also varied, i.e., different event videos have different event categories.
In order to more accurately identify whether each video frame of the event video to be processed is a key content video frame, event category identification may be performed on each video frame so as to identify and obtain event category information of the event video to be processed.
In some examples, the event category identification may be performed on one video frame to obtain the event category information of the event video to be processed, or the event category identification may be performed on a plurality of video frames to obtain the event category information of the event video to be processed.
When event category identification is performed on a plurality of video frames and the identified category differs between frames, the category shared by the largest number of video frames can be taken as the category information of the event video to be processed.
For example, if the event category information of video frames 1, 3 and 5 of the event video to be processed is recognized as category 1, while that of video frames 2 and 4 is category 2, then category 1 can be taken as the category information of the event video to be processed.
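The majority vote just described can be written in a few lines; this is a sketch, and since the patent does not specify tie-breaking, first-seen order is assumed here.

```python
from collections import Counter

def majority_category(per_frame_categories):
    # Counter.most_common breaks ties by first-seen order (Python 3.7+),
    # an assumption the patent leaves open.
    return Counter(per_frame_categories).most_common(1)[0][0]

# Frames 1..5 with categories as in the example above: 1, 2, 1, 2, 1
print(majority_category(["cat1", "cat2", "cat1", "cat2", "cat1"]))  # cat1
```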
After the event category information of the event video to be processed is obtained, the content identification model corresponding to the event category information of the event video to be processed is selected from a plurality of preset content identification models, so that whether each video frame is a key content video frame or not can be identified.
In some examples, each of the plurality of content recognition models corresponds to one event category information. That is, different content recognition models correspond to different event category information.
In still other examples, the content recognition model may be a recognition model in which a plurality of content templates are preset, or may be a neural network model trained in advance.
In some examples, each video frame may be input to the content recognition model corresponding to the event category information separately or may be input to the content recognition model corresponding to the event category information simultaneously.
In the embodiment of the invention, a corresponding content identification model is set for each category of event video, so each content identification model targets a single category of event video to be processed. This makes each model more specific, and the identification of key content video frames therefore more accurate.
When none of the video frames is a key content video frame, other video frames can be selected from the event video to be processed again.
When at least one video frame in the video frames is a key content video frame, the at least one video frame can be used as the insertion position of the information to be promoted.
The electronic equipment can then play the event video to be processed; during playback, if the played video frame is a key content video frame, the electronic equipment can select one piece of promotion information to output.
In some examples, a corresponding category may be set in advance for each piece of candidate promotion information. The electronic device can acquire the category set for each candidate and match it against the category of the event video to be processed, so as to select from the candidates the target promotion information whose category matches. Finally, the electronic device outputs the target promotion information when playing the key content video frame.
Therefore, the output target promotion information does not appear abrupt against the key content video frame.
In still other examples, instead of setting a corresponding category for each piece of candidate promotion information in advance, the electronic device may use a promotion-information category identification model to identify the category of each candidate, obtain the target promotion information based on those identified categories, and output it when playing the key content video frame.
In some examples, the targeted promotional information may include at least advertising information.
In still other examples, the advertisement information may be output before the key content video frame, after it, or simultaneously with it in the form of an overlay advertisement.
In the prior art, the position for inserting promotion information is selected manually, which wastes labor cost, cannot obtain the optimal insertion position in time, and easily misses it altogether. By contrast, when the information processing method is applied to a live event, the embodiment of the invention can determine the insertion position of promotion information in time, so the promotion information can be output in real time during the live broadcast, while labor cost and time are saved.
According to the information processing method provided by the embodiment of the invention, the event category information of the event video to be processed can first be determined by performing event category identification on each video frame, and then the content identification model corresponding to that event category information can be adopted to perform content identification on each video frame.
When each video frame is identified to be a key content video frame, the target promotion information matched with the event category information of the event video where the key content video frame is located can be obtained from the candidate promotion information based on the category of the candidate promotion information when the key content video frame is played, and the target promotion information is output when the key content video frame is played. Therefore, the position of inserting the information to be promoted does not need to be manually selected, so that the efficiency of selecting the inserting position of the promoting information can be improved, and the labor cost and the time overhead are reduced.
On the basis of the information processing method shown in fig. 1, a possible implementation manner is also provided in the embodiment of the present invention, which is shown in fig. 2. In fig. 2, in step S120 shown in fig. 1, the performing event type identification on each video frame to obtain the event type information of the event video to be processed may include:
and S1201, acquiring a preset event category template.
And S1202, selecting an area with the same size as the event type template in each video frame, and calculating the similarity between the content in each area and the event type template.
And S1203, when an area with the similarity to the event type template larger than a first preset similarity threshold exists, taking the event type of the event type template as the event type of the video frame corresponding to the area.
In some examples, when determining the event category of the event video to be processed, the determination may be performed in a variety of ways, and in an embodiment of the present invention, a plurality of event category templates may be preset, each event category template corresponding to an event category, then one event category template may be selected from the plurality of event category templates, and the selected event category template may be matched with each video frame to determine the event category of the event video to be processed.
In some examples, for each video frame, an area the same size as the event category template may be selected from the frame and the similarity between the content of that area and the template calculated. When the similarity is greater than the first preset similarity threshold, the event category of the template may be taken as the event category of the video frame corresponding to that area.
When the similarity is smaller than the first preset similarity threshold, the selected area is not similar to the event category template. The window can therefore be moved by at least one pixel within the video frame, and a new area of the same size as the template reselected and compared with it, until every position in the video frame has been compared with the event category template.
By setting up different event category templates, each video frame can be compared directly with them. Because the comparison is computationally cheap, the event category of each video frame can be determined quickly, which reduces the time overhead of selecting the insertion position of promotion information and improves selection efficiency.
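A naive sketch of the single-pixel sliding comparison in steps S1201 to S1203. The patent does not name the similarity measure, so zero-mean cosine similarity is assumed here, with frames and templates represented as grayscale NumPy arrays.

```python
import numpy as np

def frame_matches_template(frame, template, threshold):
    """S1202-S1203: slide a template-sized window one pixel at a time over
    the frame; return True if any window's similarity exceeds the threshold.

    Zero-mean cosine similarity is an assumption; the patent only says
    "similarity". O(frame_area * template_area) - illustrative, not fast.
    """
    th, tw = template.shape
    fh, fw = frame.shape
    t = (template - template.mean()).ravel()
    t_norm = np.linalg.norm(t)
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            w = frame[y:y + th, x:x + tw]
            wc = (w - w.mean()).ravel()
            denom = np.linalg.norm(wc) * t_norm
            if denom and float(wc @ t) / denom > threshold:
                return True
    return False
```

A production system would more likely use `cv2.matchTemplate` with a normalized correlation method, which computes the same kind of score over all window positions at once.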
In some examples, presetting event category templates is laborious for the staff. To reduce that workload, on the basis of the information processing method shown in fig. 1, the embodiment of the present invention further provides a possible implementation manner, see fig. 3. In fig. 3, step S120 shown in fig. 1, performing event category identification on each video frame to obtain the event category information of the event video to be processed, may include:
s1204, performing character recognition on each video frame by using a first character recognition model to obtain characters in each video frame, wherein the first character recognition model is a character recognition model obtained by training samples marked with names of match events, and the characters comprise: a single word or word string;
and S1205, obtaining the event category information of the event video to be processed based on the characters in each video frame.
In some examples, the game screen usually carries a description corresponding to the event category, for example "2017 Australian Open" or "Table Tennis World Championships". Based on this, the embodiment of the present invention may preset a character recognition model, i.e., the first character recognition model. Each video frame is then input into the first character recognition model; when at least one video frame contains characters, the model outputs the characters in that frame, and the event category of the event video to be processed can then be determined based on the characters output by the model.
In still other examples, each event category may have at least one keyword or key phrase. After obtaining the characters output by the first character recognition model, the electronic device may check whether the keywords or key phrases of each event category include those characters; if so, that event category may be determined to be the category of the event video to be processed.
In some examples, the first character recognition model may be a character recognition model trained using a sample labeled with a name of a match event in advance, for example, the sample may be a sample labeled with a name of "table tennis", "basketball", "badminton", and the like. The first character recognition model may be a model trained on a neural network model.
In some examples, after the first character recognition model recognizes the characters in each video frame, each character may be output one by one, or a plurality of recognized characters may be output as one character string.
In some examples, each video frame may or may not include text, and in order to avoid wasting time by using the first text recognition model to perform text recognition on a video frame that does not include text, the embodiment of the present invention may further input each video frame into a preset text detection model first to detect whether text exists in each video frame. When the text exists in at least one video frame, the text recognition can be performed on the at least one video frame by using the first text recognition model to obtain the text in the at least one video frame.
If there is a video frame containing no text, the event type of the video frame containing no text may be identified by using steps S1201 to S1203 in the second embodiment.
In the embodiment of the invention, the character recognition model is conveniently obtained by training on samples labeled with event names. The embodiment therefore determines the event category information of the event video to be processed using this model, and the staff no longer need to set templates for different event categories, reducing their workload.
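The detect-then-recognize flow above might be sketched as follows. Here `has_text`, `ocr`, and the keyword table are stand-ins for the text detection model, the first character recognition model, and the per-category keywords; all three are assumptions for illustration.

```python
def category_from_text(frames, has_text, ocr, category_keywords):
    """Detect text cheaply first, run OCR only on frames that contain text,
    then map the recognized characters to an event category by keyword."""
    for frame in frames:
        if not has_text(frame):        # skip frames without text (saves OCR time)
            continue
        text = ocr(frame)
        for category, keywords in category_keywords.items():
            if any(kw in text for kw in keywords):
                return category
    return None  # fall back to template matching (S1201-S1203)
```

Frames that yield no match could then be handled by the template-matching path of the second implementation, as the description notes.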
In some examples, the video frames to be identified may include a highlight moment, for example a goal or a celebration. For this, on the basis of the information processing method shown in fig. 1, a possible implementation manner is further provided in the embodiments of the present invention, see fig. 4. In fig. 4, step S130 shown in fig. 1, which uses the content identification model corresponding to the event category information to identify whether each video frame is a key content video frame, may include:
s1301, acquiring a wonderful event template in a content identification model corresponding to the event category information, and calculating the similarity between the wonderful event template and each video frame, wherein the content identification model comprises a plurality of different wonderful event templates;
and S1302, determining the video frames with the similarity with the wonderful event template larger than a second preset similarity threshold value as key content video frames.
In some examples, the content recognition model corresponding to the event category information of the event video to be processed may be a model including a plurality of different highlight event templates, and the content recognition model may be preset with the plurality of different highlight event templates.
When identifying whether each video frame is a key content video frame, one highlight event template can be selected from the set for each frame and the similarity between the frame and that template calculated. When the similarity is greater than the second preset similarity threshold, the frame can be determined to be a key content video frame; otherwise another highlight event template is selected and the similarity recalculated, until all of the highlight event templates have been tried.
In still other examples, the similarity between the video frame and each of the highlight templates may be respectively calculated, and then it is determined whether a similarity greater than a preset similarity threshold exists among the calculated similarities, and if so, the video frame may be determined as a key content video frame. Otherwise, it can be stated that the video frame is not a key content video frame.
In still other examples, the highlight event template includes at least one of: a red card template, a yellow card template, a technical-statistics template, a goal template, and a penalty template.
In the embodiment of the invention, different highlight event templates are set so that each video frame can be compared with them directly. Because the computation required for the comparison is small, whether each video frame is a key content video frame can be determined quickly, which reduces the time overhead of selecting the insertion position of the promotion information and improves the efficiency of that selection.
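As a minimal sketch (not the patent's implementation), the per-frame template comparison of S1301 to S1302 can be illustrated with a cosine-similarity measure; both the use of cosine similarity and the threshold value of 0.9 are assumptions for illustration:

```python
import numpy as np

def frame_similarity(frame: np.ndarray, template: np.ndarray) -> float:
    """Cosine similarity between a video frame and a highlight event template,
    both given as equally sized grayscale pixel arrays."""
    a = frame.astype(np.float64).ravel()
    b = template.astype(np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def is_key_content_frame(frame, templates, threshold=0.9):
    """S1301-S1302 sketch: a frame is a key content video frame if its
    similarity to ANY highlight event template exceeds the second preset
    similarity threshold."""
    return any(frame_similarity(frame, t) > threshold for t in templates)
```

In practice a localized template-matching routine (e.g., normalized cross-correlation over a sliding window) would be used instead of whole-frame cosine similarity, but the any-template-above-threshold decision structure stays the same.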
On the basis of the information processing method shown in fig. 1, an embodiment of the present invention further provides a possible implementation manner, see fig. 5. In fig. 5, the step S130 shown in fig. 1 of identifying whether each video frame is a key content video frame by using a content identification model corresponding to the event category information may include:
S1303, performing text recognition on each video frame by adopting a second text recognition model corresponding to the event category information to obtain the text in each video frame, wherein the second text recognition model is obtained by training on samples labeled with highlight event content;
S1304, determining, based on the text in each video frame, whether each video frame is a key content video frame.
In some examples, when highlight content appears in a game, after the front-end shooting device captures it, a text description may be added to the captured video frames, for example, a description of technical statistics or of the starting lineup. The frames are then transmitted to the electronic device applying the information processing method of the embodiment of the present invention. After receiving the event video to be processed, the electronic device may use the text recognition model corresponding to the event category information of that video to recognize whether each video frame contains text; that is, the second text recognition model is used to perform text recognition on each video frame.
The second text recognition model is obtained by training on samples labeled with highlight event content.
after the text in each video frame is identified, the electronic device may determine whether each video frame is a key content video frame based on the text in each video frame. If so, step S140 may be performed.
In the embodiment of the invention, the text recognition model is conveniently obtained by training on samples labeled with highlight event content. The embodiment therefore uses the text recognition model to recognize the text in each video frame and determines from the recognized text whether each video frame is a key content video frame. Staff do not need to set up templates for different event types, which further reduces their workload.
In still other examples, when a video frame contains no text, steps S1301 to S1302 of the fourth implementation manner may be adopted to identify whether that video frame is a key content video frame.
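The text-based decision of S1304 can be sketched as follows, assuming the text has already been extracted by the second text recognition model in S1303; the keyword list is an illustrative assumption, not taken from the patent:

```python
# Illustrative highlight keywords; a real deployment would derive these from
# the samples labeled with highlight event content.
HIGHLIGHT_KEYWORDS = {"goal", "red card", "yellow card", "penalty",
                      "technical statistics"}

def is_key_content_frame_by_text(frame_text: str,
                                 keywords=HIGHLIGHT_KEYWORDS) -> bool:
    """S1304 sketch: decide from the recognized on-screen text whether the
    frame is a key content video frame."""
    text = frame_text.lower()
    return any(keyword in text for keyword in keywords)
```

A trained classifier over the recognized text would generalize better than literal substring matching; the sketch only shows the shape of the decision.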
On the basis of the information processing method shown in fig. 1, an embodiment of the present invention further provides a possible implementation manner, see fig. 6. In fig. 6, the step S130 shown in fig. 1, which uses the content identification model corresponding to the event category information to identify whether each video frame is a key content video frame, may include:
and S1305, identifying whether each video frame is a highlight event video frame containing a key action by adopting an image recognition model corresponding to the event category information, wherein the image recognition model is obtained by training on image samples labeled with key actions.
In fig. 6, in step S140 shown in fig. 1, in the process of playing the to-be-processed event video, if the played video frame is a key content video frame, based on the category of the candidate promotion information, in the candidate promotion information, obtaining target promotion information matched with the event category information of the event video where the key content video frame is located, and when playing the key content video frame, outputting the target promotion information may include:
and S141, in the process of playing the event video to be processed, if the played video frame is a highlight event video frame, acquiring, from the candidate promotion information and based on the category of the candidate promotion information, target promotion information matched with the event category information of the event video where the key content video frame is located, and outputting the target promotion information when the key content video frame is played.
In some examples, event highlight content may also be an action typically taken by an athlete during the game, such as a frame of the athlete after scoring a goal, a frame of a celebratory roar, or a frame of athletes encouraging each other.
In this regard, an image recognition model corresponding to the event category information of the event video to be processed may be preset to recognize whether each video frame contains a key action. In some examples, the image recognition model may be trained on image samples labeled with the relevant key actions; the key actions may be: a roaring celebration, a handshake of encouragement, athletes cheering themselves on, and the like.
When a video frame contains a key action, the image recognition model can recognize the action, and the video frame can then be used as a highlight event video frame.
When a video frame does not contain a key action, the image recognition model may output, for that frame, a prompt message stating that the video frame contains no key action.
By pre-training the image recognition model, whether a video frame to be recognized contains a key action of an athlete can be accurately identified; if so, the video frame can be used as a highlight event video frame. Thus, staff do not need to set up templates, which reduces their workload.
In still other examples, a single video frame may not fully capture a highlight of the game, while a plurality of video frames may do so more completely. For example, a fast break in basketball or a shot on goal in football usually spans a plurality of video frames, so a segment of highlight video may also be used as an insertion position for the promotion information.
In this regard, the electronic device may use the image recognition model to perform continuous key-action recognition on a video sequence comprising a plurality of video frames, so as to recognize whether the sequence contains consecutive key actions. When the image recognition model recognizes that the sequence contains consecutive key actions, all the video frames in the sequence may be used as key content video frames.
In still other examples, continuous key-action recognition may be performed on the video sequence comprising a plurality of video frames using an I3D (Two-Stream Inflated 3D ConvNet) model.
With the embodiment of the invention, the target promotion information can be output while the video sequence is played, which lengthens the time the target promotion information stays on screen and thus its exposure time.
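The grouping of consecutive key-action frames into one highlight sequence can be sketched as follows; the per-frame key-action flags would come from the image recognition model (e.g., I3D), and the minimum run length is an assumed parameter:

```python
def highlight_sequences(key_action_flags, min_len=3):
    """Group consecutive frames flagged as containing a key action into
    (start, end) index ranges; runs shorter than min_len are dropped, since a
    single frame rarely captures a fast break or a shot on goal."""
    runs, start = [], None
    for i, flagged in enumerate(key_action_flags):
        if flagged and start is None:
            start = i                      # a run of key-action frames begins
        elif not flagged and start is not None:
            if i - start >= min_len:       # the run just ended; keep it if long enough
                runs.append((start, i - 1))
            start = None
    if start is not None and len(key_action_flags) - start >= min_len:
        runs.append((start, len(key_action_flags) - 1))  # run reaches the last frame
    return runs
```

Every frame inside a returned range is then treated as a key content video frame, so the target promotion information stays on screen for the whole highlight sequence.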
In view of the above, on the basis of the information processing method shown in fig. 1, the embodiment of the present invention further provides a possible implementation manner, see fig. 7. In fig. 7, the step S130 shown in fig. 1, which uses the content identification model corresponding to the event category information to identify whether each video frame is a key content video frame, may include:
and S1306, identifying whether each video frame is a highlight event video frame containing a target person by adopting a person recognition model corresponding to the event category information, wherein the person recognition model is obtained by training on image samples labeled in advance with the target person.
In fig. 7, in step S140 shown in fig. 1, in the process of playing the event video to be processed, if the played video frame is a key content video frame, based on the category of the candidate promotion information, in the candidate promotion information, obtaining target promotion information matched with the event category information of the event video where the key content video frame is located, and when playing the key content video frame, outputting the target promotion information, which may include:
and S141, in the process of playing the event video to be processed, if the played video frame is a highlight event video frame, acquiring, from the candidate promotion information and based on the category of the candidate promotion information, target promotion information matched with the event category information of the event video where the key content video frame is located, and outputting the target promotion information when the key content video frame is played.
In yet other possible implementations, there are usually well-known players, coaches, or the like on the playing field, and the video frames containing these persons can also be used as highlight event video frames. Therefore, in the embodiment of the present invention, a person recognition model corresponding to the event category information of the event video may be used to recognize the target person in each video frame. When the person recognition model recognizes that a video frame contains the target person, that video frame can be used as a highlight event video frame.
The person recognition model is obtained by training on image samples labeled in advance with the target person. The target person may be a relatively well-known athlete, coach, or the like.
With the embodiment of the invention, the range of frames that can serve as highlight event video frames is enlarged, so that more target promotion information can be output, increasing the amount of target promotion information that is output.
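One way the person-based check of S1306 might gate frames is by comparing embeddings, assuming an upstream detector produces a fixed-length embedding per detected person; both the embedding source and the distance threshold are assumptions for illustration:

```python
import numpy as np

def contains_target_person(frame_embeddings, target_embeddings, max_dist=0.6):
    """A frame counts as a highlight event video frame if any person detected
    in it is close (in embedding space) to a known target person, e.g. a
    well-known athlete or coach."""
    for person in frame_embeddings:
        for target in target_embeddings:
            if np.linalg.norm(np.asarray(person) - np.asarray(target)) < max_dist:
                return True
    return False
```

The patent only requires that the trained person recognition model flag frames containing the target person; nearest-embedding matching is one common way such a model is realized.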
On the basis of the information processing method shown in fig. 1, a possible implementation manner is also provided in the embodiment of the present invention, which is shown in fig. 8. In fig. 8, the step S130 shown in fig. 1 of identifying whether each video frame is a key content video frame by using a content identification model corresponding to the event category information may include:
S1307, acquiring an event key score template in a score recognition model corresponding to the event category information, and calculating the similarity between the event key score template and each video frame, wherein the score recognition model comprises a plurality of event key score templates corresponding to different key scores;
and S1308, determining the video frames whose similarity to the event key score template is greater than a third preset similarity threshold as event key score video frames.
In fig. 8, in step S140 shown in fig. 1, in the process of playing the event video to be processed, if the played video frame is a key content video frame, based on the category of the candidate promotion information, in the candidate promotion information, obtaining target promotion information matched with the event category information of the event video where the key content video frame is located, and when playing the key content video frame, outputting the target promotion information, which may include:
and S142, in the process of playing the to-be-processed event video, if the played video frame is the event key score video frame, acquiring target promotion information matched with the event category information of the event video where the key content video frame is located in the candidate promotion information based on the category of the candidate promotion information, and outputting the target promotion information when the key content video frame is played.
For example, in a table tennis game, a score change after the score reaches 0:10 can decide the result of the game, so 0:10 is a key score in table tennis. Therefore, outputting the target promotion information while the video frames containing the key score are played does not harm the viewing experience of the audience; on the contrary, outputting appropriate target promotion information during such important video frames can even improve it.
In still other examples, the scoring rules tend to differ between event categories, and so do the key scores; for example, a game is played to 11 points in table tennis but to 21 points in badminton. Thus, different score recognition models may be employed to identify the video frames containing key scores for different events.
In the embodiment of the present invention, a score recognition model corresponding to the event category information of the event video to be processed may be adopted to perform score recognition on each video frame to determine the video frame including the key score.
When there is a video frame containing a key score, the video frame containing the key score may be used as the event key score video frame.
In some examples, the score recognition model may include a plurality of event key score templates, and each of the video frames may be compared with any one of the plurality of event key score templates to determine whether each of the video frames includes a key score.
In some examples, for each video frame, the similarity between the video frame and each event key score template may be calculated; when the similarity between the video frame and an event key score template is greater than the third preset similarity threshold, the score of that template may be used as the score of the video frame, and the video frame may be determined to be an event key score video frame.
In some examples, the video frame to be identified may or may not include a score, and in order to avoid performing score identification on a video frame that does not include a score, the electronic device may first identify whether each video frame includes a score.
If there is a video frame that does not contain a score, the electronic device may adopt steps S1301 to S1302 of the fourth implementation manner of the embodiment of the present invention, or steps S1303 to S1304 of the fifth implementation manner, to identify whether that video frame is a key content video frame.
When there is a video frame containing a score, the electronic apparatus described above may perform step S1307.
By setting up different score templates, each video frame can be compared directly with the event key score templates. Because the computation required for the comparison is small, whether each video frame is an event key score video frame can be determined quickly, which reduces the time overhead of selecting the insertion position of the promotion information.
On the basis of the information processing method shown in fig. 1, the embodiment of the present invention further provides a possible implementation manner, see fig. 9. In fig. 9, the step S130 shown in fig. 1, which uses the content identification model corresponding to the event category information to identify whether each video frame is a key content video frame, may include:
and S1309, identifying whether each video frame is an event key score video frame by adopting a third text recognition model corresponding to the event category information, wherein the third text recognition model is a text recognition model obtained by training on image samples labeled with key scores.
In fig. 9, in step S140 shown in fig. 1, in the process of playing the to-be-processed event video, if the played video frame is a key content video frame, based on the category of the candidate promotion information, in the candidate promotion information, obtaining target promotion information matched with the event category information of the event video where the key content video frame is located, and when playing the key content video frame, outputting the target promotion information may include:
and S142, in the process of playing the to-be-processed event video, if the played video frame is the event key score video frame, acquiring target promotion information matched with the event category information of the event video where the key content video frame is located in the candidate promotion information based on the category of the candidate promotion information, and outputting the target promotion information when the key content video frame is played.
In still other examples, scores usually appear as text in the pictures captured by the front-end shooting device, so the electronic device may perform key score recognition on each video frame using the third text recognition model to determine whether each video frame is an event key score video frame.
The third text recognition model is obtained by training on image samples labeled with key scores.
After the third text recognition model recognizes the key scores in the video frames, the video frames containing key scores can be determined to be event key score video frames. In still other examples, the third text recognition model may also output the key scores found in the video frames.
By setting up the third text recognition model, staff only need to prepare the training samples and train the model in advance. The trained model can then be used to recognize whether each video frame contains a key score, and hence whether each video frame is an event key score video frame, without setting up templates, thereby reducing the workload of the staff.
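A sketch of an OCR-driven key-score check in the spirit of S1309, assuming the third text recognition model has already returned the on-screen text; the score pattern and the per-category rules (games to 11 points in table tennis, 21 in badminton, per the examples above) are illustrative assumptions:

```python
import re

# Illustrative key-score rules keyed by event category; e.g. in table tennis a
# game goes to 11 points, so a score such as 0:10 can decide the game.
KEY_SCORE_RULES = {
    "table_tennis": lambda a, b: max(a, b) >= 10,
    "badminton": lambda a, b: max(a, b) >= 20,
}

def is_key_score_frame(recognized_text: str, category: str) -> bool:
    """Parse an 'A:B' (or 'A-B') score from the recognized text and apply the
    key-score rule of the given event category."""
    match = re.search(r"(\d+)\s*[:-]\s*(\d+)", recognized_text)
    if not match:
        return False  # no score visible in this frame
    rule = KEY_SCORE_RULES.get(category)
    return bool(rule and rule(int(match.group(1)), int(match.group(2))))
```

In the patent's design the model is trained end to end on samples labeled with key scores; the explicit rule table here merely makes the per-category difference in key scores concrete.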
Corresponding to the above-mentioned information processing method embodiment, an embodiment of the present invention further provides an information processing apparatus, as shown in fig. 10, which is a schematic structural diagram of a first implementation manner of the information processing apparatus according to the embodiment of the present invention, and the apparatus may include:
a video frame acquiring module 1010, configured to acquire video frames in an event video to be processed;
the event type identification module 1020 is configured to perform event type identification on each video frame to obtain event type information of an event video to be processed;
a content identification module 1030, configured to identify whether each video frame is a key content video frame by using a content identification model corresponding to the event category information, where the key content video frame includes: a highlight event video frame or an event key score video frame;
the output module 1040 is configured to, in the process of playing the event video to be processed, if the played video frame is a key content video frame, obtain, in the candidate promotion information, target promotion information that matches the event category information of the event video where the key content video frame is located based on the category of the candidate promotion information, and output the target promotion information when the key content video frame is played.
According to the information processing apparatus provided by the embodiment of the invention, the event category information of the event video to be processed can first be determined by performing event category recognition on each video frame, and then the content recognition model corresponding to that event category information can be adopted to perform content recognition on each video frame.
When a video frame is identified as a key content video frame, the target promotion information matched with the event category information of the event video where that frame is located can be obtained from the candidate promotion information, based on the category of the candidate promotion information, and output when the key content video frame is played. Therefore, the insertion position of the promotion information does not need to be selected manually, which improves the efficiency of selecting the insertion position and reduces labor cost and time overhead.
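The apparatus of fig. 10 can be sketched as a small pipeline in which the modules are plain callables, so any of the recognition strategies from the method embodiments can be plugged in; all names here are illustrative, not the patent's API:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class InformationProcessor:
    """Illustrative wiring of modules 1020-1040: classify the event, flag key
    content video frames, and match target promotion information by category."""
    classify_event: Callable[[List[object]], str]     # event category module (1020)
    is_key_frame: Callable[[object, str], bool]       # content recognition module (1030)
    match_promotion: Callable[[str], Optional[str]]   # promo matching in output module (1040)

    def promotion_schedule(self, frames: List[object]) -> Dict[int, Optional[str]]:
        """Return {frame index: promotion} for the frames at which the target
        promotion information should be output during playback."""
        category = self.classify_event(frames)
        return {i: self.match_promotion(category)
                for i, frame in enumerate(frames)
                if self.is_key_frame(frame, category)}
```

Any of the template-, text-, action-, person-, or score-based checks described above can serve as `is_key_frame`, which mirrors how the content recognition module is specialized per implementation manner.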
In some examples, the event category identification module 1020 is specifically configured to:
acquiring a preset event category template;
selecting areas with the same size as the event category template from each video frame, and calculating the similarity between the content in each area and the event category template;
and when an area with the similarity larger than a first preset similarity threshold exists, taking the event category of the event category template as the event category of the video frame corresponding to the area.
In some examples, the event category identification module 1020 is specifically configured to:
carrying out text recognition on each video frame by adopting a first text recognition model to obtain the text in each video frame, wherein the first text recognition model is a text recognition model obtained by training on samples labeled with match names, and the text comprises: a single word or a word string;
and obtaining the event category information of the event video to be processed based on the characters in each video frame.
In some examples, content identification module 1030 is specifically configured to:
acquiring a highlight event template in the content recognition model corresponding to the event category information, and calculating the similarity between the highlight event template and each video frame, wherein the content recognition model comprises a plurality of different highlight event templates;
and determining, among all the video frames, the video frames whose similarity to the highlight event template is greater than a second preset similarity threshold as key content video frames.
In some examples, content identification module 1030 is specifically configured to:
performing text recognition on each video frame by adopting the second text recognition model corresponding to the event category information to obtain the text in each video frame, wherein the second text recognition model is obtained by training on samples labeled with highlight event content;
and determining, based on the text in each video frame, whether each video frame is a key content video frame.
In some examples, content identification module 1030 is specifically configured to:
and identifying whether each video frame is a highlight event video frame containing a key action by adopting the image recognition model corresponding to the event category information, wherein the image recognition model is obtained by training on image samples labeled with key actions.
In some examples, content identification module 1030 is specifically configured to:
identifying whether a video sequence containing a plurality of video frames is a video sequence containing continuous key actions or not by adopting an image identification model corresponding to the event category information;
if the video sequence contains a video sequence of consecutive key actions, all the video frames contained in the video sequence are used as key content video frames.
In some examples, content identification module 1030 is specifically configured to:
and identifying whether each video frame is a highlight event video frame containing a target person by adopting the person recognition model corresponding to the event category information, wherein the person recognition model is obtained by training on image samples labeled in advance with the target person.
In some examples, content identification module 1030 is specifically configured to:
acquiring an event key score template in the score recognition model corresponding to the event category information, and calculating the similarity between the event key score template and each video frame, wherein the score recognition model comprises a plurality of event key score templates corresponding to different key scores;
and determining, among all the video frames, the video frames whose similarity to the event key score template is greater than a third preset similarity threshold as event key score video frames.
In some examples, content identification module 1030 is specifically configured to:
and identifying whether each video frame is an event key score video frame by adopting the third text recognition model corresponding to the event category information, wherein the third text recognition model is a text recognition model obtained by training on image samples labeled with key scores.
On the basis of the information processing apparatus shown in fig. 10, an embodiment of the present invention further provides a possible implementation manner, as shown in fig. 11, which is a schematic structural diagram of a second implementation manner of the information processing apparatus according to the embodiment of the present invention, and the apparatus may further include:
the promotion information category identification module 1050 is used for acquiring candidate promotion information and identifying the category of the candidate promotion information by adopting a promotion information category identification model;
the output module 1040 is specifically configured to:
and in the candidate promotion information, matching the category of the candidate promotion information with the event category information of the event video where the key content video frame is located to obtain the target promotion information.
An embodiment of the present invention further provides an electronic device, as shown in fig. 12, including a processor 121, a communication interface 122, a memory 123 and a communication bus 124, where the processor 121, the communication interface 122, and the memory 123 complete mutual communication through the communication bus 124,
a memory 123 for storing a computer program;
the processor 121 is configured to implement the information processing method according to any of the above embodiments when executing the program stored in the memory 123, for example, implement the following steps:
acquiring each video frame in the event video to be processed;
performing event type identification on each video frame to obtain event type information of an event video to be processed;
identifying whether each video frame is a key content video frame by adopting a content identification model corresponding to the event category information, wherein the key content video frame comprises: a highlight event video frame or an event key score video frame;
in the process of playing the event video to be processed, if the played video frame is the key content video frame, based on the category of the candidate promotion information, the target promotion information matched with the event category information of the event video where the key content video frame is located is obtained from the candidate promotion information, and when the key content video frame is played, the target promotion information is output.
According to the electronic device provided by the embodiment of the invention, the event category information of the event video to be processed can first be determined by performing event category recognition on each video frame, and then the content recognition model corresponding to that event category information can be adopted to perform content recognition on each video frame.
When a video frame is identified as a key content video frame, the target promotion information matched with the event category information of the event video where that frame is located can be obtained from the candidate promotion information, based on the category of the candidate promotion information, and output when the key content video frame is played. Therefore, the insertion position of the promotion information does not need to be selected manually, which improves the efficiency of selecting the insertion position and reduces labor cost and time overhead.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM), and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment of the present invention, a computer-readable storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to perform the information processing method described in any one of the above embodiments.
In another embodiment of the present invention, a computer program product containing instructions is further provided, which when run on a computer causes the computer to execute the information processing method described in any of the above embodiments.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to be performed in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It should be noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the corresponding description of the method embodiment.
The above description covers only preferred embodiments of the present invention and is not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (24)

1. An information processing method, characterized in that the method comprises:
acquiring each video frame in an event video to be processed;
performing event type identification on each video frame to obtain event type information of the event video to be processed;
identifying whether each video frame is a key content video frame by adopting a content identification model corresponding to the event category information, wherein the key content video frame comprises: a highlight event video frame or an event key score video frame;
in the process of playing the event video to be processed, if the played video frame is a key content video frame, based on the category of the candidate promotion information, obtaining target promotion information matched with the event category information of the event video where the key content video frame is located from the candidate promotion information, and outputting the target promotion information when the key content video frame is played.
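As an illustrative, non-claim sketch, the end-to-end flow of claim 1 (classify the event video, flag key content frames, then output category-matched promotion information when a key frame is played) could be orchestrated as below. All callables and data shapes here are hypothetical, not part of the claimed invention:

```python
def process_and_play(frames, classify_event, is_key_frame, candidates, output):
    """Hypothetical sketch of the claimed method.

    classify_event: maps the list of frames to an event category string.
    is_key_frame:   flags highlight / key-score frames for that category.
    candidates:     promotion items, each pre-labelled with a "category".
    output:         called with each matching promotion item at play time.
    """
    # Step 1-2: event category identification over the whole video
    event_category = classify_event(frames)
    # Step 3: per-frame key-content identification
    key_flags = [is_key_frame(f, event_category) for f in frames]
    # Step 4: during playback, output matching promotion info on key frames
    for frame, is_key in zip(frames, key_flags):
        if is_key:
            for c in candidates:
                if c["category"] == event_category:
                    output(c)
```

The sketch deliberately keeps the recognition models behind plain callables, since the claims allow several alternative models (template matching, character recognition, image recognition).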
2. The method according to claim 1, wherein performing event category identification on each of the video frames to obtain event category information of the to-be-processed event video comprises:
acquiring a preset event category template;
selecting areas with the same size as the event category template from each video frame, and calculating the similarity between the content in each area and the event category template;
and when the similarity between the event type template and the area is larger than a first preset similarity threshold value, taking the event type of the event type template as the event type of the video frame corresponding to the area.
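The template comparison in claim 2 (slide template-sized regions over the frame, score each region against the template, accept when the score exceeds the first preset threshold) can be sketched with normalized cross-correlation. This is a minimal assumed implementation, not the patent's actual one; the threshold value and grayscale-array shapes are illustrative:

```python
import numpy as np

def region_similarity(frame, template):
    """Return the best normalized cross-correlation between the event
    category template and every same-size region of the frame.
    Both arguments are 2-D grayscale arrays (hypothetical data shape)."""
    fh, fw = frame.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    best = -1.0
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            region = frame[y:y + th, x:x + tw]
            r = region - region.mean()
            denom = np.linalg.norm(r) * t_norm
            if denom == 0:
                continue  # flat region or flat template: no correlation
            best = max(best, float((r * t).sum() / denom))
    return best

def classify_frame(frame, templates, threshold=0.9):
    """Return the event category whose template best matches the frame,
    or None when no similarity exceeds the (assumed) preset threshold."""
    best_cat, best_score = None, threshold
    for category, tpl in templates.items():
        s = region_similarity(frame, tpl)
        if s > best_score:
            best_cat, best_score = category, s
    return best_cat
```

In practice a library routine such as OpenCV's template matching would replace the explicit double loop; the loop is kept here only to mirror the claim's "select regions of the same size, then compare" wording.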
3. The method according to claim 1, wherein performing event category identification on each of the video frames to obtain event category information of the to-be-processed event video comprises:
performing character recognition on each video frame by using a first character recognition model to obtain characters in each video frame, wherein the first character recognition model is a character recognition model obtained by training samples marked with names of match events, and the characters comprise: a single word or word string;
and obtaining the event category information of the event video to be processed based on the characters in each video frame.
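Claim 3 derives the event category from characters recognized in the frames. One plausible way to aggregate per-frame recognition results into a single category is a majority vote over a name-to-category lookup; the vote and the lookup table are assumptions for illustration, not claim language:

```python
from collections import Counter

def event_category_from_text(frame_texts, name_to_category):
    """Map words recognized in each frame (e.g. by a character
    recognition model trained on match-event names) to an event
    category by majority vote.

    frame_texts:      list of word lists, one list per frame (assumed shape).
    name_to_category: hypothetical lookup from event name to category.
    Returns None when no known event name was recognized.
    """
    votes = Counter()
    for words in frame_texts:
        for w in words:
            if w in name_to_category:
                votes[name_to_category[w]] += 1
    return votes.most_common(1)[0][0] if votes else None
```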
4. The method of claim 1, wherein identifying whether each of the video frames is a key content video frame using a content identification model corresponding to the event category information comprises:
acquiring a highlight event template in a content identification model corresponding to the event category information, and calculating the similarity between the highlight event template and each video frame, wherein the content identification model comprises a plurality of different highlight event templates;
and determining, among the video frames, the video frames whose similarity with the highlight event template is greater than a second preset similarity threshold as the key content video frames.
5. The method of claim 1, wherein identifying whether each of the video frames is a key content video frame using a content identification model corresponding to the event category information comprises:
performing character recognition on each video frame by adopting a second character recognition model corresponding to the event category information to obtain characters in each video frame, wherein the second character recognition model is obtained by training with samples marked with event highlight content;
and judging whether each video frame is the key content video frame or not based on the characters in each video frame.
6. The method of claim 1, wherein identifying whether each of the video frames is a key content video frame using a content identification model corresponding to the event category information comprises:
and identifying whether each video frame is a highlight event video frame containing a key action by adopting an image identification model corresponding to the event category information, wherein the image identification model is obtained by training with image samples marked with key actions.
7. The method of claim 6, wherein identifying whether each of the video frames is a highlight event video frame containing a key action using an image recognition model corresponding to the event category information comprises:
identifying, by adopting an image identification model corresponding to the event category information, whether a video sequence containing a plurality of video frames is a video sequence containing continuous key actions;
and if the video sequence is a video sequence containing continuous key actions, taking all video frames contained in the video sequence as the key content video frames.
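The run-based rule of claim 7 (only runs of consecutive key-action frames count, and every frame in such a run becomes a key content frame) can be sketched as follows. The per-frame flags would come from the image recognition model; `min_run` is an assumed parameter, since the claim does not fix a minimum run length:

```python
def key_content_frames(per_frame_is_action, min_run=3):
    """Mark a frame as key content only when it sits inside a run of at
    least `min_run` consecutive frames flagged as containing a key action.

    per_frame_is_action: list of booleans from an image recognition
    model (hypothetical output shape); min_run is an assumption.
    """
    n = len(per_frame_is_action)
    key = [False] * n
    i = 0
    while i < n:
        if per_frame_is_action[i]:
            j = i
            while j < n and per_frame_is_action[j]:
                j += 1  # extend the run of action frames
            if j - i >= min_run:
                for k in range(i, j):
                    key[k] = True  # whole run becomes key content
            i = j
        else:
            i += 1
    return key
```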
8. The method of claim 1, wherein identifying whether each of the video frames is a key content video frame using a content identification model corresponding to the event category information comprises:
and identifying whether each video frame is a highlight event video frame containing a target person by adopting a person recognition model corresponding to the event category information, wherein the person recognition model is obtained by training with image samples in which the target person is marked in advance.
9. The method of claim 1, wherein identifying whether each of the video frames is a key content video frame using a content identification model corresponding to the event category information comprises:
acquiring an event key score template in a score identification model corresponding to the event category information, and calculating the similarity between the event key score template and each video frame, wherein the score identification model comprises a plurality of event key score templates corresponding to different key scores;
and determining the video frames with the similarity with the event key score template larger than a third preset similarity threshold value as the event key score video frames in each video frame.
10. The method of claim 1, wherein identifying whether each of the video frames is a key content video frame using a content identification model corresponding to the event category information comprises:
and identifying whether each video frame is a video frame of the event key score by adopting a third character identification model corresponding to the event category information, wherein the third character identification model is a character identification model obtained by training by adopting an image sample marked with the key score.
11. The method according to claim 1, wherein before acquiring, in the candidate promotion information based on the category of the candidate promotion information, target promotion information matching event category information of an event video where the key content video frame is located, the method further comprises: acquiring candidate popularization information, and identifying the category of the candidate popularization information by adopting a popularization information category identification model;
in the candidate promotion information, obtaining target promotion information matched with the event category information of the event video where the key content video frame is located includes:
and in the candidate promotion information, matching the category of the candidate promotion information with the event category information of the event video where the key content video frame is located to obtain the target promotion information.
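Claim 11's two steps (label each candidate promotion item with a category, then match those labels against the event category) could look like the following sketch. The classifier is passed in as a plain callable and the item fields are hypothetical; the claim's promotion information category identification model would sit behind it:

```python
def identify_and_match(candidates, classify_promotion, event_category):
    """Sketch of claim 11: label each candidate promotion item with a
    category via a caller-supplied (hypothetical) classifier, then keep
    the items whose category matches the event category information."""
    labelled = [dict(item, category=classify_promotion(item)) for item in candidates]
    return [item for item in labelled if item["category"] == event_category]
```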
12. An information processing apparatus, characterized in that the apparatus comprises:
the video frame acquisition module is used for acquiring each video frame in the event video to be processed;
the event type identification module is used for carrying out event type identification on each video frame to obtain event type information of the event video to be processed;
a content identification module, configured to identify whether each of the video frames is a key content video frame by using a content identification model corresponding to the event category information, where the key content video frame includes: a highlight event video frame or an event key score video frame;
and the output module is used for acquiring target promotion information matched with the event category information of the event video where the key content video frame is located in the candidate promotion information based on the category of the candidate promotion information if the played video frame is the key content video frame in the process of playing the to-be-processed event video, and outputting the target promotion information when the key content video frame is played.
13. The apparatus according to claim 12, wherein the event category identification module is specifically configured to:
acquiring a preset event category template;
selecting areas with the same size as the event category template from each video frame, and calculating the similarity between the content in each area and the event category template;
and when the similarity between the event type template and the area is larger than a first preset similarity threshold, taking the event type of the event type template as the event type of the video frame corresponding to the area.
14. The apparatus of claim 12, wherein the event category identification module is specifically configured to:
performing character recognition on each video frame by using a first character recognition model to obtain characters in each video frame, wherein the first character recognition model is a character recognition model obtained by training samples marked with names of match events, and the characters comprise: a single word or word string;
and obtaining the event category information of the event video to be processed based on the characters in each video frame.
15. The apparatus of claim 12, wherein the content identification module is specifically configured to:
acquiring a highlight event template in a content identification model corresponding to the event category information, and calculating the similarity between the highlight event template and each video frame, wherein the content identification model comprises a plurality of different highlight event templates;
and determining, among the video frames, the video frames whose similarity with the highlight event template is greater than a second preset similarity threshold as the key content video frames.
16. The apparatus of claim 12, wherein the content identification module is specifically configured to:
performing character recognition on each video frame by adopting a second character recognition model corresponding to the event category information to obtain characters in each video frame, wherein the second character recognition model is obtained by training with samples marked with highlight event content;
and judging whether each video frame is the key content video frame or not based on the characters in each video frame.
17. The apparatus of claim 12, wherein the content identification module is specifically configured to:
and identifying whether each video frame is a highlight event video frame containing a key action by adopting an image identification model corresponding to the event category information, wherein the image identification model is obtained by training with image samples marked with key actions.
18. The apparatus of claim 17, wherein the content identification module is further configured to:
identifying whether a video sequence containing a plurality of video frames is a video sequence containing continuous key actions or not by adopting an image identification model corresponding to the event category information;
and if the video sequence comprises a video sequence of continuous key actions, all video frames contained in the video sequence are used as the key content video frames.
19. The apparatus of claim 12, wherein the content identification module is specifically configured to:
and identifying whether each video frame is a highlight event video frame containing a target person by adopting a person recognition model corresponding to the event category information, wherein the person recognition model is obtained by training with image samples in which the target person is marked in advance.
20. The apparatus of claim 12, wherein the content identification module is specifically configured to:
acquiring an event key score template in a score identification model corresponding to the event category information, and calculating the similarity between the event key score template and each video frame, wherein the score identification model comprises a plurality of event key score templates corresponding to different key scores;
and determining the video frames with the similarity with the event key score template larger than a third preset similarity threshold value in each video frame as the event key score video frames.
21. The apparatus of claim 12, wherein the content identification module is specifically configured to:
and identifying whether each video frame is a video frame of the event key score by adopting a third character identification model corresponding to the event category information, wherein the third character identification model is a character identification model obtained by training by adopting an image sample marked with the key score.
22. The apparatus of claim 12, further comprising:
the promotion information category identification module is used for acquiring candidate promotion information and identifying the category of the candidate promotion information by adopting a promotion information category identification model;
the output module is specifically configured to:
and in the candidate promotion information, matching the category of the candidate promotion information with the event category information of the event video where the key content video frame is located to obtain the target promotion information.
23. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 11 when executing a program stored in the memory.
24. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method steps of any one of claims 1 to 11.
CN201910926211.0A 2019-09-27 2019-09-27 Information processing method and device, electronic equipment and storage medium Active CN110688498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910926211.0A CN110688498B (en) 2019-09-27 2019-09-27 Information processing method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110688498A CN110688498A (en) 2020-01-14
CN110688498B true CN110688498B (en) 2022-07-22

Family

ID=69110714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910926211.0A Active CN110688498B (en) 2019-09-27 2019-09-27 Information processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110688498B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107305557A (en) * 2016-04-20 2017-10-31 北京陌上花科技有限公司 Content recommendation method and device
CN108509931A (en) * 2018-04-11 2018-09-07 河南工学院 Football featured videos method for catching and system
CN109145784A (en) * 2018-08-03 2019-01-04 百度在线网络技术(北京)有限公司 Method and apparatus for handling video
CN109495780A (en) * 2018-10-16 2019-03-19 深圳壹账通智能科技有限公司 A kind of Products Show method, terminal device and computer readable storage medium
CN110198456A (en) * 2019-04-26 2019-09-03 腾讯科技(深圳)有限公司 Video pushing method, device and computer readable storage medium based on live streaming
CN110225389A (en) * 2019-06-20 2019-09-10 北京小度互娱科技有限公司 The method for being inserted into advertisement in video, device and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9510044B1 (en) * 2008-06-18 2016-11-29 Gracenote, Inc. TV content segmentation, categorization and identification and time-aligned applications
US9094734B2 (en) * 2012-12-14 2015-07-28 Verizon Patent And Licensing Inc. Advertisement monitor system



Similar Documents

Publication Publication Date Title
CN109657100B (en) Video collection generation method and device, electronic equipment and storage medium
US9600717B1 (en) Real-time single-view action recognition based on key pose analysis for sports videos
CN110213610B (en) Live broadcast scene recognition method and device
US20190289372A1 (en) Auto-curation and personalization of sports highlights
CN110381366B (en) Automatic event reporting method, system, server and storage medium
CN111191067A (en) Picture book identification method, terminal device and computer readable storage medium
CN111147891B (en) Method, device and equipment for acquiring information of object in video picture
US20080269924A1 (en) Method of summarizing sports video and apparatus thereof
CN111429341B (en) Video processing method, device and computer readable storage medium
WO2018113673A1 (en) Method and apparatus for pushing search result of variety show query
CN105183849A (en) Event detection and semantic annotation method for snooker game videos
CN110944123A (en) Intelligent guide method for sports events
CN105981103A (en) Browsing videos via a segment list
CN111967493A (en) Image auditing method and device, electronic equipment and storage medium
Bielli et al. A mobile augmented reality system to enhance live sporting events
Merler et al. Automatic curation of golf highlights using multimodal excitement features
CN112312142B (en) Video playing control method and device and computer readable storage medium
CN116261748A (en) System and method for merging asynchronous data sources
CN110688498B (en) Information processing method and device, electronic equipment and storage medium
CN110287934B (en) Object detection method and device, client and server
US10200764B2 (en) Determination method and device
TW201620592A (en) Information processing method and device
US11938407B2 (en) Player identification system and method
CN114676314A (en) Event video recommendation method, device, equipment and storage medium
CN109376298B (en) Data processing method and device, terminal equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant