CN111385642A

CN111385642A - Media information processing method, device, server, equipment and storage medium

Info

Publication number: CN111385642A
Application number: CN201811644525.3A
Authority: CN
Inventors: 高健
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2020-07-07

Abstract

In some embodiments of the present application, at least one article to be sold is identified in a video media stream, and a sliced media stream corresponding to the at least one article to be sold is selected from the video media stream. Viewing prompt information for the sliced media stream is then generated and provided to a user who views the video media stream. The media stream of each article to be sold is thus obtained automatically, which reduces the cost of obtaining the corresponding media stream manually. The viewing prompt information also helps the user follow the flow of the media stream and guides the user to the content the user wants to view, especially missed content, which provides a good viewing experience.

Description

Media information processing method, device, server, equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a server, a device, and a storage medium for processing media information.

Background

With the development of computers, more and more users rely on programs installed on a computer to perform numerous functions, for example online shopping, online entertainment, and online live video. Online live video is intuitive, fast, rich in content, and highly interactive, and has therefore attracted the attention of more and more users. It includes game live broadcasts, singing live broadcasts, shopping live broadcasts, and the like. However, because the flow of a live broadcast is disorganized, users cannot intuitively obtain the effective information in the live broadcast.

Disclosure of Invention

Aspects of the present application provide a method, an apparatus, a server, a device, and a storage medium for processing media information, so as to automatically extract a media stream matching an item to be sold and provide viewing information of the media stream to a user.

The embodiment of the application provides a method for processing media information, which comprises the following steps: receiving a video media stream, and identifying at least one article to be sold in the video media stream; selecting a slice media stream corresponding to the at least one article to be sold from the video media stream; and generating the viewing prompt information of the sliced media stream according to the at least one article to be sold and the corresponding sliced media stream.

An embodiment of the present application further provides a method for processing media information, including: sending an acquisition request of a video media stream to a server; receiving the video media stream sent by the server and the watching prompt information of the slice media stream corresponding to at least one article to be sold selected from the video media stream; and displaying the video media stream and the watching prompt information.

An embodiment of the present application further provides a system for processing media information, including: a first terminal and a server; the server receives a video media stream and identifies at least one article to be sold in the video media stream; selecting a slice media stream corresponding to the at least one article to be sold from the video media stream; generating viewing prompt information of the sliced media stream according to the at least one article to be sold and the corresponding sliced media stream; the first terminal sends an acquisition request of the video media stream to a server; receiving the video media stream sent by the server and the watching prompt information of the slice media stream corresponding to at least one article to be sold selected from the video media stream; and displaying the video media stream and the watching prompt information.

An embodiment of the present application further provides a device for processing media information, including: the identification module is used for receiving a video media stream and identifying at least one article to be sold in the video media stream; the selection module is used for selecting the slice media stream corresponding to the at least one article to be sold from the video media stream; and the sending module is used for generating the watching prompt information of the sliced media stream according to the at least one article to be sold and the corresponding sliced media stream.

The embodiment of the application also provides a server, which comprises a memory, a processor and a communication component; the memory for storing a computer program; the communication component for receiving a video media stream; the processor to execute the computer program to: identifying at least one item for sale in the video media stream; selecting a slice media stream corresponding to the at least one article to be sold from the video media stream; and generating the viewing prompt information of the sliced media stream according to the at least one article to be sold and the corresponding sliced media stream.

Embodiments of the present application also provide a computer-readable storage medium storing a computer program, which when executed by one or more processors causes the one or more processors to implement the steps of the above-mentioned method.

An embodiment of the present application further provides a device for processing media information, including: the sending module is used for sending an acquisition request of the video media stream to the server; the receiving module is used for receiving the video media stream sent by the server and the watching prompt information of the slice media stream corresponding to at least one article to be sold selected from the video media stream; and the display module is used for displaying the video media stream and the watching prompt information.

The embodiment of the application also provides equipment, which comprises a memory, a processor and a communication component; the memory for storing a computer program; the communication component is used for sending an acquisition request of the video media stream to the server; receiving the video media stream sent by the server and the watching prompt information of the slice media stream corresponding to at least one article to be sold selected from the video media stream; the processor to execute the computer program to: and displaying the video media stream and the watching prompt information.

In the embodiments of the application, at least one article to be sold is identified in a video media stream, and a sliced media stream corresponding to the at least one article to be sold is selected from the video media stream. Viewing prompt information for the sliced media stream is then generated and provided to a user who views the video media stream. The media stream of each article to be sold is thus obtained automatically, which reduces the cost of obtaining the corresponding media stream manually. The viewing prompt information also helps the user follow the flow of the media stream and guides the user to the content the user wants to view, especially missed content, which provides a good viewing experience.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a block diagram of an exemplary media information processing system according to the present application;

FIG. 2 is a flowchart illustrating a method for processing media information according to another exemplary embodiment of the present application;

FIG. 3 is a schematic flowchart of another method for processing media information according to another exemplary embodiment of the present application;

FIG. 4 is a schematic view of an interface for displaying a viewing prompt according to another exemplary embodiment of the present application;

FIG. 5 is a schematic illustration of an interface for displaying a list of explained items according to yet another exemplary embodiment of the present application;

FIG. 6 is a schematic illustration of an interface for playing an explained video according to another exemplary embodiment of the present application;

FIG. 7 is a schematic diagram of a processing device according to another exemplary embodiment of the present application;

FIG. 8 is a schematic diagram of yet another exemplary processing device according to the present application;

FIG. 9 is a schematic structural diagram of a server according to another exemplary embodiment of the present application;

FIG. 10 is a schematic structural diagram of a terminal according to another exemplary embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In existing online shopping live broadcasts, many users report that the live broadcast flow is disordered and arbitrary. For the same commodity, the seller's anchor has to repeat the introduction again and again for buyer users who have newly entered the live broadcast room, so the anchor frequently explains the same content. As a result, the whole live broadcast does not appear professional enough to earn the trust of buyer users, most buyer users cannot efficiently and intuitively learn about the strengths of the commodity or the merchant, and buyer users have a poor experience.

In some embodiments of the application, at least one article to be sold is identified in a video media stream, and a sliced media stream corresponding to the at least one article to be sold is selected from the video media stream. Viewing prompt information for the sliced media stream is then generated and provided to a user who views the video media stream. The media stream of each article to be sold is thus obtained automatically, which reduces the cost of obtaining the corresponding media stream manually. The viewing prompt information also helps the user follow the flow of the media stream and guides the user to the content the user wants to view, especially missed content, which provides a good viewing experience.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.

Fig. 1 is a schematic structural diagram of a media information processing system according to an exemplary embodiment of the present application. As shown in fig. 1, the processing system 100 includes: a first terminal 101, a server 102, and a second terminal 103.

The first terminal 101 may be any device with certain computing capability, for example a smart phone, a notebook computer, or a personal computer (PC). The basic structure of the first terminal 101 includes at least one processor; the number of processors depends on the configuration and type of the first terminal 101. The first terminal 101 may also include a memory, which may be volatile, such as RAM, or non-volatile, such as read-only memory (ROM) or flash memory, or may include both types. The memory typically stores an operating system (OS) and one or more application programs, and may also store program data and the like. Besides the processor and the memory, the first terminal 101 further includes some basic components, such as a network card chip, an IO bus, a camera, and audio/video components. Optionally, the first terminal 101 may also include peripheral devices, such as a keyboard, a mouse, a stylus, or a printer. Such peripheral devices are well known in the art and are not described in detail herein.

The server 102 refers to a server capable of performing online live broadcast in a network virtual environment, and generally refers to a server capable of performing video live broadcast operation based on the internet. In physical implementation, the server 102 may be any device capable of providing computing services, responding to service requests, and performing processing, and may be, for example, a conventional server, a cloud host, a virtual center, and the like. The server mainly comprises a processor, a hard disk, a memory, a system bus and the like, and is similar to a general computer framework.

The second terminal 103 may be any device with certain computing capability, for example a smart phone, a notebook computer, or a personal computer (PC). The basic structure of the second terminal 103 includes at least one processor; the number of processors depends on the configuration and type of the second terminal 103. The second terminal 103 may also include a memory, which may be volatile, such as RAM, or non-volatile, such as read-only memory (ROM) or flash memory, or may include both types. The memory typically stores an operating system (OS) and one or more application programs, and may also store program data and the like. Besides the processor and the memory, the second terminal 103 further includes some basic components, such as a network card chip, an IO bus, and audio/video components. Optionally, the second terminal 103 may also include peripheral devices, such as a keyboard, a mouse, a stylus, or a printer. Such peripheral devices are well known in the art and are not described in detail herein.

In this embodiment, the first terminal 101 sends a video media stream to the server 102. After receiving the video media stream, the server 102 identifies at least one article to be sold in the video media stream, selects a slice media stream corresponding to the at least one article to be sold from the video media stream, and generates the viewing prompt information of the sliced media stream according to the at least one article to be sold and the corresponding sliced media stream.

Optionally, when the server 102 receives an acquisition request for the video media stream from the second terminal 103, the server 102 sends the video media stream and the viewing prompt information of the slice media stream. The second terminal 103 receives the video media stream and the viewing prompt information, plays the video media stream, and displays the viewing prompt information. In response to the user's triggering operation on the viewing prompt information, the second terminal 103 sends an acquisition request for the slice media stream to the server 102. The server 102 obtains the corresponding slice media stream according to the identification of the slice media stream carried in the acquisition request and sends the slice media stream to the second terminal 103.
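The exchange between the second terminal 103 and the server 102 can be sketched as follows. This is only an illustrative in-memory sketch, not the implementation of the application: the class `LiveServer`, its method names, and the identifier `slice-1` are hypothetical, and network transport is replaced by direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ViewingPrompt:
    slice_id: str  # identification of the slice media stream
    text: str      # prompt text displayed to the viewer


@dataclass
class LiveServer:
    """Hypothetical in-memory stand-in for the server 102."""
    live_stream: str = "live-stream-bytes"
    slices: Dict[str, str] = field(default_factory=dict)
    prompts: List[ViewingPrompt] = field(default_factory=list)

    def add_slice(self, slice_id: str, data: str, text: str) -> None:
        # Store a slice and the viewing prompt that points to it.
        self.slices[slice_id] = data
        self.prompts.append(ViewingPrompt(slice_id, text))

    def handle_stream_request(self) -> Tuple[str, List[ViewingPrompt]]:
        # Answer an acquisition request for the video media stream:
        # return the stream together with the viewing prompts.
        return self.live_stream, list(self.prompts)

    def handle_slice_request(self, slice_id: str) -> str:
        # Look up the slice by the identification carried in the request.
        return self.slices[slice_id]


server = LiveServer()
server.add_slice("slice-1", "down-jacket-clip", "Down jacket, explained at 00:10:22")

# Second terminal: request the stream, display the prompts,
# then request the slice when the user triggers a prompt.
stream, prompts = server.handle_stream_request()
clip = server.handle_slice_request(prompts[0].slice_id)
```

The slice is requested separately from the live stream, so a viewer who joins late only downloads the recorded slices the viewer actually selects.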

Optionally, there may be a plurality of second terminals 103, and each second terminal 103 may obtain the video media stream and the viewing prompt information of the slice media stream from the server 102 in a different time period of the live video broadcast.

Optionally, the server 102 updates the sliced media stream corresponding to at least one article to be sold and the viewing prompt information, and periodically sends the viewing prompt information of the updated sliced media stream to the second terminal 103 that has received the video media stream.

Optionally, the video media stream may be a live video stream and the sliced media stream may be a recorded video stream.

In the present embodiment, the first terminal 101 is connected to the server 102 via a network, and the server 102 is connected to the second terminal 103 via a network. Each of these connections may be a wireless or wired network connection. If the first terminal 101 and the second terminal 103 are each communicatively connected to the server 102 through a mobile network, the network standard of the mobile network may be any one of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), WiMax, and the like.

The process of server 102 processing media information is described in detail below in conjunction with method embodiments.

Fig. 2 is a flowchart illustrating a method for processing media information according to another exemplary embodiment of the present application. The method 200 provided by the embodiment of the present application is executed by a server, and the method 200 includes the following steps:

201: receiving a video media stream, and identifying at least one item to be sold in the video media stream.

202: selecting a slice media stream corresponding to the at least one article to be sold from the video media stream.

203: generating the viewing prompt information of the sliced media stream according to the at least one article to be sold and the corresponding sliced media stream.

The following is set forth in detail with respect to the above steps:

The video media stream refers to media data transmitted in a streaming transmission manner, and optionally, the video media stream may be a live video stream.

The article to be sold refers to an article or commodity sold by a seller in online video shopping.

Optionally, identifying at least one article to be sold in the video media stream comprises: identifying at least one article to be sold in the current video media stream according to the video images and/or the video voice in the video media stream.

Identifying at least one article to be sold in the current video media stream according to the video images and/or the video voice in the video media stream may be performed in the following three modes:

Mode 1: identifying at least one article to be sold in the current video media stream according to the video images in the video media stream, comprising: acquiring a video image from the video media stream; reading each frame of image in the video image, and determining at least one article to be sold according to each frame of image and the image of at least one preset article.

For example, as described above, an anchor of a commodity seller performs a live video broadcast through live broadcast software installed on a mobile phone. In response to the anchor's live broadcast operation, the mobile phone sends a live video stream to the server. The server receives the live video stream, obtains the video images in it, and reads each frame of image in the video images. Through an image recognition method, the server compares each frame of image with the images of a plurality of preset articles and determines a video image that matches the image of any one of the preset articles. The server then determines the name of the article shown in the matched video image from the name of the preset article to which the image belongs, thereby identifying the article to be sold, for example identifying a down jacket in the uploaded live video stream.
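The frame-by-frame comparison in Mode 1 can be sketched as follows. The application does not specify the image recognition method, so this sketch swaps in a simple average hash with a Hamming-distance threshold as a stand-in. The function names, the `max_distance` parameter, and the tiny 2x2 grayscale "images" are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

Frame = Sequence[Sequence[int]]  # tiny grayscale image, pixel values 0-255


def average_hash(img: Frame) -> List[int]:
    # One bit per pixel: 1 if the pixel is brighter than the image mean.
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming(a: List[int], b: List[int]) -> int:
    # Number of differing bits (assumes both hashes have the same length).
    return sum(x != y for x, y in zip(a, b))


def match_item(frame: Frame, presets: Dict[str, Frame],
               max_distance: int = 1) -> Optional[str]:
    # Return the name of the closest preset article, or None when no
    # preset image is within max_distance bits of the frame.
    frame_hash = average_hash(frame)
    best_name: Optional[str] = None
    best_dist = max_distance + 1
    for name, image in presets.items():
        dist = hamming(frame_hash, average_hash(image))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


presets = {
    "down jacket": [[200, 200], [10, 10]],
    "scarf": [[10, 200], [10, 200]],
}
frames = [
    [[0, 0], [0, 0]],        # empty scene, matches nothing
    [[190, 210], [20, 5]],   # resembles the preset down jacket image
]
identified = {name for name in (match_item(f, presets) for f in frames) if name}
```

A production system would more plausibly use a trained object detector; the hash is used here only to keep the sketch self-contained.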

The image recognition method refers to a technique of processing, analyzing, and understanding an image in order to recognize targets and objects in various patterns.

Mode 2: identifying at least one article to be sold in the current video media stream according to the video voice in the video media stream, comprising: acquiring the video voice from the video media stream; reading the video voice and determining a voice text corresponding to the video voice; and determining at least one article to be sold according to the voice text and the description information of at least one preset article.

For example, as described above, an anchor of a commodity seller performs a live video broadcast through live broadcast software installed on a mobile phone. In response to the anchor's live broadcast operation, the mobile phone sends a live video stream to the server. The server receives the live video stream, obtains the video voice in it, and converts the video voice into a voice text through a voice recognition method, for example the voice text "please see the down jacket". The server then performs word segmentation on the voice text to obtain its segmented words, for example "please see", "the", and "down jacket", and matches the segmented words against the description information of a plurality of preset articles. When a segmented word matches any description information, the name of the article involved in the video voice is determined from the name of the article to which that description information belongs. For example, when the segmented word "down jacket" appears in the name description information of the down jacket, the video voice relates to the down jacket article.

It should be noted that the speech recognition method refers to a technique of automatically converting speech into text.

The description information refers to information describing an article to be sold. For example, the description information of a down jacket may be: "Name of the article to be sold: down jacket. A jacket filled with duck down; it has a large and rounded appearance, is waterproof and windproof, and is light, thin, warm, and comfortable."
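The matching of segmented words against description information in Mode 2 can be sketched as follows. Speech recognition and word segmentation are assumed to have already run, so the sketch starts from a list of segmented words. The function name `items_from_words` and the layout of `descriptions` (article name mapped to description text) are assumptions, and matching on the name only is a simplification of the matching described above.

```python
from typing import Dict, List, Set


def items_from_words(words: List[str], descriptions: Dict[str, str]) -> Set[str]:
    # Return the names of preset articles whose name appears among the
    # segmented words of the voice text (case-insensitive comparison).
    spoken = {w.lower() for w in words}
    return {name for name in descriptions if name.lower() in spoken}


# Segmented words of the voice text "please see the down jacket".
words = ["please see", "the", "down jacket"]

# Hypothetical preset description information, keyed by article name.
descriptions = {
    "down jacket": "A jacket filled with duck down; waterproof and windproof.",
    "scarf": "A wool scarf.",
}

voice_items = items_from_words(words, descriptions)
```

Matching on the article name keeps common words such as "the" from triggering a match, which a plain substring search over the full description text would not.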

Mode 3: identifying at least one article to be sold in the current video media stream according to the video images and the video voice in the video media stream, comprising: acquiring the video image and the video voice from the video media stream; reading each frame of image in the video image, and determining at least one first article to be confirmed for sale according to each frame of image and the image of at least one preset article; reading the video voice and determining a voice text corresponding to the video voice; determining at least one second article to be confirmed for sale according to the voice text and the description information of at least one preset article; and, when the first article to be confirmed matches the second article to be confirmed, determining the matched article as the at least one article to be sold.

For example, as described above, the anchor of a commodity seller performs a live video broadcast through live broadcast software installed on a mobile phone. In response to the anchor's live broadcast operation, the mobile phone sends a live video stream to the server. The server receives the live video stream, obtains the video images and the video voice in it, and processes them separately. For the video images, the server reads each frame of image, compares each frame with the images of a plurality of preset articles through an image recognition method, determines a video image that matches the image of any one of the preset articles, and determines the name of the article shown in the matched video image from the name of the preset article to which the image belongs. For example, a down jacket identified in the uploaded live video stream in this way is taken as the first article to be confirmed for sale.

For the video voice, the server converts the video voice into a voice text through a voice recognition method, for example the voice text "please see the down jacket". The server performs word segmentation on the voice text to obtain its segmented words, for example "please see", "the", and "down jacket", and matches the segmented words against the description information of a plurality of preset articles. When a segmented word matches any description information, the name of the article involved in the video voice is determined from the name of the article to which that description information belongs. For example, when the segmented word "down jacket" appears in the name description information of the down jacket, the video voice relates to the down jacket article, and the down jacket identified through the video voice is taken as the second article to be confirmed for sale. When the first article to be confirmed matches the second article to be confirmed, that is, both are the down jacket, the article to be sold is determined to be the down jacket.
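The confirmation step in Mode 3 amounts to keeping only the articles found by both the image path and the voice path. A minimal sketch, assuming each path has already produced a set of candidate article names (the function name `confirm_items` and the sample candidates are hypothetical):

```python
from typing import Set


def confirm_items(first_candidates: Set[str], second_candidates: Set[str]) -> Set[str]:
    # An article is confirmed only if both the video images and the
    # video voice point to it.
    return first_candidates & second_candidates


image_candidates = {"down jacket", "scarf"}   # first articles to be confirmed
voice_candidates = {"down jacket"}            # second articles to be confirmed
confirmed = confirm_items(image_candidates, voice_candidates)
```

Requiring agreement between the two paths filters out articles that are merely visible in the background or merely mentioned in passing.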

It should be noted that, in a live video scene, the server continuously receives the uploaded live video stream until the live broadcast ends. While the stream is being received, the video images and video voice in it are continuously updated, so when processing the video images or the video voice the server needs to consider the currently received video stream and, if necessary, the subsequently received video stream as well.

In some examples, the method 200 further comprises: receiving the image and the description information of at least one article to be sold as the image and the description information of at least one preset article.

For example, as described above, an anchor of a commodity seller performs a live video broadcast through live broadcast software installed on a mobile phone. Before the live broadcast, the image and description information of each article to be sold need to be uploaded to the server through the live broadcast software, which provides a page or an interface for uploading them. The anchor uploads the image and description information of the article to be sold through this page or interface, and the mobile phone, in response to the anchor's upload operation, sends them to the server in sequence. The server receives the uploaded image and description information and stores them in a local preset storage area, where they are associated with the ID of the anchor (that is, the user ID of the live video broadcast). When the anchor uploads the live video stream through the mobile phone, the image and description information of the article to be sold are obtained according to the user ID carried in the live video stream.
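The preset storage area keyed by the anchor's user ID can be sketched as follows. The class `PresetStore`, the `PresetItem` fields, and the `user_id` key in the stream header are hypothetical names chosen for illustration; the application only states that the stored data corresponds to the anchor's ID and is looked up by the user ID carried in the live video stream.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PresetItem:
    name: str
    image: bytes        # uploaded image of the article to be sold
    description: str    # uploaded description information


class PresetStore:
    """Hypothetical stand-in for the server's preset storage area."""

    def __init__(self) -> None:
        self._by_anchor: Dict[str, List[PresetItem]] = {}

    def upload(self, anchor_id: str, item: PresetItem) -> None:
        # Store the uploaded image and description under the anchor's ID.
        self._by_anchor.setdefault(anchor_id, []).append(item)

    def presets_for_stream(self, stream_header: Dict[str, str]) -> List[PresetItem]:
        # Look up presets by the user ID carried in the live video stream.
        return self._by_anchor.get(stream_header["user_id"], [])


store = PresetStore()
store.upload("anchor-42", PresetItem("down jacket", b"<image bytes>", "Filled with duck down."))
presets = store.presets_for_stream({"user_id": "anchor-42"})
```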

The slice media stream refers to media data transmitted in a streaming transmission manner, and optionally, the slice media stream may be a recorded video stream.

Optionally, selecting a sliced media stream corresponding to at least one article to be sold from the video media stream includes: cutting the video media stream according to the time when the article to be sold appears in the video media stream, and generating a media stream slice corresponding to the article to be sold as the slice media stream.

For example, as described above, after the article to be sold, namely the down jacket, is identified in the live video stream, the time at which the down jacket appears in the live video stream is determined. The live video stream corresponding to the down jacket is cut according to that time to obtain a video slice, and the video slice is stored to generate a recorded video stream that serves as the slice media stream.

It should be noted that, when receiving the live video stream, in order to provide the live video stream to the user conveniently, the live video stream may be stored to generate a recorded video, and the recorded video is processed to generate a corresponding video slice, and the corresponding video slice is stored, and at the same time, an item name, a playing time, and an item price corresponding to the video slice are stored.

After the video slice is stored, the viewing prompt information of the video slice is generated according to the article name "down jacket", the playing time "00:10:22", and the article price "600 yuan" corresponding to the video slice.
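Generating the viewing prompt information from the stored article name, playing time, and article price can be sketched as follows. The record fields, the function name `make_viewing_prompt`, and the layout of the prompt text are assumptions; the application does not specify the format of the prompt.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SliceRecord:
    slice_id: str     # hypothetical identifier of the stored video slice
    item_name: str
    play_time: str    # position in the broadcast, HH:MM:SS
    price_yuan: int


def make_viewing_prompt(rec: SliceRecord) -> Dict[str, str]:
    # Combine the stored fields into a prompt that also carries the slice
    # identifier, so a tap on the prompt can request the slice.
    text = f"{rec.item_name} | explained at {rec.play_time} | {rec.price_yuan} yuan"
    return {"slice_id": rec.slice_id, "text": text}


record = SliceRecord("slice-1", "down jacket", "00:10:22", 600)
prompt = make_viewing_prompt(record)
```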

Optionally, according to the time when the article to be sold appears in the video media stream, the manner of cutting the video media stream may include the following:

Mode 1: cutting the video media stream according to the time when the article to be sold appears in the video media stream, comprising: acquiring a video image of the video media stream; determining a first time when the article to be sold appears for the first time and a second time when the article to be sold appears for the last time in the video image; and cutting the video media stream according to the first time and the second time.

For example, as described above, after receiving the live video stream sent by the anchor's mobile phone, the server acquires the video images of the live video stream and analyzes each frame. Through an image recognition method, the server compares the video images with the images of the preset articles, finds the video image in which the down jacket appears for the first time, and takes the time corresponding to that video image, for example the broadcast time 00:10:22, as the first time. In the same way, the server finds the video image in which the down jacket appears for the last time and takes the corresponding time, for example the broadcast time 00:14:45, as the second time. The server cuts the video stream according to the two times to obtain a video slice and uses the video slice as the recorded video of the down jacket.
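Determining the first time and the second time from per-frame recognition results can be sketched as follows. The sketch assumes that recognition has already labelled each sampled frame with an article name or `None`; the function names and the sample detections are hypothetical, and actually cutting the stream at the returned bounds is left out.

```python
from typing import List, Optional, Tuple


def format_time(seconds: int) -> str:
    # Render a broadcast offset in seconds as HH:MM:SS.
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def slice_bounds(detections: List[Tuple[int, Optional[str]]],
                 item: str) -> Optional[Tuple[int, int]]:
    # detections: (broadcast offset in seconds, recognized article or None).
    # Returns (first time, second time) for the article, or None if it
    # never appears.
    times = [t for t, label in detections if label == item]
    if not times:
        return None
    return min(times), max(times)


detections = [
    (600, None),
    (622, "down jacket"),
    (700, "down jacket"),
    (885, "down jacket"),
    (900, "scarf"),
]
bounds = slice_bounds(detections, "down jacket")
first_time, second_time = format_time(bounds[0]), format_time(bounds[1])
```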

Mode 2: cutting the video media stream according to the time when the article to be sold appears in the video media stream, comprising: acquiring the video voice of the video media stream; determining a first time at which the video voice relates to the article to be sold for the first time and a second time at which it relates to the article to be sold for the last time; and cutting the video media stream according to the first time and the second time.

For example, as described above, after receiving the live video stream sent by the anchor's mobile phone, the server obtains the video voice of the live video stream, converts the video voice into a voice text through a voice recognition method, performs word segmentation on the voice text, and compares the segmented words with the description information of the preset articles. The time at which the segmented word "down jacket" first appears in the voice text may be taken as the first time. The comparison continues, and the time at which "down jacket" appears for the last time is taken as the second time. The server cuts the video stream according to the two times to obtain a video slice and uses the video slice as the recorded video of the down jacket.

Optionally, determining a first time in the video speech that first relates to the item for sale and a second time that last relates to the item for sale comprises: determining a voice text according to the video voice, and analyzing the semantic meaning of the voice text; determining a first time at which semantics associated with an item to be sold first occur; a second time at which semantics associated with the item to be sold last occurred is determined.

The semantic meaning related to the article to be sold refers to semantic meaning related to the material, color, shape and function of the article to be sold. For example, for a down jacket, the semantics associated with it may include garment warmth, liner filled with duck down, liner filled with goose down, fashion of the garment, and so forth.

It should be noted that, associated semantics of a plurality of items are preset in a storage area in the server, and after the item to be sold is identified from the video media stream, the server may acquire the associated semantics corresponding to the item to be sold from the storage area.

For example, as described above, after receiving a live video stream sent by the anchor's mobile phone, the server obtains the video voice of the live video stream, converts the video voice into a voice text by a speech recognition method, performs word segmentation on the voice text, and performs semantic parsing on the words and the voice text by a natural language understanding method, so that the server knows the specific meaning of the voice text. When the meaning of the voice text relates to the associated semantics of the down jacket, the server may use the time at which the associated semantics of the down jacket first appear as the first time and the time at which they last appear as the second time, cut the video stream according to the two times to obtain a video slice, and use the video slice as the recorded video of the down jacket.

It should be noted that the natural language understanding method is a method for realizing natural-language communication between humans and machines in the natural languages of human society, such as Chinese and English, so as to replace part of human mental labor, including querying data, answering questions, extracting documents, compiling data, and processing all information related to natural language.

Optionally, the video media stream comprises at least one item sales scene, each item sales scene is used for introducing one item to be sold; wherein determining a first time in the video speech that first relates to the item for sale and a second time that last relates to the item for sale comprises: determining the voice text according to the video voice, and analyzing the semantic meaning of the voice text; acquiring preset starting semantics and preset ending semantics of a corresponding article sales scene; when the analyzed first semantic meaning is matched with the preset starting semantic meaning, the time corresponding to the analyzed first semantic meaning is used as the first time related to the article to be sold for the first time; and after the first time is determined, when the analyzed second semantic meaning is matched with the preset finishing semantic meaning, the time corresponding to the analyzed second semantic meaning is taken as the second time related to the article to be sold at the last time.

Optionally, the item sales scene refers to a video scene selling or introducing an item to be sold. For example, a sales scenario may be presented or sold for down jackets, and a sales scenario may be presented or sold for children's scissors.

It should be noted that, the storage area in the server is preset with the start semantics and the end semantics of the article sales scene, and after the article to be sold is identified from the video media stream, the server may acquire the start semantics and the end semantics of the article sales scene from the storage area.

The start semantics refer to semantics that begin the introduction or sale of an article to be sold, for example, "next we introduce or display ...", "the next garment is ...", or "let us look at the next one".

The end semantics refer to semantics that end the introduction or sale of the article to be sold, for example, "that completes the introduction of this garment" or "friends who like it can go and purchase it". The end semantics may also be a phrase that opens the next article, for example, "next we introduce or show ...", "the next garment is ...", or "let us look at the next one".

Although in this embodiment the video stream is cut according to the start semantics and the end semantics of an article sales scene, it should be understood that the video stream may also be cut according to the start semantics and the end semantics of an article sales scene together with the start semantics of the next article sales scene.

For example, as described above, after receiving a live video stream sent by the anchor's mobile phone, the server identifies the article to be sold, "down jacket", acquires the video voice of the live video stream, converts the video voice into a voice text by a speech recognition method, performs word segmentation on the voice text, and performs semantic parsing on the words and the voice text by a natural language understanding method, so that the server knows the specific meaning of the voice text. When the meaning of a sentence in the voice text matches the preset start semantics of the article sales scene, the time corresponding to the sentence is before the time at which the down jacket appears, and no other sentence with preset start semantics appears in the intervening period, the server takes the time corresponding to the sentence as the first time. When the meaning of a sentence in the voice text matches the preset end semantics of the article sales scene, the time corresponding to the sentence is after the time at which the down jacket appears, and no other sentence with preset end semantics appears in the intervening period, the server takes the time corresponding to the sentence as the second time. The server cuts the video stream according to the two times to obtain a video slice, and uses the video slice as the recorded video of the down jacket.
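The scene-boundary rule in the example above (the nearest start sentence before the article appears, and the nearest end sentence after it) can be sketched as follows. Phrase matching stands in for semantic parsing, and `START_PHRASES`, `END_PHRASES`, `matches` and `scene_bounds` are hypothetical names and presets chosen for illustration.

```python
from typing import List, Optional, Tuple

# Assumed presets standing in for the stored start and end semantics.
START_PHRASES = ["next we introduce", "let us look at the next"]
END_PHRASES = ["the introduction is complete", "go and purchase"]


def matches(text: str, phrases: List[str]) -> bool:
    """Crude stand-in for semantic matching: case-insensitive phrase search."""
    lowered = text.lower()
    return any(p in lowered for p in phrases)


def scene_bounds(
    utterances: List[Tuple[int, str]], item_time: int
) -> Optional[Tuple[int, int]]:
    """Return (first_time, second_time) of the sales scene around `item_time`.

    The latest start sentence at or before `item_time` and the earliest end
    sentence at or after it are chosen, so that no other start or end
    sentence lies between the boundary and the article's appearance.
    """
    starts = [t for t, s in utterances if t <= item_time and matches(s, START_PHRASES)]
    ends = [t for t, s in utterances if t >= item_time and matches(s, END_PHRASES)]
    if not starts or not ends:
        return None
    return max(starts), min(ends)


# Illustrative transcript; the scarf is a made-up earlier item.
utterances = [
    (300, "Next we introduce a scarf"),
    (590, "The introduction is complete, go and purchase"),
    (610, "Next we introduce the down jacket"),
    (640, "It is filled with duck down"),
    (880, "The introduction is complete for this garment"),
    (901, "Let us look at the next item"),
]
bounds = scene_bounds(utterances, item_time=640)
```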

Mode 3: according to the time that the article to be sold appears in the video media stream, the video media stream is cut, and the method comprises the following steps: acquiring a video image and video voice of a video media stream; determining a first image time of a first occurrence of an item to be sold and a second image time of a last occurrence of the item to be sold in the video image; determining a first voice time relating to the article to be sold for the first time and a second voice time relating to the article to be sold for the last time in the video voice; selecting a time point in front of the time from the first image time and the first voice time as a first time; selecting a later time point from the second image time and the second voice time as a second time; and cutting the video media stream according to the first time and the second time.

It should be noted that, as described above, this embodiment may determine from the video images the time at which the article to be sold first appears and the time at which it last appears, determine from the video voice the time at which the article to be sold is first mentioned and the time at which it is last mentioned, select the earliest of these times as the first time and the latest as the second time, cut the video stream according to the two times to obtain a video slice, and use the video slice as the recorded video of the down jacket.
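The selection step of Mode 3 reduces to taking the earlier of the two start candidates and the later of the two end candidates. A minimal sketch, assuming both spans are given in seconds and using the hypothetical name `combine_spans`:

```python
from typing import Tuple


def combine_spans(
    image_span: Tuple[int, int], voice_span: Tuple[int, int]
) -> Tuple[int, int]:
    """Merge the image-based and voice-based spans into one cutting span.

    The first time is the earlier of the two first times; the second time
    is the later of the two second times.
    """
    first_time = min(image_span[0], voice_span[0])
    second_time = max(image_span[1], voice_span[1])
    return first_time, second_time


# Illustrative values: the anchor starts talking before the item is shown,
# and the item stays on screen after the last mention.
cut_span = combine_spans(image_span=(622, 885), voice_span=(610, 880))
```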

The specific implementation manners of determining the time of the first occurrence of the article to be sold and the time of the last occurrence of the article to be sold according to the video images, and determining the time of the first occurrence of the article to be sold and the time of the last occurrence of the article to be sold according to the video voices have been described in detail in the foregoing, and will not be described herein again.

In addition, besides the modes described above, the video stream may be cut by splitting and recombining them. For example, the time at which the article to be sold first appears may be determined by video image recognition and used as the first time, while the time at which the article to be sold is last mentioned may be determined by video voice recognition and used as the second time, and so on. A plurality of ways of cutting the video stream can thus be derived, making the cutting more flexible and accurate.

To ensure that the cut video stream is more accurate and better viewed, in some examples, the method 200 further comprises: determining first fault-tolerant time according to preset fault-tolerant time and the first time; determining second fault-tolerant time according to the preset fault-tolerant time and the second time; and cutting the video media stream according to the first fault-tolerant time and the second fault-tolerant time.

Here, the preset fault-tolerant time refers to an extra time, for example, 40s, reserved to prevent a cutting time error occurring when cutting the video media stream.

For example, as described above, when the cut video stream of the down jacket is determined to be 00:11:02-00:14:05, the first time 00:11:02 is moved forward by 40 s, giving a first fault-tolerant time of 00:10:22, and the second time 00:14:05 is pushed back by 40 s, giving a second fault-tolerant time of 00:14:45. The video stream is cut according to the two fault-tolerant times to obtain a video slice, and the video slice is used as the recorded video of the down jacket.
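The fault-tolerance arithmetic can be checked with a short sketch. The helper names are hypothetical, and clamping the result to the start of the stream (and optionally to its duration) is an added assumption that the text does not state.

```python
from typing import Optional, Tuple


def to_seconds(hhmmss: str) -> int:
    """Convert 'HH:MM:SS' to seconds."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


def to_hhmmss(total: int) -> str:
    """Convert seconds back to 'HH:MM:SS'."""
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def apply_tolerance(
    first: int, second: int, tolerance: int = 40, duration: Optional[int] = None
) -> Tuple[int, int]:
    """Widen the cutting span by `tolerance` seconds on each side.

    Clamping to [0, duration] is an assumption made for this sketch.
    """
    start = max(0, first - tolerance)
    end = second + tolerance
    if duration is not None:
        end = min(end, duration)
    return start, end


start, end = apply_tolerance(to_seconds("00:11:02"), to_seconds("00:14:05"))
padded = (to_hhmmss(start), to_hhmmss(end))
```

With the preset fault-tolerant time of 40 s, `padded` is `("00:10:22", "00:14:45")`, matching the worked example.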

In some examples, the method 200 further comprises: and correspondingly storing the watching prompt information and the identification of the video media stream so as to send the watching prompt information when sending the video media stream.

The viewing prompt information is prompt information for guiding the user to view the explained video.

For example, as described above, the server stores the slice media stream and an identification of the corresponding video media stream, such as the name "down jacket" of the item to be sold.

In some examples, the method 200 further comprises: storing the identification of the article to be sold and the corresponding slice media stream; receiving an acquisition request of a video media stream, and sending watching prompt information of the video media stream and a slice media stream; and receiving an acquisition request of the sliced media stream, and acquiring and returning the sliced media stream corresponding to the identifier according to the identifier of the article to be sold carried in the acquisition request so as to play the sliced media stream.

Optionally, receiving an acquisition request of the video media stream, and sending viewing prompt information of the video media stream and the slice media stream, includes: and sending the watching prompt information of the current live video stream and the selected recorded and broadcast video stream.

For example, as described above, when storing the recorded video stream the server also stores the corresponding identifier of the article to be sold, such as the name "down jacket". A buyer user watches the live video through live broadcast software installed on a mobile phone. In response to the buyer user's operation of watching the live broadcast, the mobile phone sends an acquisition request carrying the live room ID to the server. After receiving the acquisition request, the server acquires the corresponding live video stream according to the live room ID, and acquires the viewing prompt information of the corresponding selected recorded video streams according to the identification of the live video stream. The server sends the live video stream and the viewing prompt information to the buyer user's mobile phone, which displays them, so that the buyer user can watch the live video and view the viewing prompt information at the same time. Fig. 4 shows an interface 400 displaying the viewing prompt information; the interface 400 shows the live room ID, the name of the seller, the viewing prompt information "explained 3 live items, click me to watch playback", and the live content. After seeing the interface 400, the buyer user clicks the viewing prompt information. Fig. 5 shows an interface 500 displaying the list of explained commodities. The user selects one of the explained commodities, such as the down jacket, from the list displayed on the interface 500. In response to this viewing operation, the buyer user's mobile phone sends an acquisition request carrying the name of the article to be sold to the server. The server receives the acquisition request, acquires the corresponding recorded video stream according to the name of the article to be sold, and sends the recorded video stream to the buyer user's mobile phone for playing. Fig. 6 shows an interface 600 for playing the explained video: the buyer user's mobile phone may display the playing interface of the recorded video and start playing the recorded video in response to the user's watching operation.

It should be noted that while the recorded video is played, playback of the live video may be paused, and after the recorded video finishes, the live video resumes. The resumed live video may be the current latest live video, or may be cached video read from the mobile phone's cache, so that the live video continues from the point at which it was paused.

Furthermore, when storing the sliced media stream, it is necessary to associate the sliced media stream with the related information of the video media stream (e.g., the user ID of the live video, the identification of the video media stream, the live room ID), and the viewing prompt information, so that the user of the buyer can obtain the sliced media stream through this association when obtaining the sliced media stream.
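The association described above can be sketched as a small in-memory store. Everything here is a hypothetical illustration: the record fields, the `slice_uri` location, the room and stream identifiers, and the class names are assumptions, and a real server would use its own storage area.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SliceRecord:
    room_id: str     # live room ID
    stream_id: str   # identification of the video media stream
    item_name: str   # identification of the article to be sold
    slice_uri: str   # hypothetical location of the stored slice
    hint: str        # viewing prompt information for this slice


class SliceStore:
    """In-memory stand-in for the server's storage area."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], SliceRecord] = {}

    def save(self, record: SliceRecord) -> None:
        # Key on (live room, item) so a slice can be found from either request.
        self._records[(record.room_id, record.item_name)] = record

    def hints_for_room(self, room_id: str) -> List[str]:
        """Viewing prompt information to send along with the live stream."""
        return [r.hint for r in self._records.values() if r.room_id == room_id]

    def find_slice(self, room_id: str, item_name: str) -> Optional[SliceRecord]:
        """Look up the slice requested by the identifier of the article."""
        return self._records.get((room_id, item_name))


store = SliceStore()
store.save(SliceRecord("room-1", "stream-1", "down jacket",
                       "slices/down_jacket.mp4", "down jacket, 600 yuan"))
store.save(SliceRecord("room-1", "stream-1", "child skirt",
                       "slices/child_skirt.mp4", "child skirt, 100 yuan"))
found = store.find_slice("room-1", "down jacket")
```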

It should be understood that all references to the establishment of the above relationships are intended to be within the scope of the embodiments of the present application.

In some examples, the method 200 further comprises: and regularly sending the updated watching prompt information of the slice media stream to the equipment which has received the video media stream.

For example, the server may continuously receive the live video stream uploaded by the seller anchor's mobile phone, generate new viewing prompt information as the live video stream is received, combine the new viewing prompt information with the previously generated viewing prompt information into updated viewing prompt information, and send the updated viewing prompt information every 5 minutes to each mobile phone that has received the live video stream.
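The combining step can be sketched as a dictionary merge keyed by article name. This is an assumption-laden illustration: the rule that a newer prompt replaces an older one for the same article, the third item "scarf", and the names `merge_hints` and `summary_text` are all hypothetical, and the 5-minute timer and the transport to the terminals are omitted.

```python
from typing import Dict


def merge_hints(existing: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Combine previously generated prompts with newly generated ones.

    Prompts are keyed by article name; a new prompt for the same article
    replaces the old one (an assumption made for this sketch).
    """
    merged = dict(existing)
    merged.update(new)
    return merged


def summary_text(hints: Dict[str, str]) -> str:
    """Build a summary line in the style of the prompt shown in interface 400."""
    return f"explained {len(hints)} live items, click me to watch playback"


existing = {"down jacket": "00:10:22-00:14:45", "child skirt": "00:15:01-00:19:42"}
new = {"scarf": "00:20:10-00:23:30"}  # made-up third item
updated = merge_hints(existing, new)
banner = summary_text(updated)
```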

It should be noted that the embodiments of the present application can also be applied to an application scenario of recorded and played video.

The following describes the technical solution of the present application in detail with reference to several exemplary application scenarios:

Scene 1: in a live video application scenario, a seller anchor conducts a live video broadcast through a live APP installed on a mobile phone. Before the broadcast, the seller anchor sends pictures and description information of the garments to be sold live to the live platform through an interface provided by the live APP. For example: a down jacket image with the description information "name of article to be sold: down jacket; a jacket filled with duck down, with a full and rounded appearance, waterproof and windproof, light, thin, warm and comfortable; price: 600 yuan"; a child skirt image with the description information "name of article to be sold: child skirt; made of pure cotton, with soft, breathable and comfortable fabric; price: 100 yuan"; and so on for the other garments to be sold. The server of the live platform receives this information, stores it in a local storage area, and associates it with the ID of the seller anchor. The seller anchor sends a live video stream to the server through the live APP installed on the mobile phone. The server receives the live video stream, acquires the video images in it, and performs image recognition on each frame. When a recognized image matches the image of a preset garment sold live, the server identifies the specific name of the article to be sold, "down jacket". The server determines, from the video images, the image in which the down jacket first appears and the image in which it last appears, and determines the times of the two images, 00:10:22 and 00:14:45. The live video stream may be stored, and the stored live video stream may be cut according to the two times to obtain the video slice between them, which is used as the video slice of the down jacket and stored in the local storage area. At the same time, the video slice is associated with the name of the garment to be sold, "down jacket", and the ID of the live video stream. The viewing prompt information of the video slice is generated according to the playing time, the name of the garment to be sold, the price and the like, and is stored correspondingly.

The server continues to receive the live video stream and may also acquire the video voice in the live video stream and perform speech recognition on it. When the recognized voice text matches the name text of a preset garment sold live, the server identifies the specific name of the article to be sold, "child skirt". Using a natural language understanding method, the server recognizes in the voice text the start semantics of the child skirt sales scene, "next, let us look at the child skirt", and the end semantics, "that is all for the introduction of the child skirt", and determines that the times of the two utterances are 00:15:01 and 00:19:42. The stored live video stream is cut according to the two times to obtain the video slice between them, which is used as the video slice of the child skirt and stored in the local storage area. At the same time, the video slice is associated with the name of the garment to be sold, "child skirt", and the ID of the live video stream. The viewing prompt information of the video slice is generated according to the playing time, the name of the garment to be sold, "child skirt", the price and the like, and is stored correspondingly.

A buyer user watches the live video through a live APP installed on a mobile phone. The server receives an acquisition request sent by the buyer user's mobile phone; the acquisition request may carry a live room ID. According to the live room ID, the server acquires the current live video stream and the viewing prompt information of at least one video slice corresponding to the live video stream, and, in response to the acquisition request, sends the live video stream and the viewing prompt information to the buyer user's mobile phone. After receiving them, the buyer user's mobile phone plays the live video stream and displays the viewing prompt information. When the user wants to watch a garment that has already been explained during the live broadcast, the user clicks the viewing prompt information, and the mobile phone responds to the click by displaying a list of video slices of the explained garments. The buyer user clicks the video slice of the down jacket to watch it. In response to this viewing operation, the mobile phone sends an acquisition request carrying the name of the garment, "down jacket", to the server. After receiving the acquisition request, the server acquires the corresponding video slice according to the garment name "down jacket" carried in the request and returns the video slice to the mobile phone. After receiving the video slice, the mobile phone may play it directly, or play it in response to the user's click, while pausing the live video; after the video slice finishes playing, the current live video is played.

Fig. 3 is a flowchart illustrating a method for processing media information according to another exemplary embodiment of the present application. The method 300 provided by the embodiment of the present application is executed by a terminal, and the method 300 includes the following steps:

301: sending an acquisition request of a video media stream to a server;

302: receiving a video media stream sent by a server and watching prompt information of a slice media stream corresponding to at least one article to be sold selected from the video media stream;

303: displaying the video media stream and the viewing prompt information.

In some examples, the method 300 further comprises: in response to a trigger operation on the viewing prompt information, sending an acquisition request of the slice media stream to the server; and receiving the corresponding slice media stream sent by the server, and displaying the corresponding slice media stream.

It should be noted that the specific implementation of the method 300 has been described in detail in the foregoing embodiments, and is therefore not repeated here.

Fig. 7 is a schematic structural framework diagram of a media information processing apparatus according to another exemplary embodiment of the present application. The apparatus 700 may be applied in a server, and the apparatus 700 includes an identification module 701, a selection module 702, and a generation module 703, and the functions of the modules are described in detail below:

the identifying module 701 is configured to receive the video media stream, and identify at least one article to be sold in the video media stream.

A selecting module 702, configured to select a slice media stream corresponding to at least one article to be sold from the video media streams.

The generating module 703 is configured to generate viewing prompt information of the sliced media stream according to at least one article to be sold and the corresponding sliced media stream.

In some examples, the apparatus 700 further comprises: the storage module is used for storing the identification of the article to be sold and the corresponding slice media stream; the first receiving module is used for receiving an acquisition request of a video media stream, sending the video media stream and the watching prompt information of the slice media stream, receiving the acquisition request of the slice media stream, and acquiring and returning the slice media stream corresponding to the identification according to the identification of the article to be sold carried in the acquisition request so as to play the slice media stream.

Optionally, the identifying module 701 is specifically configured to identify at least one item to be sold in the current video media stream according to a video image and/or a video voice in the video media stream.

Optionally, the identifying module 701 includes: the first acquisition unit is used for acquiring a video image from a video media stream; and the first determining unit is used for reading each frame of image in the video image and determining at least one article to be sold according to each frame of image and at least one preset article image.

Optionally, the identifying module 701 includes: the second acquisition unit is used for acquiring video voice from the video media stream; and the second determining unit is used for reading the video voice in the video image, determining a voice text corresponding to the video voice, and determining at least one article to be sold according to the voice text and the description information of at least one preset article.

Optionally, the identifying module 701 includes: the third acquisition unit is used for acquiring video images and video voice from the video media stream; the third determining unit is used for reading each frame of image in the video image and determining at least one first article to be confirmed for sale according to each frame of image and at least one image of a preset article; reading video voice in the video image, and determining a voice text corresponding to the video voice; determining at least one second article to be confirmed for sale according to the voice text and the description information of at least one preset article; when the first item to be sold matches the second item to be sold, at least one item to be sold is determined.
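The confirmation step of the third determining unit, in which an article counts as identified only when the image-based and voice-based candidates agree, can be sketched as a set intersection. The function name `confirm_items` and the sample candidates are hypothetical, and exact name equality is assumed as the matching rule.

```python
from typing import Iterable, List


def confirm_items(
    image_candidates: Iterable[str], voice_candidates: Iterable[str]
) -> List[str]:
    """Return the articles found by both the image and the voice analysis.

    Matching by exact name equality is an assumption of this sketch.
    """
    return sorted(set(image_candidates) & set(voice_candidates))


# Illustrative candidates: only the down jacket is seen and mentioned.
confirmed = confirm_items(
    image_candidates=["down jacket", "scarf"],
    voice_candidates=["down jacket", "child skirt"],
)
```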

In some examples, the apparatus 700 further comprises: and the second receiving module is used for receiving the image and the description information of at least one article to be sold as the image and the description information of at least one preset article.

Optionally, the selecting module 702 is specifically configured to: and cutting the video media stream according to the time when the article to be sold appears in the video media stream, and generating a media stream slice corresponding to the article to be sold as the slice media stream.

Optionally, the selecting module 702 includes: a fourth obtaining unit, configured to obtain a video image of the video media stream; the fourth determining unit is used for determining a first time when the article to be sold appears for the first time and a second time when the article to be sold appears for the last time in the video image; and the first cutting unit is used for cutting the video media stream according to the first time and the second time.

Optionally, the selecting module 702 includes: a fifth obtaining unit, configured to obtain the video voice of the video media stream; a fifth determining unit, configured to determine a first time relating to the article to be sold for the first time and a second time relating to the article to be sold for the last time in the video voice; and a second cutting unit, configured to cut the video media stream according to the first time and the second time.

Optionally, the fifth determining unit is specifically configured to: determining a voice text according to the video voice, and analyzing the semantic meaning of the voice text; determining a first time at which semantics associated with an item to be sold first occur; a second time at which semantics associated with the item to be sold last occurred is determined.

Optionally, the video media stream comprises at least one item sales scene, each item sales scene is used for introducing one item to be sold; the fifth determining unit is specifically configured to: determining a voice text according to the video voice, and analyzing the semantic meaning of the voice text; acquiring preset starting semantics and preset ending semantics of a corresponding article sales scene; when the analyzed first semantic meaning is matched with the preset starting semantic meaning, the time corresponding to the analyzed first semantic meaning is used as the first time related to the article to be sold for the first time; and after the first time is determined, when the analyzed second semantic meaning is matched with the preset finishing semantic meaning, the time corresponding to the analyzed second semantic meaning is taken as the second time related to the article to be sold at the last time.

Optionally, the selecting module 702 includes: a sixth obtaining unit, configured to obtain a video image and a video voice of the video media stream; a sixth determining unit, configured to determine a first image time when the article to be sold appears for the first time and a second image time when the article to be sold appears for the last time in the video image; determining a first voice time relating to an article to be sold for the first time and a second voice time relating to the article to be sold for the last time in video voice; the selection unit is used for selecting a time point before the time from the first image time and the first voice time as a first time; selecting a time point with later time from the second image time and the second voice time as a second time; and the third cutting unit is used for cutting the video media stream according to the first time and the second time.

In some examples, the apparatus 700 further comprises: the first determining module is used for determining first fault-tolerant time according to preset fault-tolerant time and the first time; the second determining module is used for determining second fault-tolerant time according to the preset fault-tolerant time and the second time; and the cutting module is used for cutting the video media stream according to the first fault-tolerant time and the second fault-tolerant time.

In some examples, the apparatus 700 further comprises: and the updating module is used for sending the updated watching prompt information of the slice media stream to the equipment which has received the video media stream at regular time.

Optionally, the video media stream is a live video stream, and the slice media stream is a recorded broadcast video stream; the first receiving module is used for sending the watching prompt information of the current live video stream and the selected recorded and broadcast video stream.

In some examples, the apparatus 700 further comprises: and the storage module is used for correspondingly storing the watching prompt information and the identification of the video media stream so as to send the watching prompt information when the video media stream is sent.

Fig. 8 is a schematic structural framework diagram of yet another media information processing apparatus according to yet another exemplary embodiment of the present application. The apparatus 800 may be applied to a terminal, and the apparatus 800 includes: the sending module 801, the receiving module 802 and the displaying module 803, the functions of which are described in detail below:

a sending module 801, configured to send an acquisition request of a video media stream to a server.

The receiving module 802 is configured to receive a video media stream sent by a server and viewing prompt information of a slice media stream corresponding to at least one article to be sold selected from the video media stream.

A display module 803, configured to display the video media stream and the viewing prompt information.

In some examples, the sending module 801 is configured to send an acquisition request of the sliced media stream to the server in response to a trigger operation on the viewing prompt information; the receiving module 802 is configured to receive the corresponding sliced media stream sent by the server, and display the corresponding sliced media stream.

Having described the internal functions and structure of the processing device 700 shown in fig. 7, in one possible design, the structure of the processing device 700 shown in fig. 7 may be implemented as a server, as shown in fig. 9, and the server 900 may include: memory 901, processor 902, and communications component 903;

a memory 901 for storing a computer program;

a communication component 903 for receiving a video media stream;

a processor 902 for executing the computer program to: identify at least one article to be sold in the video media stream; select, from the video media stream, a slice media stream corresponding to the at least one article to be sold; and generate viewing prompt information of the slice media stream according to the at least one article to be sold and the corresponding slice media stream.

In some examples, the processor 902 is further configured to: store the identification of the article to be sold together with the corresponding slice media stream; receive an acquisition request of the video media stream, and send the video media stream and the viewing prompt information of the slice media stream; and receive an acquisition request of the slice media stream and, according to the identification of the article to be sold carried in the acquisition request, acquire and return the slice media stream corresponding to that identification, so that the slice media stream can be played.
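
As a non-limiting illustration, the storage and lookup described above can be sketched as follows. The names `SliceRecord` and `SliceStore`, the in-memory dictionary, and the request format `{"item_id": ...}` are assumptions made for this example only; the embodiment does not prescribe a storage backend or a message format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SliceRecord:
    """Hypothetical record describing one stored slice media stream."""
    item_id: str      # identification of the article to be sold
    stream_id: str    # identification of the source video media stream
    start_s: float    # slice start, seconds from the start of the stream
    end_s: float      # slice end, seconds from the start of the stream


class SliceStore:
    """In-memory stand-in for whatever storage the server actually uses."""

    def __init__(self) -> None:
        self._by_item = {}

    def save(self, record: SliceRecord) -> None:
        # Store the slice under the identification of the article to be sold.
        self._by_item[record.item_id] = record

    def prompts_for(self, stream_id: str) -> list:
        # Viewing prompt information sent together with the video media stream.
        return [
            {"item_id": r.item_id, "start_s": r.start_s, "end_s": r.end_s}
            for r in self._by_item.values()
            if r.stream_id == stream_id
        ]

    def handle_slice_request(self, request: dict) -> Optional[SliceRecord]:
        # The acquisition request carries the identification of the article.
        return self._by_item.get(request.get("item_id"))
```

In this sketch, a terminal that triggers a prompt would send back the `item_id` found in that prompt, and the server would return the matching record, or nothing if the identification is unknown.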

Optionally, the processor 902 is specifically configured to identify at least one item to be sold in the current video media stream according to a video image and/or a video voice in the video media stream.

Optionally, the processor 902 is specifically configured to obtain a video image from a video media stream; reading each frame of image in the video image, and determining at least one article to be sold according to each frame of image and at least one preset article image.

Optionally, the processor 902 is specifically configured to obtain the video voice from the video media stream; read the video voice, determine a voice text corresponding to the video voice, and determine at least one article to be sold according to the voice text and the description information of at least one preset article.

Optionally, the processor 902 is specifically configured to: obtain a video image and a video voice from the video media stream; read each frame of image in the video image, and determine at least one first article to be confirmed for sale according to each frame of image and the image of at least one preset article; read the video voice, and determine a voice text corresponding to the video voice; determine at least one second article to be confirmed for sale according to the voice text and the description information of the at least one preset article; and, when a first article to be confirmed matches a second article to be confirmed, determine the matched article as the at least one article to be sold.
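
As a non-limiting illustration, the cross-check between image-based and voice-based candidates can be sketched as a set intersection. The function names are hypothetical, image recognition is assumed to have already produced a set of candidate identifications, and simple keyword matching stands in for whatever comparison between the voice text and the description information is actually used.

```python
def items_from_speech(voice_text: str, descriptions: dict) -> set:
    """Return ids of preset articles whose description keywords occur in the voice text.

    `descriptions` maps an article id to a list of keywords (an assumed format).
    """
    text = voice_text.lower()
    return {
        item_id
        for item_id, keywords in descriptions.items()
        if any(keyword.lower() in text for keyword in keywords)
    }


def confirm_items(image_candidates: set, voice_candidates: set) -> set:
    """Keep only the articles found by both the image path and the voice path."""
    return set(image_candidates) & set(voice_candidates)
```

Requiring agreement between both paths trades recall for precision: an article that is shown but never mentioned, or mentioned but never shown, is not confirmed.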

In some examples, the processor 902 is further configured to receive the image and the description information of at least one article to be sold, as the image and the description information of the at least one preset article.

Optionally, the processor 902 is specifically configured to cut the video media stream according to the time when the article to be sold appears in the video media stream, and generate a media stream slice corresponding to the article to be sold as the slice media stream.

Optionally, the processor 902 is specifically configured to obtain a video image of the video media stream; determining a first time when the article to be sold appears for the first time and a second time when the article to be sold appears for the last time in the video image; and cutting the video media stream according to the first time and the second time.
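
As a non-limiting illustration, the first time and the second time can be derived from per-frame detection results as shown below. The input format, a sequence of `(timestamp_seconds, detected)` pairs, and the function name are assumptions for this example; the detector itself is outside the scope of the sketch.

```python
from typing import Iterable, Optional, Tuple


def appearance_window(
    detections: Iterable[Tuple[float, bool]]
) -> Optional[Tuple[float, float]]:
    """Return (first_time, second_time) for one article, or None if never seen.

    Each element is (timestamp in seconds, whether the article was detected
    in that frame). Frames may arrive in any order.
    """
    times = [timestamp for timestamp, seen in detections if seen]
    if not times:
        return None
    return min(times), max(times)
```

Note that frames between the first and the last appearance are kept even if the article is briefly out of view, since only the two boundary times are used for cutting.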

Optionally, the processor 902 is specifically configured to obtain the video voice of the video media stream; determine a first time at which the video voice first relates to the article to be sold and a second time at which it last relates to the article to be sold; and cut the video media stream according to the first time and the second time.

Optionally, the processor 902 is specifically configured to determine a voice text according to the video speech, and parse semantics of the voice text; determining a first time at which semantics associated with an item to be sold first occur; a second time at which semantics associated with the item to be sold last occurred is determined.

Optionally, the video media stream comprises at least one article sales scene, each of which is used for introducing one article to be sold; the processor 902 is specifically configured to: determine a voice text according to the video voice, and parse the semantics of the voice text; acquire the preset starting semantics and the preset ending semantics of the corresponding article sales scene; when a parsed first semantic matches the preset starting semantics, take the time corresponding to the first semantic as the first time at which the video voice first relates to the article to be sold; and, after the first time is determined, when a parsed second semantic matches the preset ending semantics, take the time corresponding to the second semantic as the second time at which the video voice last relates to the article to be sold.
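
As a non-limiting illustration, matching against preset starting and ending semantics can be sketched as a single pass over timestamped transcript segments. Substring matching on phrases is used here only as a stand-in for semantic matching, and the function name, the segment format `(timestamp_seconds, text)`, and the example phrases are assumptions.

```python
from typing import Iterable, Optional, Sequence, Tuple


def scene_window(
    segments: Iterable[Tuple[float, str]],
    start_phrases: Sequence[str],
    end_phrases: Sequence[str],
) -> Optional[Tuple[float, float]]:
    """Return (first_time, second_time) of one article sales scene, or None.

    The ending phrases are only considered after a starting phrase has been
    matched, mirroring the order described in the text.
    """
    starts = [p.lower() for p in start_phrases]
    ends = [p.lower() for p in end_phrases]
    first_time = None
    for timestamp, text in segments:
        lowered = text.lower()
        if first_time is None:
            if any(p in lowered for p in starts):
                first_time = timestamp
        elif any(p in lowered for p in ends):
            return first_time, timestamp
    return None  # scene not started, or not yet finished
```

In a live setting the function would return `None` while the scene is still in progress, and the slice would only be cut once an ending phrase has been heard.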

Optionally, the processor 902 is specifically configured to: obtain a video image and a video voice of the video media stream; determine a first image time at which the article to be sold first appears and a second image time at which it last appears in the video image; determine a first voice time at which the video voice first relates to the article to be sold and a second voice time at which it last relates to the article to be sold; select the earlier of the first image time and the first voice time as the first time; select the later of the second image time and the second voice time as the second time; and cut the video media stream according to the first time and the second time.
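
As a non-limiting illustration, combining the image-based and voice-based times amounts to taking the union of the two intervals. The function name and the tuple format `(first, second)` in seconds are assumptions for this example.

```python
from typing import Tuple


def merge_windows(
    image_window: Tuple[float, float],
    voice_window: Tuple[float, float],
) -> Tuple[float, float]:
    """Combine image and voice times into the (first_time, second_time) used for cutting."""
    first_image_time, second_image_time = image_window
    first_voice_time, second_voice_time = voice_window
    first_time = min(first_image_time, first_voice_time)     # the earlier start
    second_time = max(second_image_time, second_voice_time)  # the later end
    return first_time, second_time
```

Choosing the earlier start and the later end ensures that the slice covers both the moment the article is first mentioned and the moment it is first shown, whichever comes first.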

In some examples, the processor 902 is further configured to: determining first fault-tolerant time according to preset fault-tolerant time and the first time; determining second fault-tolerant time according to the preset fault-tolerant time and the second time; and cutting the video media stream according to the first fault-tolerant time and the second fault-tolerant time.
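
As a non-limiting illustration, one plausible reading of the fault-tolerant times is that the preset fault-tolerant time is subtracted from the first time and added to the second time, so that the slice is slightly widened. This direction, the clamping to the stream bounds, and the function name are assumptions of this sketch and are not stated in the passage above.

```python
from typing import Optional, Tuple


def apply_tolerance(
    first_time: float,
    second_time: float,
    tolerance_s: float,
    duration_s: Optional[float] = None,
) -> Tuple[float, float]:
    """Widen the cut window by a preset fault-tolerant time (assumed symmetric)."""
    # Assumed: start earlier by the tolerance, but never before the stream start.
    first_fault_tolerant = max(0.0, first_time - tolerance_s)
    # Assumed: end later by the tolerance, but never past a known stream end.
    second_fault_tolerant = second_time + tolerance_s
    if duration_s is not None:
        second_fault_tolerant = min(second_fault_tolerant, duration_s)
    return first_fault_tolerant, second_fault_tolerant
```

For example, with a preset fault-tolerant time of 3 seconds, a window of 12 s to 75 s would be cut as 9 s to 78 s under these assumptions.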

In some examples, the processor 902 is further configured to periodically send the updated viewing prompt information of the slice media stream to devices that have already received the video media stream.

Optionally, the video media stream is a live video stream, and the slice media stream is a recorded video stream; the processor 902 is specifically configured to send the current live video stream and the viewing prompt information of the selected recorded video stream.

In some examples, the processor 902 is further configured to store the viewing prompt information in association with the identification of the video media stream, so that the viewing prompt information is sent when the video media stream is sent.

In addition, an embodiment of the present invention provides a computer storage medium storing a computer program, wherein the computer program, when executed by one or more processors, causes the one or more processors to implement the steps of the media information processing method in the method embodiment of fig. 2.

Having described the internal functions and structure of the processing device 800 shown in fig. 8, in one possible design, the structure of the processing device 800 shown in fig. 8 may be implemented as a terminal, as shown in fig. 10, and the terminal 1000 may include: memory 1001, processor 1002, and communications component 1003;

a memory 1001 for storing a computer program;

a communication component 1003 for sending an acquisition request of the video media stream to the server; receiving a video media stream sent by a server and watching prompt information of a slice media stream corresponding to at least one article to be sold selected from the video media stream;

a processor 1002 for executing the computer program to: display the video media stream and the viewing prompt information.

In some examples, the processor 1002 is further configured to: in response to a trigger operation on the viewing prompt information, send an acquisition request of the slice media stream to the server; and receive the corresponding slice media stream sent by the server, and display the corresponding slice media stream.

In addition, an embodiment of the present invention provides a computer storage medium storing a computer program, wherein the computer program, when executed by one or more processors, causes the one or more processors to implement the steps of the media information processing method in the method embodiment of fig. 3.

In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations are included in a specific order, but it should be clearly understood that the operations may be executed out of the order presented herein or in parallel, and the sequence numbers of the operations, such as 201, 202, 203, etc., are merely used for distinguishing different operations, and the sequence numbers themselves do not represent any execution order. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course can also be implemented by a combination of hardware and software. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, may be embodied in the form of a computer program product, which may be embodied on one or more computer-usable storage media having computer-usable program code embodied therein, including without limitation disk storage, CD-ROM, optical storage, and the like.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable multimedia data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable multimedia data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable multimedia data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable multimedia data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for processing media information, comprising:

receiving a video media stream, and identifying at least one article to be sold in the video media stream;

selecting a slice media stream corresponding to the at least one article to be sold from the video media stream;

and generating the viewing prompt information of the sliced media stream according to the at least one article to be sold and the corresponding sliced media stream.

2. The method of claim 1, further comprising:

storing the identification of the article to be sold and the corresponding slice media stream;

receiving an acquisition request of the video media stream, and sending the video media stream and the watching prompt information of the slice media stream;

and receiving an acquisition request of the sliced media stream, and acquiring and returning the sliced media stream corresponding to the identifier according to the identifier of the article to be sold carried in the acquisition request so as to play the sliced media stream.

3. The method of claim 1, wherein the identifying at least one item for sale in the video media stream comprises:

and identifying at least one article to be sold in the current video media stream according to the video image and/or the video voice in the video media stream.

4. The method of claim 3, wherein identifying at least one item for sale in a current video media stream based on video images in the video media stream comprises:

acquiring the video image from the video media stream;

reading each frame of image in the video image, and determining the at least one article to be sold according to each frame of image and the image of the at least one preset article.

5. The method of claim 3, wherein identifying at least one item for sale in a current video media stream based on video speech in the video media stream comprises:

acquiring the video voice from the video media stream;

reading the video voice, and determining a voice text corresponding to the video voice;

and determining the at least one article to be sold according to the voice text and the description information of the at least one preset article.

6. The method of claim 3, wherein identifying at least one item for sale in a current video media stream based on video images and video speech in the video media stream comprises:

acquiring the video image and the video voice from the video media stream;

reading each frame of image in the video image, and determining at least one first article to be confirmed for sale according to each frame of image and the image of at least one preset article;

reading the video voice, and determining a voice text corresponding to the video voice;

determining at least one second article to be confirmed for sale according to the voice text and the description information of the at least one preset article;

and when a first article to be confirmed matches a second article to be confirmed, determining the matched article as the at least one article to be sold.

7. The method according to any one of claims 4-6, further comprising:

and receiving the image and the description information of the at least one article to be sold as the image and the description information of the at least one preset article.

8. The method of claim 1, wherein the selecting the sliced media stream corresponding to the at least one item to be sold from the video media stream comprises:

and cutting the video media stream according to the time when the article to be sold appears in the video media stream, and generating a media stream slice corresponding to the article to be sold as the slice media stream.

9. The method of claim 8, wherein said slicing the video media stream according to the time the item to be sold appeared in the video media stream comprises:

acquiring a video image of the video media stream;

determining a first time at which the item to be sold appears for the first time and a second time at which the item to be sold appears for the last time in the video image;

and cutting the video media stream according to the first time and the second time.

10. The method of claim 8, wherein said slicing the video media stream according to the time the item to be sold appeared in the video media stream comprises:

acquiring video voice of the video media stream;

determining a first time in the video voice relating to the item to be sold for the first time and a second time relating to the item to be sold for the last time;

and cutting the video media stream according to the first time and the second time.

11. The method of claim 10, wherein the determining a first time in the video speech that relates to the item for sale for a first time and a second time in the video speech that relates to the item for sale for a last time comprises:

determining a voice text according to the video voice, and parsing the semantics of the voice text;

determining a first time at which semantics associated with the item to be sold first occur;

and determining a second time at which semantics associated with the item to be sold last occur.

12. The method of claim 10, wherein the video media stream comprises at least one item sales scenario, each item sales scenario being for introducing an item to be sold;

wherein the determining a first time in the video voice relating to the item for sale for a first time and a second time relating to the item for sale for a last time comprises:

acquiring preset starting semantics and preset ending semantics of a corresponding article sales scene;

when a parsed first semantic matches the preset starting semantics, taking the time corresponding to the first semantic as the first time relating to the item to be sold for the first time;

and after the first time is determined, when a parsed second semantic matches the preset ending semantics, taking the time corresponding to the second semantic as the second time relating to the item to be sold for the last time.

13. The method of claim 8, wherein said slicing the video media stream according to the time the item to be sold appeared in the video media stream comprises:

acquiring a video image and video voice of the video media stream;

determining a first image time of the video image at which the item to be sold appears for the first time and a second image time of the video image at which the item to be sold appears for the last time;

determining a first voice time of the video voice relating to the item to be sold for the first time and a second voice time of the video voice relating to the item to be sold for the last time;

selecting the earlier of the first image time and the first voice time as a first time;

selecting the later of the second image time and the second voice time as a second time;

and cutting the video media stream according to the first time and the second time.

14. The method according to any one of claims 9-13, further comprising:

determining first fault-tolerant time according to preset fault-tolerant time and the first time;

determining second fault-tolerant time according to the preset fault-tolerant time and the second time;

and cutting the video media stream according to the first fault-tolerant time and the second fault-tolerant time.

15. The method of claim 1, further comprising:

and regularly sending the updated watching prompt information of the slice media stream to the equipment which has received the video media stream.

16. The method of claim 2, wherein the video media stream is a live video stream and the sliced media stream is a recorded video stream;

wherein the receiving an acquisition request of the video media stream, and sending the video media stream and the viewing prompt information of the slice media stream comprises:

sending the current live video stream and the viewing prompt information of the selected recorded video stream.

17. The method of claim 1, further comprising:

and correspondingly storing the watching prompt information and the identification of the video media stream so as to send the watching prompt information when sending the video media stream.

18. A method for processing media information, comprising:

sending an acquisition request of a video media stream to a server;

receiving the video media stream sent by the server and the watching prompt information of the slice media stream corresponding to at least one article to be sold selected from the video media stream;

and displaying the video media stream and the watching prompt information.

19. The method of claim 18, further comprising:

responding to the trigger operation of the watching prompt information, and sending an acquisition request of the slice media stream to the server;

and receiving the corresponding slice media stream sent by the server, and displaying the corresponding slice media stream.

20. A system for processing media information, comprising: a first terminal and a server;

the server receives a video media stream and identifies at least one article to be sold in the video media stream; selects a slice media stream corresponding to the at least one article to be sold from the video media stream; and generates viewing prompt information of the slice media stream according to the at least one article to be sold and the corresponding slice media stream;

the first terminal sends an acquisition request of the video media stream to the server; receives the video media stream and the viewing prompt information sent by the server; and displays the video media stream and the viewing prompt information.

21. The system of claim 20, wherein the system further comprises a second terminal;

and the second terminal sends the video media stream to the server.

22. An apparatus for processing media information, comprising:

the identification module is used for receiving a video media stream and identifying at least one article to be sold in the video media stream;

the selection module is used for selecting the slice media stream corresponding to the at least one article to be sold from the video media stream;

and the generating module is used for generating the watching prompt information of the sliced media stream according to the at least one article to be sold and the corresponding sliced media stream.

23. A server comprising a memory, a processor, and a communication component;

the memory for storing a computer program;

the communication component for receiving a video media stream;

the processor to execute the computer program to:

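identifying at least one item for sale in the video media stream;

selecting a slice media stream corresponding to the at least one item for sale from the video media stream;

and generating viewing prompt information of the slice media stream according to the at least one item for sale and the corresponding slice media stream.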

24. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by one or more processors, causes the one or more processors to perform the steps of the method of any one of claims 1-17.

25. An apparatus for processing media information, comprising:

the sending module is used for sending an acquisition request of the video media stream to the server;

the receiving module is used for receiving the video media stream sent by the server and the watching prompt information of the slice media stream corresponding to at least one article to be sold selected from the video media stream;

and the display module is used for displaying the video media stream and the watching prompt information.

26. A device comprising a memory, a processor, and a communication component;

the memory for storing a computer program;

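the communication component is used for sending an acquisition request of a video media stream to a server, and receiving the video media stream sent by the server and viewing prompt information of a slice media stream corresponding to at least one article to be sold selected from the video media stream;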

the processor to execute the computer program to:

and displaying the video media stream and the watching prompt information.

27. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by one or more processors, causes the one or more processors to perform the steps of the method of any one of claims 18-19.