CN117336520A - Live broadcast information processing method and processing device based on intelligent digital person - Google Patents


Info

Publication number
CN117336520A
Authority
CN
China
Prior art keywords
target
cut
inter
sentence
statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311632685.7A
Other languages
Chinese (zh)
Other versions
CN117336520B (en)
Inventor
陈达剑
李火亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tuoshe Technology Group Co ltd
Jiangxi Tuoshi Intelligent Technology Co ltd
Original Assignee
Tuoshe Technology Group Co ltd
Jiangxi Tuoshi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tuoshe Technology Group Co ltd and Jiangxi Tuoshi Intelligent Technology Co ltd
Priority to CN202311632685.7A
Publication of CN117336520A
Application granted
Publication of CN117336520B
Legal status: Active
Anticipated expiration


Classifications

    • H04N21/2187 Live feed
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2387 Stream processing in response to a playback request from an end-user, e.g. for trick-play
    • H04N21/2393 Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests, involving handling client requests
    • H04N21/24 Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The application provides a method and an apparatus for processing live broadcast information based on an intelligent digital person. The method includes: obtaining an inter-cut request from an electronic device, the request including a reference inter-cut text; determining the play type of each inter-cut sentence in the reference inter-cut text; acquiring a live script; determining the sentence type of each inter-cut sentence according to the live script; determining whether the server's current speed for rendering the digital person is lower than a preset speed; if so, determining a target text sequence according to the play types; generating an inter-cut video corresponding to a target inter-cut sentence according to its sentence type, where the target inter-cut sentence is the foremost inter-cut sentence in the target text sequence for which no corresponding inter-cut video has been generated; and sending the inter-cut video to the electronic device. By processing live broadcast information in this way, emergencies in a live broadcast room can be handled in time, improving the comprehensiveness, intelligence and fluency of the live broadcast information system's processing of live broadcast information.

Description

Live broadcast information processing method and processing device based on intelligent digital person
Technical Field
The application relates to the field of live video applications in image communication, and in particular to a method and an apparatus for processing live broadcast information based on an intelligent digital person.
Background
With the development of internet technology, digital person avatars are appearing in more and more live broadcast rooms. Compared with a human host, a digital host can broadcast around the clock, effectively reducing labor and time costs and improving live broadcast efficiency.
However, current digital person live broadcasting is not yet mature in its information processing. A traditional live broadcast information processing system is equipped with only one intelligent digital person; although this greatly reduces cost, when an unexpected situation arises during a live broadcast, the digital person must first be trained on the relevant content, which consumes a large amount of time. The system cannot respond intelligently to different live broadcast situations, which reduces the comprehensiveness, intelligence and fluency of its live broadcast information processing.
Disclosure of Invention
The application provides a method and an apparatus for processing live broadcast information based on an intelligent digital person, so that when an emergency occurs in a live broadcast room, the inflexibility of the digital person and the large amount of retraining time otherwise required are avoided, the problem of handling live broadcast accidents in time is solved intelligently, and the comprehensiveness, intelligence and fluency of the live broadcast information system's processing of live broadcast information are improved.
In a first aspect, an embodiment of the present application provides a method for processing live broadcast information based on an intelligent digital person, applied to a digital person live broadcast system, where the digital person live broadcast system includes a server and an electronic device, and the server is communicatively connected to the electronic device. The method includes:
obtaining an inter-cut request from the electronic device, where the inter-cut request includes a reference inter-cut text, the inter-cut request indicates that an emergency has occurred in a target digital person live broadcast room, and the reference inter-cut text includes at least one inter-cut sentence;
determining the play type of each inter-cut sentence in the reference inter-cut text, where the play type is fixed-period play or random-period play;
acquiring a live script of the target digital person corresponding to the target digital person live broadcast room;
determining the sentence type of each inter-cut sentence in the reference inter-cut text according to the live script, where the sentence type indicates the degree of matching between the inter-cut sentence and the live sentences in the live script;
determining whether the server's current speed for rendering the digital person is lower than a preset speed;
if it is lower than the preset speed, determining a target text sequence according to the play types, where the target text sequence indicates the play order of the inter-cut sentences;
generating an inter-cut video corresponding to a target inter-cut sentence according to the sentence type of the target inter-cut sentence, where the target inter-cut sentence is the foremost inter-cut sentence in the target text sequence for which no corresponding inter-cut video has been generated; and
sending the inter-cut video to the electronic device, where the inter-cut video is played in the target digital person live broadcast room to cope with the emergency.
In a second aspect, an embodiment of the present application provides an apparatus for processing live broadcast information based on an intelligent digital person, applied to a digital person live broadcast system, where the digital person live broadcast system includes a server and an electronic device, and the server is communicatively connected to the electronic device. The apparatus includes:
a first acquisition unit, configured to acquire an inter-cut request from the electronic device, where the inter-cut request includes a reference inter-cut text, the inter-cut request indicates that an emergency has occurred in a target digital person live broadcast room, and the reference inter-cut text includes at least one inter-cut sentence;
a first determining unit, configured to determine the play type of each inter-cut sentence in the reference inter-cut text, where the play type is fixed-period play or random-period play;
a second acquisition unit, configured to acquire a live script of the target digital person corresponding to the target digital person live broadcast room;
a second determining unit, configured to determine the sentence type of each inter-cut sentence in the reference inter-cut text according to the live script, where the sentence type indicates the degree of matching between the inter-cut sentence and the live sentences in the live script;
a third determining unit, configured to determine whether the server's current speed for rendering the digital person is lower than a preset speed;
a fourth determining unit, configured to determine a target text sequence according to the play types if the server's rendering speed is lower than the preset speed, where the target text sequence indicates the play order of the inter-cut sentences;
a generation unit, configured to generate the inter-cut video corresponding to a target inter-cut sentence according to the sentence type of the target inter-cut sentence, where the target inter-cut sentence is the foremost inter-cut sentence in the target text sequence for which no corresponding inter-cut video has been generated; and
a sending unit, configured to send the inter-cut video to the electronic device, where the inter-cut video is played in the target digital person live broadcast room to cope with the emergency.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor, a memory, and one or more programs stored in the memory and configured to be executed by the processor, the one or more programs comprising instructions for performing the steps of the methods described above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon computer program instructions that are executed by a processor to implement the steps of the above-described method.
It can be seen that, in this example: first, an inter-cut request is obtained from the electronic device, where the inter-cut request includes a reference inter-cut text, indicates that an emergency has occurred in a target digital person live broadcast room, and the reference inter-cut text includes at least one inter-cut sentence; second, the play type of each inter-cut sentence in the reference inter-cut text is determined, the play type being fixed-period play or random-period play; a live script of the target digital person corresponding to the target digital person live broadcast room is then acquired; the sentence type of each inter-cut sentence, indicating its degree of matching with the live sentences in the live script, is determined from the live script; whether the server's current speed for rendering the digital person is lower than a preset speed is determined; if it is lower, a target text sequence indicating the play order of the inter-cut sentences is determined according to the play types; an inter-cut video corresponding to the target inter-cut sentence is generated according to its sentence type, the target inter-cut sentence being the foremost inter-cut sentence in the target text sequence for which no corresponding inter-cut video has been generated; and finally, the inter-cut video is sent to the electronic device to be played in the target digital person live broadcast room to cope with the emergency.
When an emergency occurs, i.e., when an inter-cut request sent by the electronic device is obtained, the method determines the play type and sentence type of each inter-cut sentence in the request's inter-cut text and acquires the live script. It then determines whether the server is busy from its current speed for rendering the digital person. When the server is busy, the play order of the inter-cut sentences is determined from their play types, and inter-cut videos are generated in that order, one per target inter-cut sentence, according to each sentence's type; the videos are then sent to the electronic device. This effectively reduces the server's burden when it is busy, quickly produces the required inter-cut videos, intelligently solves the problem of handling live broadcast accidents in time, and improves the comprehensiveness, intelligence and fluency of the live broadcast information system's processing of live broadcast information.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a digital live system according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 3 is a flow chart of a method for processing live broadcast information based on an intelligent digital person according to an embodiment of the present application;
fig. 4 is a schematic diagram of a display interface of a display screen of an electronic device according to an embodiment of the present application;
fig. 5 is a schematic view of a display interface of another display screen of an electronic device according to an embodiment of the present application;
fig. 6 is a block diagram of the functional units of an apparatus for processing live broadcast information based on an intelligent digital person according to an embodiment of the present application;
fig. 7 is a block diagram of the functional units of another apparatus for processing live broadcast information based on an intelligent digital person according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
A traditional live broadcast information processing system is equipped with only one intelligent digital person; although this greatly reduces cost, when an unexpected situation arises during a live broadcast, the digital person must first be trained on the relevant content, which consumes a large amount of time. The system cannot respond intelligently to different live broadcast situations, which reduces the comprehensiveness, intelligence and fluency of its live broadcast information processing.
In view of the foregoing, an embodiment of the present application provides a method and an apparatus for processing live broadcast information based on an intelligent digital person, and the embodiment of the present application is described in detail below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic diagram of a digital person live broadcast system according to an embodiment of the present application. As shown in fig. 1, the digital person live broadcast system 10 includes a server 101 and an electronic device 102; the server 101 is communicatively connected to the electronic device 102 and is configured, upon obtaining an inter-cut request from the electronic device, to process the inter-cut text into an inter-cut video. The digital person live broadcast system may further include virtual character models, speech synthesis and speech recognition modules, virtual environments, and a live broadcast platform. The server 101 may be a cloud server, a dedicated server, or a GPU server.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 102 may be any electronic device with computing capability, including various processing devices, personal computers, servers, network devices, and the like. The electronic device 102 includes a processor 201, a memory 202, a communication interface 203, and one or more programs 204, where the one or more programs 204 are stored in the memory 202, are configured to be executed by the processor 201, and include instructions for performing the steps described below. In a specific implementation, the processor 201 performs any step performed by the electronic device in the method embodiments below and, when transmitting data such as sending, optionally invokes the communication interface 203 to complete the corresponding operation.
Referring to fig. 3, fig. 3 is a flow chart of a method for processing live broadcast information based on intelligent digital people according to an embodiment of the present application. As shown in fig. 3, the method includes the following steps.
Step S301: obtain an inter-cut request from the electronic device, where the inter-cut request includes a reference inter-cut text, the inter-cut request indicates that an emergency has occurred in a target digital person live broadcast room, and the reference inter-cut text includes at least one inter-cut sentence.
The emergency may include, for example: the product link cannot be clicked to place an order, an inflammatory or controversial comment appears, or the quantity of listed products is wrong. The content of the inter-cut text is written by the user for the current emergency, and may be derived from sentences on which the digital anchor has been trained, sentences from the current live broadcast, sentences learned from other live broadcast videos, and the like.
Step S302: determine the play type of each inter-cut sentence in the reference inter-cut text, where the play type is fixed-period play or random-period play.
An inter-cut sentence with fixed-period play has a fixed play position, while an inter-cut sentence with random-period play may be played earlier or later. For example, suppose the reference inter-cut text includes four inter-cut sentences A1, A2, A3 and A4, where A1 must be played first and A4 must be played last; A2 and A3 are random-period sentences, i.e., either A2 or A3 may be played first.
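As an illustration only (the patent gives no code), the admissible play orders for the A1–A4 example can be enumerated; the function name and argument shapes here are hypothetical:

```python
from itertools import permutations

def admissible_orders(first_fixed, middle_random, last_fixed):
    """Enumerate play orders consistent with the example: the first
    fixed-period sentence plays first, the last fixed-period sentence
    plays last, and the random-period sentences in between may play
    in any order."""
    return [
        [first_fixed, *middle, last_fixed]
        for middle in permutations(middle_random)
    ]

orders = admissible_orders("A1", ["A2", "A3"], "A4")
# Both A1,A2,A3,A4 and A1,A3,A2,A4 are admissible.
```

With two random-period sentences there are exactly two admissible orders, matching the example in the text.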
Step S303: acquire a live script of the target digital person corresponding to the target digital person live broadcast room.
The live script may be the text of the live content of the current digital person live broadcast room.
Step S304: determine the sentence type of each inter-cut sentence in the reference inter-cut text according to the live script, where the sentence type indicates the degree of matching between the inter-cut sentence and the live sentences in the live script.
The sentence type of an inter-cut sentence reflects the degree of overlap or similarity between the inter-cut sentence and the live sentences in the live script. When the overlap or similarity is high, part of the existing content can be replaced; when it is low, the digital anchor needs to be trained.
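The patent does not specify how the matching degree is computed. As a minimal sketch, assuming a generic string-similarity measure (here `difflib`, an illustrative stand-in) and a hypothetical threshold, the sentence type could be derived like this:

```python
import difflib

HIGH_MATCH = 0.6  # hypothetical threshold separating the two sentence types

def match_degree(intercut_sentence, live_script_sentences):
    """Best similarity ratio between the inter-cut sentence and any
    live sentence in the live script. difflib.SequenceMatcher is an
    illustrative stand-in; the patent names no concrete measure."""
    return max(
        difflib.SequenceMatcher(None, intercut_sentence, live).ratio()
        for live in live_script_sentences
    )

def sentence_type(intercut_sentence, live_script_sentences):
    """High overlap/similarity: part of the existing content can be
    replaced; low: the digital anchor needs to be trained anew."""
    if match_degree(intercut_sentence, live_script_sentences) >= HIGH_MATCH:
        return "replace_partial_content"
    return "train_digital_anchor"
```

An inter-cut sentence that nearly repeats a script sentence would be typed for partial replacement, while an unrelated sentence would require new training.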
Step S305: determine whether the server's current speed for rendering the digital person is lower than a preset speed.
When the server's current speed for rendering the digital person is lower than the preset speed, the server is currently busy; when the speed is equal to or greater than the preset speed, the server is not busy. The scheme handles live broadcast accidents differently depending on whether the server is busy.
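A minimal sketch of this determination, under the reading (consistent with step S306 and the busy-server ordering described below) that a rendering speed below the preset speed indicates a busy server; the threshold value and all names are hypothetical:

```python
PRESET_SPEED = 25.0  # hypothetical preset rendering speed, frames per second

def server_is_busy(current_render_speed):
    """A server rendering the digital person more slowly than the
    preset speed is treated as loaded (busy)."""
    return current_render_speed < PRESET_SPEED

def generation_path(current_render_speed):
    """A busy server first builds the target text sequence and then
    generates inter-cut videos one at a time; otherwise generation
    can proceed directly."""
    if server_is_busy(current_render_speed):
        return "ordered_one_by_one"
    return "generate_directly"
```

The dispatch mirrors the two branches of the method: ordered one-at-a-time generation under load, direct generation otherwise.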
Step S306: if the rendering speed is lower than the preset speed, determine a target text sequence according to the play types, where the target text sequence indicates the play order of the inter-cut sentences.
When the server is busy, the inter-cut sentences are ordered. For example, suppose the reference inter-cut text contains four inter-cut sentences in the order A5, A6, A7, A8, where A5 must be played first, A8 must be played last, and A6 and A7 are random-period sentences (either may be played first). A6 and A7 may keep their original order in the reference inter-cut text (A6, A7), or they may be ordered by sentence length: if A6's length is greater than A7's, the order may be A6, A7 from long to short, or A7, A6 from short to long. The play order of the reference inter-cut text is therefore A5, A6, A7, A8 or A5, A7, A6, A8.
Step S307: generate the inter-cut video corresponding to the target inter-cut sentence according to its sentence type, where the target inter-cut sentence is the foremost inter-cut sentence in the target text sequence for which no corresponding inter-cut video has been generated.
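The selection of the target inter-cut sentence, i.e. the foremost sentence in the target text sequence without a generated video, can be sketched as follows (names hypothetical):

```python
def next_target_sentence(target_sequence, already_generated):
    """Return the foremost inter-cut sentence in the target text
    sequence for which no inter-cut video has been generated yet,
    or None when every sentence already has its video."""
    for sentence in target_sequence:
        if sentence not in already_generated:
            return sentence
    return None
```

Calling this after each generated video walks through the sequence one sentence at a time, which is what lets a busy server spread the rendering load.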
Step S308: send the inter-cut video to the electronic device, where the inter-cut video is played in the target digital person live broadcast room to cope with the emergency.
After the inter-cut video is sent to the electronic device, the electronic device plays it to cope with the emergency. For example, when the order link in the live broadcast room cannot currently be clicked, the inter-cut text may inform users that the link will be re-uploaded, issue a coupon, and inform users that the link has been re-uploaded. The inter-cut sentences are processed according to whether the server is busy and the similarity between each inter-cut sentence and the live sentences, and the corresponding inter-cut video is generated: it first informs users that the link will be re-uploaded, then issues a coupon, and finally informs users that the link has been re-uploaded, with the same background as the live broadcast room.
After the electronic device sends the inter-cut request (the user having judged the emergency to determine the inter-cut text), the server generates the inter-cut video, and the user can verify from its playback whether the request was processed correctly. For example, as shown in fig. 4, a schematic diagram of a display interface of a display screen of an electronic device according to an embodiment of the present application, the digital anchor is replying to comment B1 in the comment area, while more and more comments report being unable to place an order through the X2 link. In this case, referring to fig. 5, a schematic diagram of another display interface of a display screen of an electronic device according to an embodiment of the present application, after the digital anchor finishes replying to comment B1, the electronic device plays the generated inter-cut video; the video keeps the background of the live stream unchanged, and its content is the processed inter-cut text.
It can be seen that, in the embodiment of the present application: first, an inter-cut request is obtained from the electronic device, where the inter-cut request includes a reference inter-cut text, indicates that an emergency has occurred in a target digital person live broadcast room, and the reference inter-cut text includes at least one inter-cut sentence; second, the play type of each inter-cut sentence in the reference inter-cut text is determined, the play type being fixed-period play or random-period play; a live script of the target digital person corresponding to the target digital person live broadcast room is then acquired; the sentence type of each inter-cut sentence, indicating its degree of matching with the live sentences in the live script, is determined from the live script; whether the server's current speed for rendering the digital person is lower than a preset speed is determined; if it is lower, a target text sequence indicating the play order of the inter-cut sentences is determined according to the play types; an inter-cut video corresponding to the target inter-cut sentence is generated according to its sentence type, the target inter-cut sentence being the foremost inter-cut sentence in the target text sequence for which no corresponding inter-cut video has been generated; and finally, the inter-cut video is sent to the electronic device to be played in the target digital person live broadcast room to cope with the emergency.
In one possible embodiment, determining the target text sequence according to the play type includes the following steps: generating an initial inter-cut text according to the play period of each fixed play sentence, where a fixed play sentence is an inter-cut sentence in the reference inter-cut text whose play type is fixed period play, and the arrangement order of the inter-cut sentences in the initial inter-cut text matches the play periods of the fixed play sentences; sorting the statement length values of the random play sentences to obtain a sorting result, where a random play sentence is an inter-cut sentence in the reference inter-cut text whose play type is random period play; inserting the random play sentences into the initial inter-cut text according to the sorting result to obtain the target inter-cut text; and determining the arrangement order of the inter-cut sentences in the target inter-cut text as the target text sequence.
That is, the target text sequence is determined according to the play type of each inter-cut sentence: the inter-cut sentences whose play type is fixed period play are ordered directly to obtain the initial inter-cut text; for the inter-cut sentences whose play type is random period play, the length corresponding to each inter-cut sentence is determined, the sorting result is determined according to those lengths, and finally each such sentence is inserted into the initial inter-cut text in order to obtain the target inter-cut text. For example, when the current product is of brand Z but a large number of viewers in the comment area consider it counterfeit, the inter-cut text of the digital anchor consists of pacifying the emotion of the comment area, explaining that the brand of the product is brand Z, and telling the founding story of brand Z. Obviously, the digital anchor needs to pacify the emotion of the comment area first, so the initial inter-cut text can be: pacify the emotion of the comment area, explain that the brand of the product is brand Z, then tell the founding story of brand Z; or: pacify the emotion of the comment area, tell the founding story of brand Z, then explain that the brand of the product is brand Z. After the emotion of the comment area has been pacified, since the comment area needs to be informed clearly and in time, whichever of telling the founding story of brand Z and explaining that the brand of the product is brand Z has less content is executed first, and the target inter-cut text is thereby determined.
Therefore, in the embodiment of the present application, the target text sequence is determined according to the play type of the inter-cut sentences to obtain the target inter-cut text, so that the user can handle the emergency according to the target inter-cut text.
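The ordering described above can be sketched as follows. This is a minimal sketch under illustrative assumptions: sentences are plain dictionaries, fixed-period sentences carry a numeric play period, and random-period sentences are appended after the fixed ones (the patent only specifies that they are inserted according to the length ranking); all function and field names are hypothetical.

```python
def determine_target_text_sequence(sentences):
    """Order inter-cut sentences: fixed-period sentences are arranged to match
    their play periods; random-period sentences are then inserted in ascending
    order of their statement length value (less content plays first)."""
    # sentences: dicts like {"text": ..., "play_type": "fixed"/"random",
    #                        "period": seconds until play (fixed sentences only)}
    fixed = [s for s in sentences if s["play_type"] == "fixed"]
    random_ = [s for s in sentences if s["play_type"] == "random"]
    # initial inter-cut text: fixed sentences arranged by their play period
    initial = sorted(fixed, key=lambda s: s["period"])
    # sorting result: random-period sentences ranked by statement length value
    ranked = sorted(random_, key=lambda s: len(s["text"]))
    # the arrangement order of the resulting text is the target text sequence
    return initial + ranked
```

With the brand-Z example from the embodiment, the pacifying sentence (fixed) comes first and the shorter of the two random-period sentences precedes the longer one.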
In one possible embodiment, the statement types include a first type of inter-cut statement and a second type of inter-cut statement; the matching degree between a first-type inter-cut statement and at least one of the live statements in the live script is greater than or equal to a first preset value, and a second-type inter-cut statement is any inter-cut statement in the reference inter-cut text other than a first-type inter-cut statement. Generating the inter-cut video corresponding to the target inter-cut statement according to the statement type of the target inter-cut statement includes the following steps: when the statement type of the target inter-cut statement is the first type, acquiring the live source video of the target digital person live broadcasting room, where the live source video is a pre-recorded digital person video for live broadcast, and generating the inter-cut video corresponding to the target inter-cut statement according to the live source video; and when the statement type of the target inter-cut statement is the second type, generating the inter-cut video corresponding to the target inter-cut statement according to the target inter-cut text.
That is, if the statement type of the target inter-cut statement is the first type, the coincidence degree or similarity between the target inter-cut statement and one of the live statements in the live script is high, and the corresponding inter-cut video is obtained by processing the live source video; if the statement type of the target inter-cut statement is the second type, the coincidence degree or similarity between the target inter-cut statement and every live statement in the live script is low, and the inter-cut statement itself is processed to obtain the corresponding inter-cut video.
Therefore, in the embodiment of the present application, the corresponding inter-cut video is obtained according to the statement type of the target inter-cut statement, so that the user can play the inter-cut video to handle the emergency.
In one possible embodiment, generating the inter-cut video corresponding to the target inter-cut statement according to the live source video includes the following steps: acquiring a target live statement, where the target live statement is a live statement in the live script whose matching degree with the target inter-cut statement is greater than or equal to the first preset value; determining the difference words by which the target inter-cut statement differs from the target live statement; determining a target homophone from the live script according to the difference words, where the target homophone is homophonic with the difference word; intercepting a first live video corresponding to the target live statement from the live source video; intercepting a second live video corresponding to the target homophone from the live source video; and generating the inter-cut video corresponding to the target inter-cut statement according to the first live video and the second live video.
The matching degree of the target live statement and the target inter-cut statement being greater than or equal to the first preset value indicates that the two statements are highly similar but differ in part of their content. A target homophone with the same pronunciation as the differing content is determined from the live script; the live source video is then clipped according to the target live statement and the target homophone to obtain a first live video and a second live video respectively, and the differing part of the first live video is replaced with the second live video by clipping or similar means. For example, the target inter-cut statement may be "issue one hundred coupons" while the live script contains the target live statement "issue five hundred coupons"; "one" is then the difference word between the target inter-cut statement and the target live statement. At this time, the live script can be searched for the word "one" or a homophone of it, the live video of that pronunciation is intercepted to obtain the second live video, the live video of the target live statement is intercepted to obtain the first live video, and the video segment of "five" in the first live video is replaced with the second live video by clipping or other means.
Therefore, in the embodiment of the present application, when the coincidence degree or similarity between the target inter-cut statement and one of the live statements in the live script is high, the live source video can be clipped according to the difference words between the live script and the inter-cut statement. Only the differing parts are modified and the digital person does not need to be retrained, so the live accident is handled in time, the processing speed is improved, and the burden on the server is effectively reduced.
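Under the assumptions that word-level clip timestamps of the live source video are available and that a pronunciation lookup function stands in for the patent's (Chinese) homophone matching, the diff-and-splice step above can be sketched as follows; all names are illustrative.

```python
def find_difference_words(inter_cut, live):
    """Return (index, inter_cut_word, live_word) triples where the target
    inter-cut statement differs from the target live statement."""
    pairs = zip(inter_cut.split(), live.split())
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]

def splice_inter_cut_video(inter_cut, target_live, script_words, pronounce):
    """script_words: (word, start_s, end_s) clips cut from the live source video.
    pronounce: word -> pronunciation key (stand-in for a pinyin/phoneme lookup).
    Returns a splice plan: which word of the first live video to replace, and
    with which homophone clip (the second live video)."""
    plan = []
    for _idx, want, have in find_difference_words(inter_cut, target_live):
        # second live video: a script clip whose pronunciation matches the
        # difference word of the inter-cut statement
        clip = next(((w, s, e) for w, s, e in script_words
                     if pronounce(w) == pronounce(want)), None)
        plan.append({"replace": have, "with_clip": clip})
    return plan
```

For the coupon example, replacing "five" with a clip pronounced like "one" yields the inter-cut video without re-rendering the digital person.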
In one possible embodiment, determining the target homophone from the live script according to the difference word includes the following steps: determining whether a reference character homophonic with the difference word exists within a preset sentence range in the live script, where the play time interval in the live source video between each live statement in the preset sentence range and the target live statement is less than a first preset value; if so, when there are a plurality of reference characters, acquiring the reference live statement corresponding to each reference character, and determining the reference character corresponding to the reference live statement with the highest matching degree with the target inter-cut statement as the target homophone; when there is one reference character, determining that reference character as the target homophone; and if no reference character exists within the range, determining the homophonic character whose playing time in the live source video is closest to that of the target live statement as the target homophone.
The preset sentence range can be set by the user; for example, it can be the sentence immediately before and after the target live statement, or the two sentences before and after it. Similarly, the first preset value can be set by the user, for example 5s, 10s, or 20s. As for the selection of the target homophone: when there are a plurality of homophones, the one with the highest matching degree is taken as the target homophone; when there is one homophone, it is taken directly as the target homophone; and when no homophone exists within the range, the homophone whose playing time is closest to that of the target live statement is taken as the target homophone.
Because digital anchor video plays continuously, the background at adjacent playing times generally does not change; the embodiment of the present application therefore selects the homophone closest in time as the target homophone to avoid a background mismatch. If the background of the live video corresponding to the determined homophone differs from that corresponding to the target live statement, the video can still be replaced directly when the background difference is small; when the background difference is large and the difference words are few, only the audio can be replaced without replacing the picture of the live source video. Although the mouth shape then does not match, this is not easily noticed because only individual words are replaced, which improves processing efficiency and speeds up generation of the inter-cut video.
Therefore, in the embodiment of the present application, the live video closest in time is preferentially selected, because the background of temporally adjacent live video generally does not change; if the background does change, direct replacement or audio-only replacement is chosen according to the magnitude of the difference. This accelerates playback, facilitates timely and intelligent handling of live accidents by the digital anchor, and improves the comprehensiveness, intelligence and fluency of live information processing by the live information system.
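A minimal sketch of the selection rule above; the candidate representation, the `match_degree` scorer, and the treatment of out-of-range homophones as a separate list are illustrative assumptions, not details given by the patent.

```python
def select_target_homophone(in_range, out_of_range, target_play_time, match_degree):
    """in_range / out_of_range: lists of (char, reference_live_sentence, play_time_s)
    for reference characters homophonic with the difference word, inside / outside
    the preset sentence range around the target live statement.
    match_degree: scores a reference live sentence against the target inter-cut
    sentence (stand-in for the patent's unspecified matching metric)."""
    if len(in_range) == 1:
        return in_range[0][0]            # a single reference character is used directly
    if in_range:                         # several candidates: best-matching sentence wins
        return max(in_range, key=lambda c: match_degree(c[1]))[0]
    if out_of_range:                     # none in range: closest play time wins
        return min(out_of_range, key=lambda c: abs(c[2] - target_play_time))[0]
    return None                          # no homophone anywhere in the script
```

Preferring the closest play time in the fallback branch reflects the observation that temporally adjacent live video usually shares the same background.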
In one possible embodiment, generating the inter-cut video corresponding to the target inter-cut statement according to the target inter-cut text includes the following steps: determining whether the statement length value of the target inter-cut statement is greater than a second preset value; if the statement length value is greater than the second preset value, dividing the target inter-cut statement into a plurality of statement fragments, where the statement length value of each of the statement fragments is less than the second preset value, and rendering the target digital person based on each statement fragment in turn to obtain the inter-cut video corresponding to each statement fragment in turn; and if the statement length value is less than or equal to the second preset value, rendering the target digital person based on the target inter-cut statement to obtain the inter-cut video corresponding to the target inter-cut statement.
The second preset value can be set by the user, for example 5 words, 10 words, or 20 words. When the statement length value of the target inter-cut statement is greater than the second preset value, the target inter-cut statement is divided into a plurality of statement fragments, and the digital anchor is then rendered based on the statement fragments to obtain the inter-cut videos. For example, if the target inter-cut statement is "about to issue five hundred coupons", it may be divided into fragments such as "about to", "issue", "five hundred" and "coupons", and the digital anchor is rendered based on these fragments in sequence to obtain the corresponding inter-cut videos, so that they can be played in the live broadcasting room in time without making the audience wait too long. When the statement length value of the target inter-cut statement is less than or equal to the second preset value, the digital anchor is rendered directly based on the target inter-cut statement, which simplifies the operation, avoids dividing every target inter-cut statement, improves the processing speed, and speeds up generation of the inter-cut video.
Therefore, in the embodiment of the present application, when the coincidence degree or similarity between the target inter-cut statement and any live statement in the live script is low, the digital anchor is rendered differently according to the statement length value of the inter-cut sentence, so that a suitable processing mode exists for the live broadcasting room under various conditions, live accidents can be handled intelligently, and the comprehensiveness, intelligence and fluency of live information processing by the live information system are improved.
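The length-based splitting above can be sketched as follows. The greedy word-level split is an illustrative assumption — the patent does not specify how fragments are chosen, only that each stays below the second preset value — and the `render` callable stands in for the server's rendering of the digital person.

```python
def render_target_sentence(sentence, second_preset_value, render):
    """Render a second-type inter-cut sentence: split it into fragments shorter
    than the preset length so each clip can play as soon as it is ready."""
    if len(sentence) <= second_preset_value:
        return [render(sentence)]          # short enough: render in one pass
    words, fragments, current = sentence.split(), [], ""
    for w in words:                        # greedy split keeping fragments short
        candidate = (current + " " + w).strip()
        if len(candidate) >= second_preset_value and current:
            fragments.append(current)
            current = w                    # note: a single word longer than the
        else:                              # preset still forms its own fragment
            current = candidate
    if current:
        fragments.append(current)
    return [render(f) for f in fragments]  # one inter-cut video per fragment
```

Rendering fragment by fragment lets the first clip reach the live broadcasting room while later fragments are still being rendered.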
In one possible embodiment, before determining whether the statement length value of the target inter-cut statement is greater than the second preset value, the method further includes the following steps: determining whether an alternative sentence exists in the target inter-cut text, where an alternative sentence is an inter-cut sentence whose play type is random period play and whose statement type is the first type; if an alternative sentence exists, changing the target text sequence so that the alternative sentence is arranged before the target inter-cut sentence, and determining the alternative sentence as the target inter-cut sentence; and generating the inter-cut video corresponding to the target inter-cut statement according to the live source video.
When there are a plurality of alternative sentences, the target text sequence may be changed according to the matching degree between each alternative sentence and the live statements of the live script; for example, the alternative sentence with the highest matching degree may be taken as the target inter-cut sentence. The target text sequence may also be changed according to the playing time of each alternative sentence; for example, the alternative sentence whose playing time is closest to that of the current target inter-cut sentence may be taken as the target inter-cut sentence. When there is one alternative sentence, it is taken directly as the target inter-cut sentence.
When no alternative sentence exists, the digital anchor can be rendered according to the statement length value of the inter-cut sentence.
It can be seen that, in the embodiment of the present application, when the coincidence degree or similarity between the target inter-cut sentence and any live statement in the live script is low, it is necessary to determine whether an alternative sentence exists in the target inter-cut text: when one exists, the target text sequence is changed; when none exists, the digital anchor is rendered according to the statement length value of the inter-cut sentence.
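The alternative-sentence promotion above can be sketched as follows. Choosing by matching degree rather than by play time is one of the two options the embodiment allows; the dictionary representation and all names are illustrative.

```python
def promote_alternative_sentence(sequence, match_degree):
    """If the remaining target text sequence contains alternative sentences
    (random period play AND first-type statement), move the best-matching one
    to the front so it becomes the next target inter-cut sentence; otherwise
    leave the sequence unchanged."""
    alts = [s for s in sequence
            if s["play_type"] == "random" and s["sentence_type"] == "first"]
    if not alts:
        return sequence                   # no alternative: keep the original order
    chosen = max(alts, key=match_degree) if len(alts) > 1 else alts[0]
    # reorder: the chosen alternative first, everything else in original order
    return [chosen] + [s for s in sequence if s is not chosen]
```

Promoting a first-type sentence lets the server reuse the live source video immediately instead of rendering a second-type sentence from scratch.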
In accordance with the foregoing embodiments, referring to fig. 6, fig. 6 is a functional unit block diagram of a processing device for live broadcast information based on an intelligent digital person according to an embodiment of the present application. The processing device 50 for live broadcast information based on an intelligent digital person is applied to a digital person live broadcast system, the digital person live broadcast system includes a server and an electronic device, the server is in communication connection with the electronic device, and the processing device 50 includes: a first obtaining unit 501, configured to obtain an inter-cut request from the electronic device, where the inter-cut request includes a reference inter-cut text, the inter-cut request is used to indicate that an emergency has occurred in a target digital person live broadcasting room, and the reference inter-cut text includes at least one inter-cut sentence; a first determining unit 502, configured to determine the play type of each inter-cut sentence in the reference inter-cut text, where the play type includes fixed period play or random period play; a second obtaining unit 503, configured to obtain the live script of the target digital person corresponding to the target digital person live broadcasting room; a second determining unit 504, configured to determine, according to the live script, the statement type of each inter-cut sentence in the reference inter-cut text, where the statement type is used to indicate the matching degree between the inter-cut sentence and the live statements in the live script; a third determining unit 505, configured to determine whether the current rendering speed at which the server renders the digital person is less than a preset speed; a fourth determining unit 506, configured to determine a target text sequence according to the play type if the current rendering speed is less than the preset speed, where the target text sequence is used to indicate the play order of the inter-cut sentences; a generating unit 507, configured to generate, according to the statement type of the target inter-cut sentence, the inter-cut video corresponding to the target inter-cut sentence, where the target inter-cut sentence is the foremost inter-cut sentence in the target text sequence for which no corresponding inter-cut video has been generated; and a sending unit 508, configured to send the inter-cut video to the electronic device, where the inter-cut video is used to be played in the target digital person live broadcasting room to cope with the emergency.
In a possible embodiment, in the aspect of determining the target text sequence according to the play type, the first determining unit 502 is specifically configured to: generate an initial inter-cut text according to the play period of each fixed play sentence, where a fixed play sentence is an inter-cut sentence in the reference inter-cut text whose play type is fixed period play, and the arrangement order of the inter-cut sentences in the initial inter-cut text matches the play periods of the fixed play sentences; sort the statement length values of the random play sentences to obtain a sorting result, where a random play sentence is an inter-cut sentence in the reference inter-cut text whose play type is random period play; insert the random play sentences into the initial inter-cut text according to the sorting result to obtain the target inter-cut text; and determine the arrangement order of the inter-cut sentences in the target inter-cut text as the target text sequence.
In a possible embodiment, the statement types include a first type of inter-cut statement and a second type of inter-cut statement; the matching degree between a first-type inter-cut statement and at least one of the live statements in the live script is greater than or equal to a first preset value, and a second-type inter-cut statement is any inter-cut statement in the reference inter-cut text other than a first-type inter-cut statement. In the aspect of generating the inter-cut video corresponding to the target inter-cut statement according to the statement type of the target inter-cut statement, the generating unit 507 is specifically configured to: when the statement type of the target inter-cut statement is the first type, acquire the live source video of the target digital person live broadcasting room, where the live source video is a pre-recorded digital person video for live broadcast, and generate the inter-cut video corresponding to the target inter-cut statement according to the live source video; and when the statement type of the target inter-cut statement is the second type, generate the inter-cut video corresponding to the target inter-cut statement according to the target inter-cut text.
In one possible embodiment, in the aspect of generating the inter-cut video corresponding to the target inter-cut statement according to the live source video, the generating unit 507 is specifically configured to: acquire a target live statement, where the target live statement is a live statement in the live script whose matching degree with the target inter-cut statement is greater than or equal to the first preset value; determine the difference words by which the target inter-cut statement differs from the target live statement; determine a target homophone from the live script according to the difference words, where the target homophone is homophonic with the difference word; intercept a first live video corresponding to the target live statement from the live source video; intercept a second live video corresponding to the target homophone from the live source video; and generate the inter-cut video corresponding to the target inter-cut statement according to the first live video and the second live video.
In a possible embodiment, in the aspect of determining the target homophone from the live script according to the difference word, the generating unit 507 is specifically configured to: determine whether a reference character homophonic with the difference word exists within a preset sentence range in the live script, where the play time interval in the live source video between each live statement in the preset sentence range and the target live statement is less than a first preset value; if so, when there are a plurality of reference characters, acquire the reference live statement corresponding to each reference character, and determine the reference character corresponding to the reference live statement with the highest matching degree with the target inter-cut statement as the target homophone; when there is one reference character, determine that reference character as the target homophone; and if no reference character exists within the range, determine the homophonic character whose playing time in the live source video is closest to that of the target live statement as the target homophone.
In one possible embodiment, in the aspect of generating the inter-cut video corresponding to the target inter-cut statement according to the target inter-cut text, the generating unit 507 is specifically configured to: determine whether the statement length value of the target inter-cut statement is greater than a second preset value; if the statement length value is greater than the second preset value, divide the target inter-cut statement into a plurality of statement fragments, where the statement length value of each of the statement fragments is less than the second preset value, and render the target digital person based on each statement fragment in turn to obtain the inter-cut video corresponding to each statement fragment in turn; and if the statement length value is less than or equal to the second preset value, render the target digital person based on the target inter-cut statement to obtain the inter-cut video corresponding to the target inter-cut statement.
In a possible embodiment, before determining whether the statement length value of the target inter-cut statement is greater than the second preset value, the generating unit 507 is further configured to: determine whether an alternative sentence exists in the target inter-cut text, where an alternative sentence is an inter-cut sentence in the target inter-cut text whose play type is random period play and whose statement type is the first type; if an alternative sentence exists, change the target text sequence so that the alternative sentence is arranged before the target inter-cut sentence, and determine the alternative sentence as the target inter-cut sentence; and generate the inter-cut video corresponding to the target inter-cut statement according to the live source video.
It can be understood that, since the method embodiment and the apparatus embodiment are different presentation forms of the same technical concept, the content of the method embodiment portion of the present application applies correspondingly to the apparatus embodiment portion, and is not repeated here.
In the case of using integrated units, please refer to fig. 7; fig. 7 is a functional unit block diagram of another processing device for live broadcast information based on an intelligent digital person according to an embodiment of the present application. As shown in fig. 7, the processing device 50 for live broadcast information based on an intelligent digital person includes: a processing module 512 and a communication module 511. The processing module 512 is configured to perform control management of the actions of the processing device, for example, performing the steps of the first obtaining unit 501, the first determining unit 502, the second obtaining unit 503, the second determining unit 504, the third determining unit 505, the fourth determining unit 506, the generating unit 507 and the sending unit 508, and/or other processes of the techniques described herein. The communication module 511 is used for interaction between the processing device and other devices. As shown in fig. 7, the processing device may further include a storage module 513, and the storage module 513 may be configured to store program codes and data of the processing device.
The processing module 512 may be a processor or controller, such as a central processing unit (Central Processing Unit, CPU), a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination implementing computing functions, for example a combination including one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 511 may be a transceiver, an RF circuit, a communication interface, or the like. The storage module 513 may be a memory.
For all relevant contents of each scenario involved in the above method embodiment, reference may be made to the functional descriptions of the corresponding functional modules, which are not repeated here. The processing device 50 for live broadcast information based on an intelligent digital person may perform the processing method for live broadcast information based on an intelligent digital person shown in fig. 3.
The foregoing description of the embodiments of the present application has been presented primarily in terms of a method-side implementation. It will be appreciated that, in order to achieve the above-described functions, the electronic device includes a hardware structure and a software module for performing the respective functions. Those of skill in the art will readily appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied as hardware or a combination of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application may divide the functional units of the electronic device according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated in one processing unit. The integrated units may be implemented in hardware or in software functional units. It should be noted that, in the embodiment of the present application, the division of the units is schematic, which is merely a logic function division, and other division manners may be implemented in actual practice.
Embodiments of the present application also provide a computer-readable storage medium storing computer program instructions that cause a computer to perform some or all of the steps of any of the methods described in the method embodiments above, the computer including an electronic device.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of units is merely a division by logical function, and other division manners are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via certain interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units described above are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable memory. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the above embodiments may be implemented by a program instructing associated hardware, and the program may be stored in a computer-readable memory, which may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiments of the present application have been described in detail above; specific examples are provided herein to illustrate the principles and implementations of the present application, and the above descriptions of the embodiments are provided solely to assist in understanding the methods of the present application and their core ideas. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope in accordance with the ideas of the present application; in view of the above, the contents of this specification should not be construed as limiting the present application.
Although the present application is disclosed above, the present application is not limited thereto. Variations and modifications may be made by those skilled in the art without departing from the spirit and scope of the present application, including combinations of the various functions and steps of the above-described embodiments, whether in software or hardware, and all such variations and modifications are intended to fall within the scope of the present application.

Claims (10)

1. A live broadcast information processing method based on an intelligent digital person, characterized in that the method is applied to a digital person live broadcast system, the digital person live broadcast system comprises a server and an electronic device, and the server is in communication connection with the electronic device, the method comprising:
obtaining an inter-cut request from the electronic device, wherein the inter-cut request comprises a reference inter-cut text, the inter-cut request is used for indicating that an emergency has occurred in a target digital person live broadcast room, and the reference inter-cut text comprises at least one inter-cut sentence;
determining a play type of each inter-cut sentence in the reference inter-cut text, wherein the play type comprises fixed-period play or random-period play;
acquiring a live script of a target digital person corresponding to the target digital person live broadcast room;
determining a sentence type of each inter-cut sentence in the reference inter-cut text according to the live script, wherein the sentence type is used for indicating a matching degree between the inter-cut sentence and live sentences in the live script;
determining whether a current rendering speed at which the server renders the digital person is less than a preset speed;
if the current rendering speed is less than the preset speed, determining a target text order according to the play type, wherein the target text order is used for indicating a play order of the inter-cut sentences;
generating an inter-cut video corresponding to a target inter-cut sentence according to the sentence type of the target inter-cut sentence, wherein the target inter-cut sentence is the inter-cut sentence ranked first in the target text order for which no corresponding inter-cut video has yet been generated;
and sending the inter-cut video to the electronic device, wherein the inter-cut video is to be played in the target digital person live broadcast room so as to cope with the emergency.
2. The method of claim 1, wherein the determining a target text order according to the play type comprises:
generating an initial inter-cut text according to play periods of fixed-play sentences, wherein a fixed-play sentence is an inter-cut sentence in the reference inter-cut text whose play type is fixed-period play, and the arrangement order of the inter-cut sentences in the initial inter-cut text matches the play periods of the fixed-play sentences;
sorting sentence length values of random-play sentences to obtain a sorting result, wherein a random-play sentence is an inter-cut sentence in the reference inter-cut text whose play type is random-period play;
inserting the random-play sentences into the initial inter-cut text according to the sorting result to obtain a target inter-cut text;
and determining the arrangement order of the inter-cut sentences in the target inter-cut text as the target text order.
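The ordering steps recited above can be sketched as follows. This is a minimal illustration only: the `Sentence` data structure and the append-after-sorting merge policy are assumptions for the sake of example, not details taken from the claims, which do not fix the exact insertion positions of the random-play sentences.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sentence:
    text: str
    play_type: str                   # "fixed" (fixed-period play) or "random" (random-period play)
    period: Optional[float] = None   # scheduled play time; only set for fixed-period sentences

def target_text_order(sentences):
    """Return the inter-cut sentences arranged in their target play order."""
    # Step 1: initial inter-cut text — fixed-play sentences arranged to match
    # their play periods.
    fixed = sorted((s for s in sentences if s.play_type == "fixed"),
                   key=lambda s: s.period)
    # Step 2: sort the random-play sentences by sentence length value.
    random_play = sorted((s for s in sentences if s.play_type == "random"),
                         key=lambda s: len(s.text))
    # Step 3: insert the random-play sentences into the initial text
    # (appending shortest-first is one plausible policy).
    return fixed + random_play
```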
3. The method of claim 2, wherein the sentence type comprises a first-type inter-cut sentence or a second-type inter-cut sentence, a matching degree between a first-type inter-cut sentence and at least one live sentence in the live script is greater than or equal to a first preset value, a second-type inter-cut sentence is an inter-cut sentence in the reference inter-cut text other than the first-type inter-cut sentences, and the generating an inter-cut video corresponding to a target inter-cut sentence according to the sentence type of the target inter-cut sentence comprises:
when the sentence type of the target inter-cut sentence is the first-type inter-cut sentence, acquiring a live source video of the target digital person live broadcast room, wherein the live source video is a prerecorded digital person video used for live broadcasting, and generating the inter-cut video corresponding to the target inter-cut sentence according to the live source video;
and when the sentence type of the target inter-cut sentence is the second-type inter-cut sentence, generating the inter-cut video corresponding to the target inter-cut sentence according to the target inter-cut text.
4. The method of claim 3, wherein the generating the inter-cut video corresponding to the target inter-cut sentence according to the live source video comprises:
acquiring a target live sentence, wherein the target live sentence is a live sentence in the live script, and a matching degree between the target live sentence and the target inter-cut sentence is greater than or equal to the first preset value;
determining a difference word by which the target inter-cut sentence differs from the target live sentence;
determining a target homophone from the live script according to the difference word, wherein the target homophone is homophonic with the difference word;
cutting a first live video corresponding to the target live sentence out of the live source video;
cutting a second live video corresponding to the target homophone out of the live source video;
and generating the inter-cut video corresponding to the target inter-cut sentence according to the first live video and the second live video.
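The final splicing step can be illustrated with a toy sketch in which a video is stood in for by a list of frames; the frame-list representation and the simple concatenation policy are assumptions for illustration, since the claim does not specify how the two cut clips are combined.

```python
def build_intercut_video(live_source, live_sentence_span, homophone_span):
    """Cut the clip for the target live sentence and the clip for the target
    homophone out of the live source video, then splice them into the
    inter-cut video. Spans are (start, end) frame indices, end exclusive."""
    first_live_video = live_source[live_sentence_span[0]:live_sentence_span[1]]
    second_live_video = live_source[homophone_span[0]:homophone_span[1]]
    # Splice: play the matching live-sentence clip, then the homophone clip.
    return first_live_video + second_live_video
```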
5. The method of claim 4, wherein the determining a target homophone from the live script according to the difference word comprises:
determining whether a reference character homophonic with the difference word exists within a preset sentence range in the live script, wherein a play time interval in the live source video between any live sentence within the preset sentence range and the target live sentence is less than a first preset duration;
if such a reference character exists: when there are a plurality of reference characters, acquiring the reference live sentence corresponding to each reference character, and determining the reference character corresponding to the reference live sentence with the highest matching degree with the target inter-cut sentence as the target homophone; when there is one reference character, determining that reference character as the target homophone;
and if no such reference character exists within the preset sentence range, determining the reference character in the live script whose play time in the live source video is closest to that of the target live sentence as the target homophone.
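The selection logic of this claim can be sketched as below. The candidate record layout (`char`, `play_time`, `match` fields) is hypothetical, and the sketch assumes homophone candidates have already been extracted from the live script.

```python
def pick_target_homophone(candidates, target_play_time, preset_duration):
    """Select the target homophone among homophonic reference characters.

    candidates: list of dicts with illustrative keys
      'char'      - the reference character,
      'play_time' - its play time (seconds) in the live source video,
      'match'     - matching degree of its live sentence with the target
                    inter-cut sentence.
    """
    # Characters within the preset sentence range: play-time gap to the
    # target live sentence is below the preset duration.
    in_range = [c for c in candidates
                if abs(c["play_time"] - target_play_time) < preset_duration]
    if in_range:
        if len(in_range) == 1:
            return in_range[0]["char"]
        # Several candidates: take the one whose reference live sentence
        # best matches the target inter-cut sentence.
        return max(in_range, key=lambda c: c["match"])["char"]
    # Fallback: the reference character played closest to the target live sentence.
    return min(candidates,
               key=lambda c: abs(c["play_time"] - target_play_time))["char"]
```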
6. The method of claim 3, wherein the generating the inter-cut video corresponding to the target inter-cut sentence according to the target inter-cut text comprises:
determining whether a sentence length value of the target inter-cut sentence is greater than a second preset value;
if the sentence length value is greater than the second preset value, dividing the target inter-cut sentence into a plurality of sentence fragments, wherein the sentence length value of each of the plurality of sentence fragments is less than the second preset value, and rendering the target digital person based on each sentence fragment in turn to obtain the inter-cut video corresponding to each sentence fragment in turn;
and if the sentence length value is less than or equal to the second preset value, rendering the target digital person based on the target inter-cut sentence to obtain the inter-cut video corresponding to the target inter-cut sentence.
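The length check and fragmentation above can be sketched as follows. Splitting on fixed-size character windows is an illustrative policy only; the claim requires merely that each fragment's length value stay below the second preset value, and a real system would more plausibly split on punctuation or prosodic boundaries.

```python
def split_into_fragments(sentence, max_len):
    """Split an over-long inter-cut sentence into fragments, each with a
    length value strictly below max_len, so the digital person can be
    rendered fragment by fragment. Requires max_len >= 2."""
    if len(sentence) <= max_len:
        return [sentence]          # short enough: render in a single pass
    step = max_len - 1             # keep every fragment below the threshold
    return [sentence[i:i + step] for i in range(0, len(sentence), step)]
```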
7. The method of claim 6, wherein before the determining whether a sentence length value of the target inter-cut sentence is greater than a second preset value, the method further comprises:
determining whether an alternative sentence exists in the target inter-cut text, wherein an alternative sentence is an inter-cut sentence in the target inter-cut text whose play type is random-period play and whose sentence type is the first-type inter-cut sentence;
if the alternative sentence exists, changing the target text order so that the alternative sentence is arranged before the target inter-cut sentence, and determining the alternative sentence as the target inter-cut sentence;
and generating the inter-cut video corresponding to the target inter-cut sentence according to the live source video.
8. A live broadcast information processing apparatus based on an intelligent digital person, characterized in that the apparatus is applied to a digital person live broadcast system, the digital person live broadcast system comprises a server and an electronic device, and the server is in communication connection with the electronic device, the apparatus comprising:
a first acquisition unit, configured to obtain an inter-cut request from the electronic device, wherein the inter-cut request comprises a reference inter-cut text, the inter-cut request is used for indicating that an emergency has occurred in a target digital person live broadcast room, and the reference inter-cut text comprises at least one inter-cut sentence;
a first determining unit, configured to determine a play type of each inter-cut sentence in the reference inter-cut text, wherein the play type comprises fixed-period play or random-period play;
a second acquisition unit, configured to acquire a live script of a target digital person corresponding to the target digital person live broadcast room;
a second determining unit, configured to determine a sentence type of each inter-cut sentence in the reference inter-cut text according to the live script, wherein the sentence type is used for indicating a matching degree between the inter-cut sentence and live sentences in the live script;
a third determining unit, configured to determine whether a current rendering speed at which the server renders the digital person is less than a preset speed;
a fourth determining unit, configured to determine a target text order according to the play type if the current rendering speed is less than the preset speed, wherein the target text order is used for indicating a play order of the inter-cut sentences;
a generating unit, configured to generate an inter-cut video corresponding to a target inter-cut sentence according to the sentence type of the target inter-cut sentence, wherein the target inter-cut sentence is the inter-cut sentence ranked first in the target text order for which no corresponding inter-cut video has yet been generated;
and a sending unit, configured to send the inter-cut video to the electronic device, wherein the inter-cut video is to be played in the target digital person live broadcast room so as to cope with the emergency.
9. An electronic device comprising a processor, a memory, and one or more programs stored in the memory and configured to be executed by the processor, the one or more programs comprising instructions for performing the steps in the method of any of claims 1-7.
10. A computer readable storage medium having stored thereon computer program instructions, which are executed by a processor to implement the steps of the method of any of claims 1-7.
CN202311632685.7A 2023-12-01 2023-12-01 Live broadcast information processing method and processing device based on intelligent digital person Active CN117336520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311632685.7A CN117336520B (en) 2023-12-01 2023-12-01 Live broadcast information processing method and processing device based on intelligent digital person

Publications (2)

Publication Number Publication Date
CN117336520A true CN117336520A (en) 2024-01-02
CN117336520B CN117336520B (en) 2024-04-26

Family

ID=89277791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311632685.7A Active CN117336520B (en) 2023-12-01 2023-12-01 Live broadcast information processing method and processing device based on intelligent digital person

Country Status (1)

Country Link
CN (1) CN117336520B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019193610A2 (en) * 2018-04-06 2019-10-10 Novi Digital Entertainment Private Limited Synchronization of online gaming environment with video streaming of a live event
CN112637625A (en) * 2020-12-17 2021-04-09 江苏遨信科技有限公司 Virtual real person anchor program and question-answer interaction method and system
CN113132741A (en) * 2021-03-03 2021-07-16 广州鑫泓设备设计有限公司 Virtual live broadcast system and method
US11082467B1 (en) * 2020-09-03 2021-08-03 Facebook, Inc. Live group video streaming
CN113766253A (en) * 2021-01-04 2021-12-07 北京沃东天骏信息技术有限公司 Live broadcast method, device, equipment and storage medium based on virtual anchor
CN113825031A (en) * 2021-11-22 2021-12-21 阿里巴巴达摩院(杭州)科技有限公司 Live content generation method and device
CN114125490A (en) * 2022-01-19 2022-03-01 阿里巴巴(中国)有限公司 Live broadcast method and device
CN114125569A (en) * 2022-01-27 2022-03-01 阿里巴巴(中国)有限公司 Live broadcast processing method and device
CN117032520A (en) * 2023-07-11 2023-11-10 咪咕文化科技有限公司 Video playing method and device based on digital person, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117336520B (en) 2024-04-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant