CN108337559A

CN108337559A - Live-streaming text display method and system

Info

Publication number: CN108337559A
Application number: CN201810118450.9A
Authority: CN
Inventors: 唐招兵
Original assignee: Hangzhou Zheng Xin Jin Clothes Internet Technology Co Ltd
Current assignee: Hangzhou Zheng Xin Jin Clothes Internet Technology Co Ltd
Priority date: 2018-02-06
Filing date: 2018-02-06
Publication date: 2018-07-27

Abstract

The invention discloses a live-streaming text display method and system. The key points of the technical solution include the following steps: obtaining video stream data, wherein the video stream data carries image information and audio information; generating, in real time, text information corresponding to the audio information based on the audio information in the video stream data; and sending the image information in the video stream data to a first display area established in a user terminal for display, and sending the text information to a second display area established in the user terminal for segmented scrolling display. This allows a user watching a live stream to follow the audio information in the stream and see text subtitles of the audio information in real time.

Description

Live-streaming text display method and system

Technical field

The present invention relates to live-streaming text display technology, and more particularly to a live-streaming text display method and system.

Background art

Online live streaming draws on and extends the advantages of the internet. By broadcasting live over the network as video, content such as product presentations, conferences, background introductions, solution evaluations, online surveys, interviews and online training can be published to the internet as it happens. Live streaming takes advantage of the internet's characteristics (intuitive, fast, expressive, rich in content, highly interactive, not limited by region, and with a segmentable audience) to strengthen the promotional effect of an on-site event.

At present, live streaming of financial content is developing rapidly. Broadcasting financial conferences live helps the public better understand the current economic situation. However, when invited guests use specialized financial terms and speak quickly, viewers who are not sufficiently familiar with the financial industry cannot work out the literal meaning of those terms from the guests' speech, which degrades their viewing experience of the live stream.

Summary of the invention

In view of the deficiencies of the prior art, an object of the present invention is to provide a live-streaming text display method that improves the user's live-stream viewing experience.

The above technical object of the present invention is achieved by the following technical solution:

A live-streaming text display method, comprising the following steps:

Obtaining video stream data, wherein the video stream data carries image information and audio information;

Generating, in real time, text information corresponding to the audio information based on the audio information in the video stream data;

Sending the image information in the video stream data to a first display area established in a user terminal for display, and sending the text information to a second display area established in the user terminal for segmented scrolling display.

Preferably, generating, in real time, the text information corresponding to the audio information based on the audio information in the video stream data comprises the following steps:

Scanning the text fields in the text information generated in real time;

Determining, one by one, whether a term index exists for each text field in the text information;

If so, associating the term index of the text field with the text field to form a linked text field.

Preferably, after a term index is found for a text field in the text information and the term index is associated with the text field to form a linked text field, the method further comprises the following steps:

Determining whether a linked text field in the text information has been triggered;

If so, sending the term definition in the term index associated with the linked text field to a third display area established in the user terminal for display.

Preferably, based on the audio information in the video stream data, the audio information is recognized by speech recognition to generate, in real time, the text information corresponding to the audio information.

Preferably, recognizing the audio information in the video stream data by speech recognition to generate, in real time, the text information corresponding to the audio information comprises the following steps:

Establishing a personnel timbre feature database, and associating each personnel timbre feature in the database with personnel information;

Recognizing the personnel timbre features in the audio information by speech recognition and, while generating the text information corresponding to the audio information in real time, adding the corresponding personnel information to the text information.

Preferably, while the personnel timbre features in the audio information are recognized by speech recognition, the text information corresponding to the audio information is generated in real time and the corresponding personnel information is added to the text information, the method further comprises the following steps:

Determining whether the interval between the generation of two adjacent text fields in the text information exceeds a preset duration;

If not, further determining whether the timbre feature changes between the previous text field and the next text field;

If so, splitting the current text information, at the text field where the timbre feature changes, into preceding text information and following text information, and adding the corresponding personnel information to the preceding text information and the following text information respectively.

Preferably, after the image information in the video stream data is sent to the first display area established in the user terminal for display and the text information is sent to the second display area established in the user terminal for segmented scrolling display, the method further comprises the following steps:

Forming a text information set from the generated text information;

Obtaining corrected text information sent by an authorized user terminal;

Scanning the text information in the text information set one by one against the corrected text information, comparing the number of text fields in the corrected text information with the number of text fields in the text information, and determining whether the number of matching text fields in the corrected text information exceeds a preset match count;

If so, replacing that text information with the corrected text information and storing it in the text information set.

In view of the deficiencies of the prior art, another object of the present invention is to provide a live-streaming text display system that improves the user's live-stream viewing experience.

A live-streaming text display system, comprising:

An acquisition module, configured to obtain video stream data, wherein the video stream data carries image information and audio information;

A generation module, configured to generate, in real time, text information corresponding to the audio information based on the audio information in the video stream data;

A sending module, configured to send the image information in the video stream data to a first display area established in a user terminal for display, and to send the text information to a second display area established in the user terminal for segmented scrolling display.

In view of the deficiencies of the prior art, another object of the present invention is to provide a server that improves the user's live-stream viewing experience.

A server, comprising:

At least one processor;

At least one memory communicatively connected to the processor, wherein the memory stores an instruction set executable by the processor, and the processor invokes the instruction set to perform the live-streaming text display method described above.

In view of the deficiencies of the prior art, another object of the present invention is to provide a non-transitory readable memory that improves the user's live-stream viewing experience.

A non-transitory readable memory, wherein the non-transitory readable memory stores an instruction set, and the instruction set is adapted to be loaded by a processor to perform the live-streaming text display method described above.

In summary, compared with the prior art, the present invention has the following beneficial effects:

1. A user watching a live stream can follow the audio information in the stream and see text subtitles of the audio information in real time;

2. While following the text subtitles as the live stream progresses, if an industry-specific term appears in the text information, the user can click the text field to view the term definition, which improves the user's understanding;

3. The audio information in the live video stream data is combined with the text information: as the video stream data plays, the text information is automatically segmented and scrolled according to the timbre features in the audio information.

Description of the drawings

Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:

Fig. 1 is a flow diagram of the live-streaming text display method in the technical solution of the present invention;

Fig. 2 is a flow diagram of linked text field generation in the technical solution of the present invention;

Fig. 3 is a flow diagram of sending the term definition in the technical solution of the present invention;

Fig. 4 is a flow diagram of timbre feature recognition in the technical solution of the present invention;

Fig. 5 is a flow diagram of text information segmentation in the technical solution of the present invention;

Fig. 6 is a flow diagram of corrected text information replacement in the technical solution of the present invention;

Fig. 7 is a system block diagram of the live-streaming text display system in the technical solution of the present invention;

Fig. 8 is a structural diagram of the server in the technical solution of the present invention.

Detailed description

To present the technical solution of the present invention more clearly, the invention is further described below with reference to the accompanying drawings.

Referring to Fig. 1, Fig. 1 is a flow diagram of a live-streaming text display method provided by the technical solution of the present invention. The method in this embodiment is described in detail below from the server side with reference to Fig. 1. The method may include the following steps S100 to S300.

Step S100: obtain video stream data, wherein the video stream data carries image information and audio information.

Specifically, the server can obtain the video stream data from video shot by an on-site camera, wherein the video stream data carries image information and audio information. The server can then deliver the video stream data to the user terminal for playback, so that the user terminal obtains the on-site image information and audio information.

Step S200: generate, in real time, text information corresponding to the audio information based on the audio information in the video stream data.

Specifically, while delivering the video stream data to the user terminal, the server extracts the audio information from the video stream data and generates the corresponding text information in real time.

In one embodiment, the server recognizes the audio information in the video stream data by speech recognition to generate the corresponding text information in real time. The audio information is captured by an on-site microphone, so the server can acquire it directly and recognize it with speech recognition technology to generate text information corresponding to the spoken content in the audio information.

In another embodiment, the text information can be generated manually: a staff member listens to the audio information and types out its spoken content.

It is worth noting that the text information is generated in synchronization with the audio information and delivered to the user terminal in synchronization with it.

Step S300: send the image information in the video stream data to the first display area established in the user terminal for display, and send the text information to the second display area established in the user terminal for segmented scrolling display.

Specifically, a first display area and a second display area are established in the user terminal. When the server delivers the video stream data to the user terminal, the audio information in the video stream data is played through the user terminal's loudspeaker, the image information is displayed in the first display area, and the text information is displayed in the second display area with segmented scrolling. The speech content in the audio information is human speech and consists of multiple segments, and each speech segment has corresponding text information. The text information in the second display area therefore breaks wherever the speech content in the audio information breaks, and the current text information is pinned to the top, which produces the segmented scrolling display. The text information accumulates in the second display area, and the user can view earlier text information by paging or dragging the scroll bar.

This allows a user watching a live stream to follow the audio information in the stream and see, in real time, text subtitles derived from the audio information.
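
The behaviour of the second display area described for step S300 can be sketched as a small data model. This is only an illustration under stated assumptions: the class name `SubtitlePane`, its `page_size` parameter and its method names are hypothetical and do not appear in the patent, which does not specify how the user terminal stores or pages its segments.

```python
from dataclasses import dataclass, field


@dataclass
class SubtitlePane:
    """Hypothetical model of the second display area on the user terminal."""
    page_size: int = 3                              # assumed segments per page
    segments: list = field(default_factory=list)    # oldest segment first

    def start_segment(self) -> None:
        # A break in the speech content opens a new text segment.
        self.segments.append("")

    def append_text(self, text: str) -> None:
        # Text accumulates in the current (newest) segment.
        if not self.segments:
            self.start_segment()
        self.segments[-1] += text

    def page(self, index: int = 0) -> list:
        # Page 0 shows the newest segments with the current one on top;
        # higher indexes page back through the history.
        newest_first = self.segments[::-1]
        start = index * self.page_size
        return newest_first[start:start + self.page_size]
```

Calling `page(0)` models the live view with the current segment pinned to the top, while `page(1)` and later pages stand in for the user paging or scrolling back through earlier text information.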

Fig. 2 is a flow diagram of linked text field generation in the technical solution of the present invention. Generating, in real time, the text information corresponding to the audio information based on the audio information in the video stream data comprises the following steps:

Step S201: scan the text fields in the text information generated in real time;

Step S202: determine, one by one, whether a term index exists for each text field in the text information;

Step S203: if so, associate the term index of the text field with the text field to form a linked text field.

According to the technical solution defined by steps S201 to S203, the text information contains several text fields, where a text field is a word or a single character. The server scans the text fields in the text information and determines, one by one, whether a term index exists for each. In this embodiment, a term base is established in the server's database; common terms are entered into the term base in advance and the term index is built from them, wherein each common term carries its term definition.
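
Steps S201 to S203 amount to a dictionary lookup per field. The sketch below assumes the text has already been split into fields and that the term base is a simple in-memory mapping; `TERM_BASE`, `TextField` and `link_fields` are illustrative names, and the two sample terms and definitions are made up for the example rather than taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical term base: common terms entered in advance, each with a definition.
TERM_BASE = {
    "liquidity": "How easily an asset can be converted into cash.",
    "hedge": "A position taken to offset the risk of another position.",
}


@dataclass
class TextField:
    text: str
    term_key: Optional[str] = None   # set when a term index exists for the field

    @property
    def is_linked(self) -> bool:
        return self.term_key is not None


def link_fields(words, term_base=TERM_BASE):
    """Scan fields one by one and attach a term index where one exists."""
    fields = []
    for word in words:
        # Normalise case and trailing punctuation before the lookup (assumption).
        key = word.lower().strip(".,;:!?")
        fields.append(TextField(word, key if key in term_base else None))
    return fields
```

A field whose `term_key` is set plays the role of a linked field; the user terminal could render it as a clickable span while leaving the other fields as plain text.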

Thus, referring to Fig. 3, after a term index is found for a text field in the text information and the term index is associated with the text field to form a linked text field, the method further comprises the following steps:

Step S204: determine whether a linked text field in the text information has been triggered;

Step S205: if so, send the term definition in the term index associated with the linked text field to the third display area established in the user terminal for display.

According to the technical solution defined by steps S204 to S205, when a linked text field is triggered by the user, the user terminal sends a data request to the server, and the server sends the term definition associated with the linked text field to the user terminal in response to the request. The user can view the term definition of the text field in the third display area of the user terminal, which helps the user understand the content of the text field. It is worth noting that the term definition in the third display area does not disappear as the audio information and text information advance; it is only replaced by a new term definition.
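
The request and display behaviour of steps S204 and S205 can be sketched as follows. The names `DefinitionPane` and `handle_trigger` are hypothetical, and the network round trip between user terminal and server is collapsed into a single function call for brevity.

```python
class DefinitionPane:
    """Hypothetical model of the third display area on the user terminal."""

    def __init__(self):
        self.current = None   # stays on screen until a new definition replaces it

    def show(self, definition: str) -> None:
        self.current = definition


def handle_trigger(term_key, term_base, pane):
    """Answer a trigger request with the associated definition, if there is one."""
    definition = term_base.get(term_key)
    if definition is not None:
        pane.show(definition)
    return definition
```

Because `show` is the only method that writes to `current`, the displayed definition persists while subtitles keep advancing and changes only when another field is triggered, which mirrors the persistence rule stated above.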

Referring to Fig. 4, Fig. 4 is a flow diagram of timbre feature recognition in the technical solution of the present invention. Timbre feature recognition in this embodiment is described in detail below from the server side with reference to Fig. 4. The method may include the following steps S210 to S211.

Step S210: establish a personnel timbre feature database, and associate each personnel timbre feature in the database with personnel information.

Specifically, the live streaming in this embodiment is mainly applied to live interviews. A live interview involves spoken interaction among several people, and different people have different timbre features. In this embodiment, the personnel timbre feature database stores the timbre features of the invited personnel, and each timbre feature is associated with personnel information; specifically, the personnel information includes the person's name.

Step S211: recognize the personnel timbre features in the audio information by speech recognition and, while generating the text information corresponding to the audio information in real time, add the corresponding personnel information to the text information.

Specifically, while recognizing the spoken content in the audio information of the video stream data by speech recognition technology, the server can also identify the personnel timbre features in the audio information. Take the interview program of this embodiment as an example, with three people: host A, guest B and guest C. The server receives the video stream data, recognizes the audio information in it by speech recognition technology, and obtains host A's speech content in the audio information to form the corresponding text information. The server also identifies the timbre feature in the audio information, retrieves the personnel timbre features from the personnel timbre feature database and compares them with that timbre feature to obtain host A's personnel information, and adds host A's name at the beginning of the text information. In the same way, the server obtains the speech content of guest B and guest C from the audio information to form the corresponding text information, obtains the personnel information of guest B and guest C from their timbre features, and adds the names of guest B and guest C at the beginning of their respective text information. Following the above technical solution, the text information formed for host A, guest B and guest C is scrolled in segments: when host A speaks, the second display area of the user terminal shows the text information corresponding to host A; when guest B speaks, it shows the text information corresponding to guest B; and when guest C speaks, it shows the text information corresponding to guest C.
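
A minimal sketch of the database comparison in steps S210 and S211 follows. The patent does not say how timbre features are represented or compared; this example assumes each feature is a fixed-length numeric vector and uses cosine similarity with a threshold, which is a swapped-in technique chosen for illustration. `TIMBRE_DB`, the vectors in it, `identify_speaker`, `label_text` and the `threshold` value are all hypothetical.

```python
import math

# Hypothetical timbre feature database: personnel information (name) -> feature vector.
TIMBRE_DB = {
    "Host A": (0.9, 0.1, 0.2),
    "Guest B": (0.1, 0.8, 0.3),
    "Guest C": (0.2, 0.2, 0.9),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify_speaker(feature, db=TIMBRE_DB, threshold=0.8):
    """Return the best-matching name, or None when no entry is similar enough."""
    name, score = max(
        ((n, cosine(feature, f)) for n, f in db.items()), key=lambda p: p[1]
    )
    return name if score >= threshold else None


def label_text(feature, text, db=TIMBRE_DB):
    """Prefix the text with the speaker's name when the speaker is recognised."""
    name = identify_speaker(feature, db)
    return f"{name}: {text}" if name else text
```

Returning `None` for an unrecognised voice leaves the text unlabeled rather than attributing it to the wrong person, which seems the safer default for an interview with uninvited speakers.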

Fig. 5 is a flow diagram of text information segmentation in the technical solution of the present invention. While the personnel timbre features in the audio information are recognized by speech recognition, the text information corresponding to the audio information is generated in real time and the corresponding personnel information is added to the text information, the method further comprises the following steps:

Step S212: determine whether the interval between the generation of two adjacent text fields in the text information exceeds a preset duration;

Step S213: if not, further determine whether the timbre feature changes between the previous text field and the next text field;

Step S214: if so, split the current text information, at the text field where the timbre feature changes, into preceding text information and following text information, and add the corresponding personnel information to the preceding text information and the following text information respectively.

According to the technical solution defined by steps S212 to S214, when the interval between the generation of two consecutive text fields in the text information is determined to be less than the preset duration, the two adjacent text fields are treated as overlapping. This embodiment again uses the live interview above as an example. Two text fields overlap when, for instance, host A picks up immediately after guest B speaks, or guest B speaks too quickly. In that case, the server determines from the timbre features in the audio information whether the timbre feature changes between the previous text field and the next text field. If it changes, the personnel information of guest B is added to the preceding text information and the personnel information of host A is added to the following text information, forming two pieces of text information that are shown with segmented scrolling in the second display area.
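
The decision flow of steps S212 to S214 can be sketched as a single pass over timestamped fields. The sketch assumes each field already carries a generation time and a speaker resolved from its timbre feature; `Field`, `segment` and the `preset_gap` default are hypothetical. The patent does not state what happens when the interval does exceed the preset duration, so treating a long pause as a segment break is an assumption made here.

```python
from dataclasses import dataclass


@dataclass
class Field:
    text: str
    time: float      # generation time in seconds
    speaker: str     # personnel information resolved from the timbre feature


def segment(fields, preset_gap=1.5):
    """Group text fields into (speaker, text) segments."""
    segments = []
    prev = None
    for f in fields:
        if prev is None:
            new_segment = True
        elif f.time - prev.time > preset_gap:
            # S212 "yes" branch: a long pause, treated as a break (assumption).
            new_segment = True
        else:
            # S213/S214: short interval, so split only if the speaker changed.
            new_segment = f.speaker != prev.speaker
        if new_segment:
            segments.append([f.speaker, f.text])
        else:
            segments[-1][1] += " " + f.text
        prev = f
    return [tuple(s) for s in segments]
```

With this rule, a host who cuts in a fraction of a second after a guest still gets a separate labeled segment, even though there is no pause long enough to separate the two on timing alone.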

Referring to Fig. 6, Fig. 6 is a flow diagram of corrected text information replacement in the technical solution of the present invention. Corrected text information replacement in this embodiment is described in detail below from the server side with reference to Fig. 6. The method may further include the following steps S400 to S700.

After step S300, in which the image information in the video stream data is sent to the first display area established in the user terminal for display and the text information is sent to the second display area established in the user terminal for segmented scrolling display, the method further comprises the following steps:

Step S400: form a text information set from the generated text information;

Step S500: obtain corrected text information sent by an authorized user terminal;

Step S600: scan the text information in the text information set one by one against the corrected text information, compare the number of text fields in the corrected text information with the number of text fields in the text information, and determine whether the number of matching text fields in the corrected text information exceeds a preset match count;

Step S700: if so, replace that text information with the corrected text information and store it in the text information set.

According to the technical solution defined by steps S400 to S700, the generation time of each piece of text information is recorded when it is generated, and the generation time is bound to the playback progress of the video stream data. After the live broadcast ends, the generated text information forms a text information set, which is saved on the server together with the video stream data.
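
One way to bind generation times to playback progress is to store each piece of text under its offset from the start of the stream and look it up by binary search during playback. This is a sketch under that assumption; the `Transcript` class and its methods are hypothetical, and the patent does not describe the storage structure.

```python
import bisect


class Transcript:
    """Hypothetical text information set keyed by playback offset in seconds."""

    def __init__(self):
        self._offsets = []   # sorted generation times relative to stream start
        self._texts = []     # text information, parallel to _offsets

    def add(self, offset: float, text: str) -> None:
        # Keep both lists sorted by offset, even if entries arrive out of order.
        i = bisect.bisect_right(self._offsets, offset)
        self._offsets.insert(i, offset)
        self._texts.insert(i, text)

    def at(self, position: float):
        """Return the text information current at a playback position, if any."""
        i = bisect.bisect_right(self._offsets, position) - 1
        return self._texts[i] if i >= 0 else None
```

During playback, the user terminal could call `at` with the current playback position to decide which segment to pin to the top of the second display area.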

When the text information set is generated by the server's speech recognition technology, the text information will contain some errors, so when the live video is played back from the server through a user terminal, errors in the text information degrade the user experience. The server can therefore obtain corrected text information sent by an authorized user terminal and store it in the text information set in place of the erroneous text information. When other user terminals play back the video stream data, they receive the correct text annotations. The authorized user terminal is a user terminal logged in to the server with an administrator account.
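
The matching rule of steps S600 and S700 can be sketched as follows. The patent only says that matching fields are counted against a preset match count, so several details here are assumptions: fields are obtained by splitting on whitespace, matches are counted as a multiset overlap, and the first stored text that exceeds the threshold is the one replaced. The function names and the `preset_matches` default are hypothetical.

```python
from collections import Counter


def matching_fields(corrected: str, original: str) -> int:
    """Number of fields the two texts share (whitespace split, assumed rule)."""
    shared = Counter(corrected.split()) & Counter(original.split())
    return sum(shared.values())


def apply_correction(text_set, corrected, preset_matches=3):
    """Replace the first stored text whose match count exceeds the preset value.

    Returns the index of the replaced entry, or None if nothing matched.
    """
    for i, original in enumerate(text_set):
        if matching_fields(corrected, original) > preset_matches:
            text_set[i] = corrected
            return i
    return None
```

The threshold keeps a correction from overwriting an unrelated segment: a corrected sentence normally differs from the recognized one in only a few fields, so most of its fields still match the entry it is meant to replace.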

As shown in Fig. 7, Fig. 7 is a system block diagram of the live-streaming text display system in the technical solution of the present invention. The live-streaming text display system includes an acquisition module, a generation module and a sending module.

The acquisition module is configured to obtain video stream data, wherein the video stream data carries image information and audio information;

The sending module is configured to send the image information in the video stream data to the first display area established in the user terminal for display, and to send the text information to the second display area established in the user terminal for segmented scrolling display.

The technical solution has been described above and is not repeated in detail in this embodiment.
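
The three modules named for Fig. 7 can be wired together as in the sketch below. It is illustrative only: the class and method names, the dictionary packet format and the `recognizer` callable are assumptions, and the generation module's role is taken from the system summary earlier in the document (generating text information from the audio information).

```python
class AcquisitionModule:
    def acquire(self, stream):
        # Each hypothetical packet carries image information and audio information.
        for packet in stream:
            yield packet["image"], packet["audio"]


class GenerationModule:
    def __init__(self, recognizer):
        self.recognizer = recognizer   # any callable mapping audio to text (assumed)

    def generate(self, audio):
        return self.recognizer(audio)


class SendingModule:
    def __init__(self):
        self.first_area = []    # image information shown in the first display area
        self.second_area = []   # text information shown in the second display area

    def send(self, image, text):
        self.first_area.append(image)
        self.second_area.append(text)


def run(stream, recognizer):
    """Pass every packet through acquisition, generation and sending in turn."""
    acq, gen, snd = AcquisitionModule(), GenerationModule(recognizer), SendingModule()
    for image, audio in acq.acquire(stream):
        snd.send(image, gen.generate(audio))
    return snd
```

Passing the recognizer in as a plain callable keeps the sketch independent of any particular speech recognition service and also covers the manual transcription embodiment, where the callable would return text typed by a staff member.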

Fig. 8 is a structural diagram of the server in the technical solution of the present invention. Referring to Fig. 8, the server includes a processor, a memory and a bus.

At least one processor and at least one memory are provided. The processor and the memory communicate with each other over the bus. The memory stores an instruction set executable by the processor, and the processor can invoke the instruction set to perform the method provided by the above embodiments, for example including:

This embodiment provides a non-transitory readable memory that stores an instruction set, the instruction set being adapted to be loaded by a processor to perform the method provided by the above embodiments, for example including:

Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions (an instruction set). The program can be stored in a readable memory and, when executed, performs the steps of the above method embodiments. The memory includes various media that can store program code, such as ROM, RAM, magnetic disks or optical discs.

The embodiments described above, such as the server for the live-streaming text display method, are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

The above is only a preferred embodiment of the present invention. The protection scope of the invention is not limited to the above embodiment; all technical solutions under the concept of the invention fall within its protection scope. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the invention should also be regarded as falling within the protection scope of the invention.

Claims

1. A live-streaming text display method, characterized by comprising the following steps:

2. The live-streaming text display method according to claim 1, characterized in that generating, in real time, the text information corresponding to the audio information based on the audio information in the video stream data comprises the following steps:

3. The live-streaming text display method according to claim 2, characterized in that, after a term index is found for a text field in the text information and the term index is associated with the text field to form a linked text field, the method further comprises the following steps:

Determining whether a linked text field in the text information has been triggered;

4. The live-streaming text display method according to claim 1, characterized in that generating, in real time, the text information corresponding to the audio information based on the audio information in the video stream data comprises the following steps:

5. The live-streaming text display method according to claim 4, characterized in that recognizing the audio information in the video stream data by speech recognition to generate, in real time, the text information corresponding to the audio information comprises the following steps:

6. The live-streaming text display method according to claim 5, characterized in that, while the personnel timbre features in the audio information are recognized by speech recognition, the text information corresponding to the audio information is generated in real time and the corresponding personnel information is added to the text information, the method further comprises the following steps:

7. The live-streaming text display method according to claim 1, characterized in that, after the image information in the video stream data is sent to the first display area established in the user terminal for display and the text information is sent to the second display area established in the user terminal for segmented scrolling display, the method further comprises the following steps:

Forming a text information set from the generated text information;

Obtaining corrected text information sent by an authorized user terminal;

8. A live-streaming text display system, characterized by comprising:

9. A server, characterized by comprising:

At least one processor;

At least one memory communicatively connected to the processor, wherein the memory stores an instruction set executable by the processor, and the processor invokes the instruction set to perform the live-streaming text display method according to any one of claims 1-7.

10. A non-transitory readable memory, characterized in that the non-transitory readable memory stores an instruction set, and the instruction set is adapted to be loaded by a processor to perform the live-streaming text display method according to any one of claims 1-7.