Summary of the Invention
In view of the deficiencies of the prior art, an object of the present invention is to provide a live-stream text display method that improves the user's experience of watching a live stream.
The above technical object of the present invention is achieved by the following technical solution:
A live-stream text display method, comprising the following steps:
Obtaining video stream data, wherein the video stream data carries image information and audio information;
Generating, in real time and based on the audio information in the video stream data, text information corresponding to the audio information;
Sending the image information in the video stream data to a first display area established in a user terminal for display, and sending the text information to a second display area established in the user terminal for segmented scrolling display.
Preferably, generating, in real time and based on the audio information in the video stream data, the text information corresponding to the audio information comprises the following steps:
Scanning, based on the text information generated in real time, the several text segments in the text information;
Judging one by one whether each text segment in the text information has a term index;
If so, associating the term index of that text segment with the text segment to form a linked text segment.
Preferably, after the term index of a text segment that has one is associated with that text segment to form a linked text segment, the method further comprises the following steps:
Judging whether a linked text segment in the text information is triggered;
If so, sending the term definition in the term index associated with the linked text segment to a third display area established in the user terminal for display.
Preferably, generating, in real time and based on the audio information in the video stream data, the text information corresponding to the audio information comprises the following step:
Recognizing, based on the audio information in the video stream data, the audio information by speech recognition to generate in real time the text information corresponding to the audio information.
Preferably, recognizing the audio information by speech recognition to generate in real time the text information corresponding to the audio information comprises the following steps:
Establishing a personnel timbre feature database, and associating each personnel timbre feature in the database with personnel information;
Identifying the personnel timbre features in the audio information by speech recognition and, while generating in real time the text information corresponding to the audio information, adding the corresponding personnel information to the text information.
Preferably, identifying the personnel timbre features in the audio information by speech recognition and adding the corresponding personnel information to the text information while generating the text information in real time further comprises the following steps:
Judging whether the interval between the generation of two adjacent text segments in the text information exceeds a preset duration;
If not, further judging whether the timbre feature changes between the previous text segment and the next text segment;
If so, splitting the current text information, at the text segment where the timbre feature changes, into preceding text information and following text information, and adding the corresponding personnel information to the preceding text information and the following text information respectively.
Preferably, after the image information in the video stream data is sent to the first display area established in the user terminal for display and the text information is sent to the second display area established in the user terminal for segmented scrolling display, the method further comprises the following steps:
Forming a text information set from the generated text information;
Obtaining corrected text information sent by an authorized user terminal;
Scanning the text information in the text information set one by one based on the corrected text information, comparing the text segments in the corrected text information with the text segments in each piece of text information, and judging whether the number of matching text segments exceeds a preset matching number;
If so, storing the corrected text information in the text information set in place of that text information.
In view of the deficiencies of the prior art, another object of the present invention is to provide a live-stream text display system that improves the user's experience of watching a live stream.
The above technical object of the present invention is achieved by the following technical solution:
A live-stream text display system, comprising:
An acquisition module, configured to obtain video stream data, wherein the video stream data carries image information and audio information;
A generation module, configured to generate, in real time and based on the audio information in the video stream data, text information corresponding to the audio information;
A sending module, configured to send the image information in the video stream data to a first display area established in a user terminal for display, and to send the text information to a second display area established in the user terminal for segmented scrolling display.
In view of the deficiencies of the prior art, another object of the present invention is to provide a server that improves the user's experience of watching a live stream.
The above technical object of the present invention is achieved by the following technical solution:
A server, comprising:
At least one processor;
At least one memory communicatively connected to the processor, wherein the memory stores an instruction set executable by the processor, and the processor calls the instruction set to perform the live-stream text display method described above.
In view of the deficiencies of the prior art, another object of the present invention is to provide a non-transitory readable memory that improves the user's experience of watching a live stream.
The above technical object of the present invention is achieved by the following technical solution:
A non-transitory readable memory storing an instruction set, the instruction set being suitable for being loaded by a processor to perform the live-stream text display method described above.
In summary, compared with the prior art, the present invention has the following beneficial effects:
1. A user watching a live stream can follow the audio information in the stream and see text subtitles for the audio information in real time;
2. While following the text subtitles as the live stream progresses, if an industry-specific term appears in the text information, the user can click that text segment to view the term definition, which aids the user's understanding;
3. The audio information in the live video stream data is combined with the text information, so that as the video stream data plays, the text information is automatically displayed in correspondingly segmented scrolling form according to the timbre features in the audio information.
Detailed Description of the Embodiments
To present the technical solution of the present invention more clearly, the present invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1, Fig. 1 is a flow diagram of the live-stream text display method provided by the technical solution of the present invention. The live-stream text display method in this embodiment is described in detail below from the server side with reference to Fig. 1. The method may include steps S100 to S300.
Step S100: obtain video stream data, wherein the video stream data carries image information and audio information.
Specifically, the server can obtain the video stream data from video shot by an on-site camera, the video stream data carrying image information and audio information. The server can then deliver the video stream data to the user terminal for playback, so that the user terminal obtains the image information and the audio information of the scene.
Step S200: generate, in real time and based on the audio information in the video stream data, text information corresponding to the audio information.
Specifically, while delivering the video stream data to the user terminal, the server extracts the audio information from the video stream data and generates in real time the text information corresponding to the audio information.
In one embodiment, the server can recognize the audio information in the video stream data by speech recognition to generate in real time the text information corresponding to the audio information. The audio information is captured by an on-site microphone, so the server can acquire it directly and recognize it with speech recognition technology to generate text information corresponding to the speech content in the audio information.
In another embodiment, the text information may be generated manually: a staff member listens to the audio information and types in the voice content heard.
It is worth noting that the text information is generated in synchronization with the audio information and is delivered to the user terminal in synchronization with it.
Step S300: send the image information in the video stream data to the first display area established in the user terminal for display, and send the text information to the second display area established in the user terminal for segmented scrolling display.
Specifically, a first display area and a second display area are established in the user terminal. When the server delivers the video stream data to the user terminal, the audio information in the video stream data is played through the loudspeaker of the user terminal, the image information is displayed in the first display area, and the text information is displayed in the second display area in segmented scrolling form. More specifically, the voice content in the audio information consists of human speech and comes in multiple segments, so each segment of voice content has corresponding text information. The text information in the second display area therefore breaks wherever the voice content in the audio information breaks, and the current text information is pinned to the top, which produces the segmented scrolling display. The text information accumulates in the second display area, and the user can view earlier text information by turning pages or dragging the scroll bar.
In this way, a user watching a live stream can follow the audio information in the stream and see, in real time, text subtitles based on that audio information.
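The accumulate-and-pin behaviour of the second display area can be sketched as follows. This is a minimal illustration only: the class `CaptionArea` and its methods are hypothetical names that do not appear in the source, and real rendering on the user terminal is left out.

```python
class CaptionArea:
    """Hypothetical stand-in for the second display area."""

    def __init__(self):
        # Accumulated sections, oldest first; each section is a list of texts.
        self._sections = [[]]

    def add_text(self, text):
        """Append newly generated text to the current section."""
        self._sections[-1].append(text)

    def break_section(self):
        """Start a new section when the speech breaks (empty sections are skipped)."""
        if self._sections[-1]:
            self._sections.append([])

    def render(self):
        """Return sections newest first, so the current one is pinned to the top."""
        filled = [" ".join(section) for section in self._sections if section]
        return list(reversed(filled))


area = CaptionArea()
area.add_text("Welcome to the show.")
area.break_section()
area.add_text("Our first topic")
area.add_text("is video encoding.")
```

Here `area.render()` lists the current section first and the earlier section after it, which corresponds to the history a user would reach by paging or scrolling.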
Fig. 2 is a flow diagram of linked text segment generation in the technical solution of the present invention. Generating, in real time and based on the audio information in the video stream data, the text information corresponding to the audio information comprises the following steps:
Step S201: scan, based on the text information generated in real time, the several text segments in the text information;
Step S202: judge one by one whether each text segment in the text information has a term index;
Step S203: if so, associate the term index of that text segment with the text segment to form a linked text segment.
Specifically, according to the technical solution defined by steps S201 to S203, the text information contains several text segments, each of which is a word or a single character. The server scans the text segments in the text information and judges one by one whether each text segment has a term index. In this embodiment, a term library is established in the database of the server. Common terms are entered into the term library in advance and the term index is built from them, each common term carrying its term definition.
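Steps S201 to S203 can be sketched as a lookup of each segment in a term library. The library contents, the `Segment` class and the case-insensitive matching below are assumptions made for illustration; the source does not specify how segments are matched against the term index.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical term library: term -> definition (illustrative contents only).
TERM_LIBRARY = {
    "codec": "A program or device that encodes and decodes a media stream.",
    "bitrate": "The amount of data transmitted per unit of time.",
}


@dataclass
class Segment:
    text: str
    term_key: Optional[str] = None  # set when the segment has a term index entry

    @property
    def is_link(self) -> bool:
        return self.term_key is not None


def link_segments(words, library):
    """Scan segments one by one (S201/S202) and link those found in the library (S203)."""
    segments = []
    for word in words:
        key = word.lower()  # assumed matching rule: case-insensitive exact match
        segments.append(Segment(word, key if key in library else None))
    return segments


segments = link_segments(["The", "codec", "limits", "bitrate"], TERM_LIBRARY)
linked = [s.text for s in segments if s.is_link]
```

In this sketch only "codec" and "bitrate" become linked segments; the other segments are left as plain text.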
Accordingly, referring to Fig. 3, after the term index of a text segment that has one is associated with that text segment to form a linked text segment, the method further comprises the following steps:
S204: judge whether a linked text segment in the text information is triggered;
S205: if so, send the term definition in the term index associated with the linked text segment to the third display area established in the user terminal for display.
Specifically, according to the technical solution defined by steps S204 to S205, when a linked text segment is triggered by the user, the user terminal sends a data request to the server, and the server, in response to the data request, sends the term definition associated with that text segment to the user terminal. The user can view the term definition of the text segment in the third display area of the user terminal, which makes the content of the text segment easier to understand. It is worth noting that the term definition in the third display area does not disappear as the audio information and the text information advance; it is only replaced by a new term definition.
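The request handling of steps S204 to S205, together with the rule that a definition stays on screen until a new one replaces it, might look like the sketch below. `DefinitionPanel` and `on_segment_triggered` are hypothetical names, and the network round trip between user terminal and server is reduced to a function call.

```python
class DefinitionPanel:
    """Hypothetical stand-in for the third display area; holds one definition."""

    def __init__(self):
        self.current = None

    def show(self, definition):
        # A new definition replaces the previous one; nothing else clears it.
        self.current = definition


def on_segment_triggered(term_key, library, panel):
    """S204/S205: look up the definition for a triggered segment and display it."""
    definition = library.get(term_key)
    if definition is not None:
        panel.show(definition)
    return definition


library = {"codec": "A program or device that encodes and decodes a media stream."}
panel = DefinitionPanel()
on_segment_triggered("codec", library, panel)
first = panel.current
on_segment_triggered("unknown-term", library, panel)  # no entry, panel unchanged
```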
Referring to Fig. 4, Fig. 4 is a flow diagram of timbre feature identification in the technical solution of the present invention. The timbre feature identification in this embodiment is described in detail below from the server side with reference to Fig. 4. The method may include steps S210 to S211.
Step S210: establish a personnel timbre feature database, and associate each personnel timbre feature in the database with personnel information.
Specifically, the live stream in this embodiment is mainly an interview live stream, in which several people interact by voice, and different people have different timbre features. In this embodiment, the personnel timbre feature database stores the timbre features of the invited participants, and each timbre feature is associated with personnel information, which includes the person's name.
Step S211: identify the personnel timbre features in the audio information by speech recognition and, while generating in real time the text information corresponding to the audio information, add the corresponding personnel information to the text information.
Specifically, while recognizing the speech content in the audio information of the video stream data with speech recognition technology, the server can also identify the personnel timbre features in the audio information. Taking the interview program of this embodiment as an example, there are three people: host A, guest B and guest C. The server receives the video stream data, recognizes the audio information in it with speech recognition technology, and converts host A's voice content in the audio information into the corresponding text information. In doing so, the server identifies the timbre feature in the audio information, retrieves the personnel timbre features from the personnel timbre feature database, compares them with the identified timbre feature to obtain host A's personnel information, and adds host A's name at the beginning of the text information. In the same way, the server obtains the voice content of guest B and guest C from the audio information in the video stream data to form the corresponding text information, obtains the personnel information of guest B and guest C from their timbre features, and adds their names at the beginning of the text information of guest B and guest C respectively. In accordance with the technical solution above, the text information of host A, guest B and guest C is scrolled in segments: the second display area of the user terminal shows the text information of host A while host A speaks, that of guest B while guest B speaks, and that of guest C while guest C speaks.
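The comparison against the personnel timbre feature database in step S211 can be illustrated with a small sketch. The source does not say how timbre features are represented or compared; the fixed-length vectors, cosine similarity and the 0.8 threshold used here are assumptions chosen only to make the idea concrete.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(feature, database, threshold=0.8):
    """Return the best-matching name, or None if nothing reaches the threshold."""
    best_name, best_score = None, threshold
    for name, reference in database.items():
        score = cosine_similarity(feature, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def label_text(feature, text, database):
    """Prefix the recognized text with the speaker's name when one is identified."""
    name = identify_speaker(feature, database)
    return f"{name}: {text}" if name else text


# Hypothetical database: name -> stored timbre feature vector.
VOICE_DB = {
    "Host A": [1.0, 0.0, 0.0],
    "Guest B": [0.0, 1.0, 0.0],
    "Guest C": [0.0, 0.0, 1.0],
}
labelled = label_text([0.9, 0.1, 0.0], "Welcome", VOICE_DB)
```

A feature close to host A's stored vector yields text prefixed with "Host A", while a feature that matches no stored vector well enough leaves the text unlabelled.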
Fig. 5 is a flow diagram of text information segmentation in the technical solution of the present invention. Identifying the personnel timbre features in the audio information by speech recognition and adding the corresponding personnel information to the text information while generating the text information in real time further comprises the following steps:
Step S212: judge whether the interval between the generation of two adjacent text segments in the text information exceeds a preset duration;
Step S213: if not, further judge whether the timbre feature changes between the previous text segment and the next text segment;
Step S214: if so, split the current text information, at the text segment where the timbre feature changes, into preceding text information and following text information, and add the corresponding personnel information to the preceding text information and the following text information respectively.
Specifically, according to the technical solution defined by steps S212 to S214, when the interval between the generation of two adjacent text segments in the text information is judged to be shorter than the preset duration, the two adjacent text segments are regarded as overlapping. This embodiment again takes the interview live stream above as an example. Two text segments overlap when, for instance, host A starts speaking immediately after guest B, or guest B speaks too fast. In that case, the server judges from the timbre features in the audio information whether the timbre feature changes between the previous text segment and the next text segment. If it changes, the personnel information of guest B is added to the preceding text information and the personnel information of host A is added to the following text information, so that the two pieces of text information are still displayed in the second display area in segmented scrolling form.
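The decision logic of steps S212 to S214 can be sketched as below. The `Word` record, the one-second default and the function name are hypothetical. The sketch also assumes that an interval longer than the preset duration starts a new section, following the earlier description that the text breaks where the speech breaks; steps S212 to S214 themselves only define the short-interval branch.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    time: float    # generation time in seconds
    speaker: str   # personnel information identified from the timbre feature


def split_sections(words, preset_gap=1.0):
    """Group words into labelled sections using the interval and speaker checks."""
    sections = []
    for prev, word in zip([None] + words[:-1], words):
        new_section = (
            prev is None
            or word.time - prev.time > preset_gap  # S212: long pause (assumed to break)
            or word.speaker != prev.speaker        # S213/S214: timbre feature changed
        )
        if new_section:
            sections.append((word.speaker, [word.text]))
        else:
            sections[-1][1].append(word.text)
    return [f"{speaker}: {' '.join(texts)}" for speaker, texts in sections]


words = [
    Word("I", 0.0, "Guest B"),
    Word("agree", 0.3, "Guest B"),
    Word("Thanks", 0.5, "Host A"),   # follows within 0.2 s, but the speaker changed
    Word("Next", 3.0, "Host A"),     # same speaker after a long pause
]
sections = split_sections(words)
```

The third word arrives within the preset interval but from a different speaker, so the text is split there and each part carries its own speaker label.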
Referring to Fig. 6, Fig. 6 is a flow diagram of corrected text information replacement in the technical solution of the present invention. The corrected text information replacement in this embodiment is described in detail below from the server side with reference to Fig. 6. The method may include steps S400 to S700.
After step S300, in which the image information in the video stream data is sent to the first display area established in the user terminal for display and the text information is sent to the second display area established in the user terminal for segmented scrolling display, the method further comprises the following steps:
Step S400: form a text information set from the generated text information;
Step S500: obtain corrected text information sent by an authorized user terminal;
Step S600: scan the text information in the text information set one by one based on the corrected text information, compare the text segments in the corrected text information with the text segments in each piece of text information, and judge whether the number of matching text segments exceeds a preset matching number;
Step S700: if so, store the corrected text information in the text information set in place of that text information.
Specifically, according to the technical solution defined by steps S400 to S700, when each piece of text information is generated, its generation time is recorded and bound to the playback progress of the video stream data. After the live broadcast ends, the generated text information forms a text information set, which is saved on the server together with the video stream data.
Because the text information set is generated by the speech recognition technology of the server, the text information will contain some errors, and when a user terminal later replays the live video from the server, those errors degrade the user's experience. The server can therefore obtain corrected text information sent by an authorized user terminal and store it in the text information set in place of the erroneous text information, so that other user terminals replaying the video stream data obtain the correct text annotations. An authorized user terminal is a user terminal logged in to the server with an administrator account.
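Steps S600 and S700 can be sketched as follows. The source does not define how text segments are matched; this sketch assumes whitespace-separated words, counts the words shared by the corrected and stored text, and uses a hypothetical preset matching number of 3. The sample sentences are invented for illustration.

```python
from collections import Counter


def match_count(corrected, stored):
    """Number of words shared by the two texts (multiset intersection)."""
    overlap = Counter(corrected.split()) & Counter(stored.split())
    return sum(overlap.values())


def apply_correction(text_set, corrected, preset_matches=3):
    """Scan the set one by one (S600) and replace the first sufficient match (S700)."""
    for index, stored in enumerate(text_set):
        if match_count(corrected, stored) > preset_matches:
            text_set[index] = corrected
            return index
    return None


text_set = [
    "Host A: welcome to the show",
    "Guest B: the coat deck drops frames at low bit rate",
]
corrected = "Guest B: the codec drops frames at low bitrate"
replaced_index = apply_correction(text_set, corrected)
```

The corrected sentence shares only one word with the first entry but seven with the second, so only the second entry is replaced.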
As shown in Fig. 7, Fig. 7 is a system block diagram of the live-stream text display system in the technical solution of the present invention. The live-stream text display system includes an acquisition module, a generation module and a sending module.
The acquisition module is configured to obtain video stream data, wherein the video stream data carries image information and audio information;
The generation module is configured to generate, in real time and based on the audio information in the video stream data, text information corresponding to the audio information;
The sending module is configured to send the image information in the video stream data to the first display area established in the user terminal for display, and to send the text information to the second display area established in the user terminal for segmented scrolling display.
The technical solution is as described above and is not detailed again in this embodiment.
Fig. 8 is a structural schematic diagram of the server in the technical solution of the present invention. Referring to Fig. 8, the server includes a processor, a memory and a bus.
At least one processor and at least one memory are provided. The processor and the memory communicate with each other through the bus. The memory stores an instruction set executable by the processor, and the processor calls the instruction set to perform the method provided by the above embodiments, for example including:
Obtaining video stream data, wherein the video stream data carries image information and audio information;
Generating, in real time and based on the audio information in the video stream data, text information corresponding to the audio information;
Sending the image information in the video stream data to the first display area established in the user terminal for display, and sending the text information to the second display area established in the user terminal for segmented scrolling display.
This embodiment provides a non-transitory readable memory storing an instruction set, the instruction set being suitable for being loaded by a processor to perform the method provided by the above embodiments, for example including:
Obtaining video stream data, wherein the video stream data carries image information and audio information;
Generating, in real time and based on the audio information in the video stream data, text information corresponding to the audio information;
Sending the image information in the video stream data to the first display area established in the user terminal for display, and sending the text information to the second display area established in the user terminal for segmented scrolling display.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method embodiments can be carried out by hardware under the control of program instructions (an instruction set). The program can be stored in a readable memory and, when executed, performs the steps of the above method embodiments. The memory includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
The embodiments described above, such as the server for the live-stream text display method, are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above is only a preferred embodiment of the present invention. The protection scope of the present invention is not limited to the above embodiments, and all technical solutions within the concept of the present invention fall within its protection scope. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principle of the present invention shall also be regarded as falling within the protection scope of the present invention.