CN109600681A

CN109600681A - Caption presentation method, device, terminal and storage medium

Info

Publication number: CN109600681A
Application number: CN201811446704.6A
Authority: CN
Inventors: 李亚军; 宋连军
Original assignee: Nanchang And De Software Technology Co Ltd
Current assignee: Shanghai Meike Information Technology Co ltd
Priority date: 2018-11-29
Filing date: 2018-11-29
Publication date: 2019-04-09
Anticipated expiration: 2038-11-29
Also published as: CN109600681B

Abstract

Embodiments of the invention disclose a caption presentation method, device, terminal and storage medium. The method comprises: obtaining current streaming media cache data and extracting voice data from the cache data; performing semantic parsing on the voice data to obtain a parsed sentence, and extracting feature information from the parsed sentence; and matching the feature information against pre-stored feature information to obtain target keywords. By extracting the voice data from the cache data, deriving feature information from the parsed sentence obtained through semantic parsing, and matching that feature information against pre-stored feature information to obtain target keywords, the method improves the accuracy of subtitle recognition, displays subtitles in real time, removes the format and permission restrictions of conventional subtitle display, and is more convenient for the user.

Description

Caption presentation method, device, terminal and storage medium

Technical field

Embodiments of the present invention relate to the technical field of subtitle display, and in particular to a caption presentation method, device, terminal and storage medium.

Background art

At present, users are accustomed to reading subtitles when watching audio and video files. Subtitles are the written presentation of non-visual content, such as the dialogue in television programs, films and stage works; the term also covers text added to film and television works in post-production. Showing a program's spoken content as subtitles helps viewers with weaker hearing understand the program. Moreover, because many words are homophones, viewers can understand the content more clearly only by combining the subtitle text with the audio. Subtitles can also be used to translate foreign-language programs, so that viewers who do not understand the language can hear the original soundtrack while still understanding the content.

Currently, subtitles are displayed by having the playback software perform speech recognition and translation on the streaming media in advance to obtain subtitles, which are then shown in the playback interface in sync with the streaming media. However, the accuracy and timeliness of this kind of subtitle display depend largely on the playback software, and much playback software imposes permission restrictions, so subtitles may fail to display properly, which inconveniences the user.

Summary of the invention

Embodiments of the present invention provide a caption presentation method, device, terminal and storage medium that display subtitles for the streaming media being watched in real time, improve the accuracy of subtitle display, and allow the display format to be customized to the user's needs.

In a first aspect, an embodiment of the present invention provides a caption presentation method, comprising:

obtaining current streaming media cache data, and extracting voice data from the cache data;

performing semantic parsing on the voice data to obtain a parsed sentence, and extracting feature information from the parsed sentence;

matching the feature information against pre-stored feature information to obtain target keywords.

In a second aspect, an embodiment of the present invention further provides a caption presentation device, comprising:

a voice data extraction module, configured to obtain current streaming media cache data and extract voice data from the cache data;

a feature information extraction module, configured to perform semantic parsing on the voice data to obtain a parsed sentence, and to extract feature information from the parsed sentence;

a matching module, configured to match the feature information against pre-stored feature information to obtain target keywords.

In a third aspect, an embodiment of the present invention further provides a terminal, comprising:

one or more processors;

a memory for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the caption presentation method of any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the caption presentation method of any embodiment of the present invention.

Embodiments of the present invention extract the voice data from the cache data, derive feature information from the parsed sentence obtained by semantic parsing of the voice data, and match the feature information against pre-stored feature information to obtain target keywords. This improves the accuracy of subtitle recognition, displays subtitles in real time, removes the format and permission restrictions of conventional subtitle display, and is more convenient for the user.

Brief description of the drawings

Fig. 1 is a flowchart of a caption presentation method in Embodiment 1 of the present invention;

Fig. 2 is a flowchart of a caption presentation method in Embodiment 2 of the present invention;

Fig. 3 is a schematic structural diagram of a caption presentation device in Embodiment 3 of the present invention;

Fig. 4 is a schematic structural diagram of a terminal in Embodiment 4 of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.

Embodiment one

Fig. 1 is a flowchart of a caption presentation method in Embodiment 1 of the present invention. The caption presentation method provided in this embodiment is applicable to watching streaming media such as video. It can be executed by a caption presentation device, which can be implemented in software and/or hardware and can be integrated in a terminal. Referring to Fig. 1, the method of this embodiment specifically includes the following steps:

S110, obtain current streaming media cache data, and extract voice data from the cache data.

Specifically, streaming media is continuous time-based media delivered using streaming technology, such as audio, video or multimedia files. When the user opens the streaming media, it is cached first and then played, so the playback progress lags behind the cache progress. At this point, the terminal obtains the voice data from the cache data for further processing.

S120, perform semantic parsing on the voice data to obtain a parsed sentence, and extract feature information from the parsed sentence.

Specifically, the terminal performs semantic parsing on the obtained voice data to obtain a parsed sentence, which is the text sentence recognized from the speech. The text sentence is segmented into words, and part-of-speech analysis is performed to determine the part of speech of each word. The words are then filtered by part of speech: function words, such as adverbs, prepositions, conjunctions and auxiliary words, are removed, while content words, such as nouns, verbs and adjectives, are retained. The retained content words serve as the feature information of the parsed sentence.
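
The filtering in step S120 can be illustrated with a minimal Python sketch. It assumes that speech recognition and part-of-speech tagging have already produced `(word, part_of_speech)` pairs; the tag names and the function `extract_feature_info` are hypothetical and are not taken from the patent.

```python
# Hypothetical coarse part-of-speech labels. A real system would obtain them
# from a tagger; anything outside this set (adverbs, prepositions,
# conjunctions, auxiliary words, ...) is treated as a function word.
CONTENT_POS = {"noun", "verb", "adjective"}


def extract_feature_info(tagged_sentence):
    """Keep content words and drop function words.

    tagged_sentence: list of (word, part_of_speech) pairs from an assumed
    upstream recognition and tagging step.
    """
    return [word for word, pos in tagged_sentence if pos in CONTENT_POS]


tagged = [
    ("hero", "noun"),
    ("quickly", "adverb"),
    ("opens", "verb"),
    ("old", "adjective"),
    ("door", "noun"),
    ("and", "conjunction"),
    ("leaves", "verb"),
]
features = extract_feature_info(tagged)
print(features)  # ['hero', 'opens', 'old', 'door', 'leaves']
```

The order of the retained words is preserved, so later steps can still recombine them into a readable sentence.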

S130, match the feature information against pre-stored feature information to obtain target keywords.

Specifically, a vocabulary database is stored on the terminal in advance. The database holds the pre-stored feature information, which consists of content words. To ensure the accuracy of the feature information, the terminal matches the feature information of the parsed sentence against the pre-stored feature information and takes the successfully matched feature information as the target keywords.
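
In its simplest form, the matching in step S130 could be an exact lookup in the stored vocabulary. The sketch below assumes the vocabulary database is an in-memory set; the names `match_features` and `prestored_features` are illustrative only.

```python
def match_features(features, prestored):
    """Return the extracted features that also appear in the pre-stored set."""
    return [word for word in features if word in prestored]


# Assumed stand-in for the terminal's vocabulary database.
prestored_features = {"hero", "door", "castle", "opens"}

target_keywords = match_features(["hero", "opens", "old", "door"], prestored_features)
print(target_keywords)  # ['hero', 'opens', 'door']
```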

Optionally, matching the feature information against pre-stored feature information to obtain target keywords comprises: obtaining the pre-stored feature information within a preset progress range from a pre-stored feature information library; establishing a mapping relationship between the feature information and the pre-stored feature information; and matching the feature information against the pre-stored feature information according to the mapping relationship. For example, pre-stored feature information is determined from the fixed subtitles of the streaming media and saved in the pre-stored feature information library. The pre-stored feature information within the preset progress range is then retrieved from the library. The preset progress range can be set by a technician as needed, so as to ensure that the preset feature information within that range has a mapping relationship with the feature information of the parsed sentence. The mapping between the preset feature information and the feature information of the parsed sentence is then computed; for example, a relevance model is trained on training samples, and the mapping relationship for the feature information is computed with that model. Matching the feature information of the parsed sentence against the pre-stored feature information yields a synonym. The mapping between the synonym and the pre-stored feature information is computed, and the result is compared with the mapping result between the preset feature information and the feature information of the parsed sentence. When the mapping result value for the synonym and the pre-stored feature information is greater than or equal to the mapping result value for the parsed result and the pre-stored feature information, the synonym is taken as the target keyword.
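
A rough Python sketch of this optional procedure follows. It is a simplification under several assumptions: the library is a list of `(timestamp_seconds, word)` pairs, the progress range is a symmetric window around the current position, and `difflib.SequenceMatcher` stands in for the trained relevance model, which the patent does not specify. The synonym comparison is reduced to a single score threshold, and all names and default values are hypothetical.

```python
from difflib import SequenceMatcher


def features_in_range(library, progress, window):
    """Collect pre-stored words whose timestamp is within +/- window seconds."""
    return [word for ts, word in library if abs(ts - progress) <= window]


def mapping_score(a, b):
    """Stand-in similarity score in [0, 1]; NOT the patent's relevance model."""
    return SequenceMatcher(None, a, b).ratio()


def match_with_mapping(features, library, progress, window=5.0, min_score=0.8):
    """Map each recognized feature to its best-scoring pre-stored word in range."""
    candidates = features_in_range(library, progress, window)
    keywords = []
    for feature in features:
        if not candidates:
            break
        best = max(candidates, key=lambda c: mapping_score(feature, c))
        if mapping_score(feature, best) >= min_score:
            keywords.append(best)
    return keywords


library = [(10.0, "castle"), (12.0, "knight"), (40.0, "dragon")]
result = match_with_mapping(["castel", "knight", "dragon"], library, progress=11.0)
print(result)  # ['castle', 'knight']
```

Here the misrecognized "castel" is corrected to the stored "castle", while "dragon" is dropped because its stored entry lies outside the progress window. Restricting candidates to the window keeps the comparison cheap and avoids matching words from unrelated parts of the media.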

In the technical solution of this embodiment, current streaming media cache data is obtained and voice data is extracted from the cache data; semantic parsing is performed on the voice data to obtain a parsed sentence, and feature information is extracted from the parsed sentence; and the feature information is matched against pre-stored feature information to obtain target keywords. Extracting the voice data from the cache data, deriving feature information from the parsed sentence, and matching it against pre-stored feature information to obtain target keywords improves the accuracy of subtitle recognition, displays subtitles in real time, removes the format and permission restrictions of conventional subtitle display, and is more convenient for the user.

Embodiment two

Fig. 2 is a flowchart of a caption presentation method in Embodiment 2 of the present invention. This embodiment is an optimization on the basis of the above embodiment; for details not described in this embodiment, see the above embodiment. Referring to Fig. 2, the caption presentation method provided in this embodiment includes:

S210, detect the cache time corresponding to the current cache region.

Specifically, to ensure subtitle accuracy, the terminal needs to process a segment of the audio and video data of the streaming media to obtain subtitle information. The terminal therefore inspects the cache region of the current streaming media and obtains the cache time corresponding to that region. From the current cache time, the terminal can accurately determine the size of the current cache data, and can thus obtain cache data that is long enough to contain a sentence while still allowing maximum processing efficiency.

S220, when the cache time meets a preset threshold, read the cache data of the cache region.

The cache time corresponding to the cache region is checked, and when it meets the preset threshold, the cache data of the cache region is read. The preset threshold can be set by a technician as needed, so as to improve processing efficiency while keeping the spoken sentences in the cache data complete. In addition, because in this embodiment the terminal itself obtains the cache data and translates and displays the subtitles, the permission restrictions of streaming media playback software no longer apply. The terminal can automatically translate and display subtitles for streaming media according to the user's presets, which makes viewing more convenient and makes the approach generally applicable.
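
Steps S210 and S220 can be sketched in Python as follows. The sketch assumes that the cache time is the span of media downloaded but not yet played, and that "meets the preset threshold" means greater than or equal to it; the patent states neither detail explicitly. `CacheRegion` and `read_if_ready` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CacheRegion:
    playback_position: float  # seconds of media already played
    cached_until: float       # seconds of media downloaded so far
    data: bytes = b""         # raw cached audio/video bytes

    @property
    def cache_time(self):
        """Seconds of media that are cached but not yet played."""
        return max(0.0, self.cached_until - self.playback_position)


def read_if_ready(region, threshold_seconds):
    """Return the cached data once enough media is buffered, else None."""
    if region.cache_time >= threshold_seconds:
        return region.data
    return None


region = CacheRegion(playback_position=12.0, cached_until=20.0, data=b"pcm")
print(region.cache_time)            # 8.0
print(read_if_ready(region, 5.0))   # b'pcm'
print(read_if_ready(region, 10.0))  # None
```

A larger threshold makes it more likely that the buffer holds a complete sentence, at the cost of a longer wait before processing can start.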

S230, obtain current streaming media cache data, and extract voice data from the cache data.

S240, perform semantic parsing on the voice data to obtain a parsed sentence, and extract feature information from the parsed sentence.

S250, match the feature information against pre-stored feature information to obtain target keywords.

S260, perform part-of-speech analysis on the target keywords, and store the target keywords in the pre-stored feature information library according to the analysis result.

Specifically, the feature information library is updated so that it remains accurate when used to compute the mapping relationship with the feature information in the parsed semantics, and so that the target keywords fit the current context. For example, after the target keywords have been determined, part-of-speech analysis is performed on the feature information and the target keywords are classified according to the analysis result. The target keywords are then stored in the pre-stored feature information library according to their classification.
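
One possible way to organize the update in step S260 is a library keyed by part of speech. The sketch below assumes the keywords arrive already tagged; the `defaultdict` layout and the function `store_keywords` are illustrative choices, not the patent's data structure.

```python
from collections import defaultdict


def store_keywords(library, tagged_keywords):
    """Add each target keyword to the bucket for its part of speech."""
    for word, pos in tagged_keywords:
        library[pos].add(word)
    return library


# Assumed in-memory stand-in for the pre-stored feature information library.
feature_library = defaultdict(set)
store_keywords(
    feature_library,
    [("castle", "noun"), ("opens", "verb"), ("door", "noun"), ("castle", "noun")],
)
print(sorted(feature_library["noun"]))  # ['castle', 'door']
print(sorted(feature_library["verb"]))  # ['opens']
```

Using sets means a keyword that is matched repeatedly is stored only once per class.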

In the technical solution of this embodiment, the cache time corresponding to the current cache region is detected; when the cache time meets the preset threshold, the cache data of the cache region is read; current streaming media cache data is obtained and voice data is extracted from the cache data; semantic parsing is performed on the voice data to obtain a parsed sentence, and feature information is extracted from the parsed sentence; the feature information is matched against pre-stored feature information to obtain target keywords; and part-of-speech analysis is performed on the target keywords, which are stored in the pre-stored feature information library according to the analysis result. This improves the accuracy of subtitle recognition, displays subtitles in real time, removes the format and permission restrictions of conventional subtitle display, and is more convenient for the user.

Embodiment three

Fig. 3 is a schematic structural diagram of a caption presentation device in Embodiment 3 of the present invention. This embodiment provides a caption presentation device which, referring to Fig. 3, specifically includes:

a voice data extraction module 310, configured to obtain current streaming media cache data and extract voice data from the cache data;

a feature information extraction module 320, configured to perform semantic parsing on the voice data to obtain a parsed sentence, and to extract feature information from the parsed sentence;

a matching module 330, configured to match the feature information against pre-stored feature information to obtain target keywords.

Optionally, the voice data extraction module 310 comprises:

a detection unit, configured to detect the cache time corresponding to the current cache region;

a reading unit, configured to read the cache data of the cache region when the cache time meets a preset threshold.

Optionally, the matching module 330 comprises:

a pre-stored feature information obtaining unit, configured to obtain the pre-stored feature information within a preset progress range from the pre-stored feature information library;

a mapping relationship establishing unit, configured to establish a mapping relationship between the feature information and the pre-stored feature information;

a feature information matching unit, configured to match the feature information against the pre-stored feature information according to the mapping relationship.

Optionally, the device further comprises:

an analysis module, configured to perform part-of-speech analysis on the target keywords and to store the target keywords in the pre-stored feature information library according to the analysis result.

Optionally, the device further comprises:

a combination module, configured to combine the target keywords to obtain the final semantics;

a display module, configured to display the final semantics according to the current playback progress of the streaming media.
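
The behavior of the combination and display modules might look like the following Python sketch. It assumes that keywords are combined by joining them in order and that each combined subtitle is stored with the playback time at which it should start; the patent does not describe either mechanism, and the function names are hypothetical.

```python
from bisect import bisect_right


def combine_keywords(keywords):
    """Join target keywords, in order, into one subtitle string."""
    return " ".join(keywords)


def subtitle_at(timeline, progress):
    """Return the subtitle active at the given playback progress.

    timeline: list of (start_seconds, text) pairs sorted by start time.
    """
    starts = [start for start, _ in timeline]
    index = bisect_right(starts, progress) - 1
    return timeline[index][1] if index >= 0 else ""


timeline = [
    (0.0, combine_keywords(["hero", "opens", "door"])),
    (4.0, combine_keywords(["knight", "enters", "castle"])),
]
print(subtitle_at(timeline, 2.5))  # hero opens door
print(subtitle_at(timeline, 4.0))  # knight enters castle
```

Because the cache runs ahead of playback, subtitles can be prepared before they are needed and then looked up by the current playback position.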

In the technical solution of this embodiment, the voice data extraction module obtains current streaming media cache data and extracts voice data from the cache data; the feature information extraction module performs semantic parsing on the voice data to obtain a parsed sentence and extracts feature information from the parsed sentence; and the matching module matches the feature information against pre-stored feature information to obtain target keywords. This improves the accuracy of subtitle recognition, displays subtitles in real time, removes the format and permission restrictions of conventional subtitle display, and is more convenient for the user.

Embodiment four

Fig. 4 is a schematic structural diagram of a terminal in Embodiment 4 of the present invention. Fig. 4 shows a block diagram of an exemplary terminal 412 suitable for implementing embodiments of the present invention. The terminal 412 shown in Fig. 4 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.

As shown in Fig. 4, terminal 412 takes the form of a general-purpose computing device. The components of terminal 412 may include, but are not limited to: one or more processors or processing units 416, a system memory 428, and a bus 418 connecting the different system components (including the system memory 428 and the processing unit 416).

Bus 418 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.

Terminal 412 typically includes a variety of computer system readable media. These media can be any available media that terminal 412 can access, including volatile and non-volatile media, and removable and non-removable media.

System memory 428 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 430 and/or cache memory 432. Terminal 412 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 434 can be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 4, commonly called a "hard disk drive"). Although not shown in Fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM or other optical media) may be provided. In these cases, each drive can be connected to bus 418 through one or more data media interfaces. Memory 428 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 440 having a set of (at least one) program modules 442 may be stored, for example, in memory 428. Such program modules 442 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 442 generally perform the functions and/or methods of the embodiments described in the present invention.

Terminal 412 may also communicate with one or more external devices 414 (such as a keyboard, a pointing device, a display 424, etc.), with one or more devices that enable a user to interact with terminal 412, and/or with any device (such as a network card, a modem, etc.) that enables terminal 412 to communicate with one or more other computing devices. Such communication can take place through input/output (I/O) interface 422. In addition, terminal 412 can communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through network adapter 420. As shown in the figure, network adapter 420 communicates with the other modules of terminal 412 through bus 418. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with terminal 412, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.

By running at least one of the programs stored in system memory 428, processing unit 416 executes various functional applications and data processing, for example implementing the caption presentation method provided by the embodiments of the present invention.

By having a processor execute a program, the terminal provided in this embodiment obtains current streaming media cache data and extracts voice data from the cache data; performs semantic parsing on the voice data to obtain a parsed sentence, and extracts feature information from the parsed sentence; and matches the feature information against pre-stored feature information to obtain target keywords. This improves the accuracy of subtitle recognition, displays subtitles in real time, removes the format and permission restrictions of conventional subtitle display, and is more convenient for the user.

Embodiment five

Embodiment 5 of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform a caption presentation method.

The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor terminal, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution terminal, apparatus or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution terminal, apparatus or device.

Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.

Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.

Claims

1. A caption presentation method, characterized in that the method comprises:

performing semantic parsing on the voice data to obtain a parsed sentence, and extracting feature information from the parsed sentence;

2. The method according to claim 1, characterized in that obtaining current streaming media cache data and extracting voice data from the cache data comprises:

detecting the cache time corresponding to the current cache region;

when the cache time meets a preset threshold, reading the cache data of the cache region.

3. The method according to claim 1, characterized in that matching the feature information against pre-stored feature information to obtain target keywords comprises:

obtaining the pre-stored feature information within a preset progress range from a pre-stored feature information library;

establishing a mapping relationship between the feature information and the pre-stored feature information;

matching the feature information against the pre-stored feature information according to the mapping relationship.

4. The method according to claim 2, characterized in that, after matching the feature information against pre-stored feature information to obtain target keywords, the method comprises:

performing part-of-speech analysis on the target keywords, and storing the target keywords in the pre-stored feature information library according to the analysis result.

5. The method according to claim 1, characterized in that the method further comprises:

combining the target keywords to obtain the final semantics;

displaying the final semantics according to the current playback progress of the streaming media.

6. A caption presentation device, characterized by comprising:

a feature information extraction module, configured to perform semantic parsing on the voice data to obtain a parsed sentence, and to extract feature information from the parsed sentence;

7. The device according to claim 6, characterized in that the voice data extraction module comprises:

8. The device according to claim 6, characterized in that the matching module comprises:

a pre-stored feature information obtaining unit, configured to obtain the pre-stored feature information within a preset progress range from a pre-stored feature information library;

a mapping relationship establishing unit, configured to establish a mapping relationship between the feature information and the pre-stored feature information;

a feature information matching unit, configured to match the feature information against the pre-stored feature information according to the mapping relationship.

9. A terminal, characterized in that the terminal comprises:

one or more processors;

a memory for storing one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the caption presentation method according to any one of claims 1 to 5.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the caption presentation method according to any one of claims 1 to 5.