CN107369450B - Recording method and recording apparatus - Google Patents

Recording method and recording apparatus

Info

Publication number
CN107369450B
CN107369450B CN201710665002.6A
Authority
CN
China
Prior art keywords
voice recognition
processing
intelligent
recognition processing
processing server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710665002.6A
Other languages
Chinese (zh)
Other versions
CN107369450A (en)
Inventor
郭昌雄
吴剑海
瞿向雷
李君�
杜歆文
金圣韬
仲亚军
孟琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUZHOU BROADCASTING AND TELEVISION STATION
Original Assignee
SUZHOU BROADCASTING AND TELEVISION STATION
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUZHOU BROADCASTING AND TELEVISION STATION filed Critical SUZHOU BROADCASTING AND TELEVISION STATION
Priority to CN201710665002.6A priority Critical patent/CN107369450B/en
Publication of CN107369450A publication Critical patent/CN107369450A/en
Application granted granted Critical
Publication of CN107369450B publication Critical patent/CN107369450B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a recording method and a recording apparatus. The recording method comprises the following steps: setting a voice recognition task according to an electronic program guide; distributing the voice recognition task according to the state of an intelligent processing server; and performing, by the intelligent processing server, voice recognition processing on the corresponding video material according to the voice recognition task, and storing the text information obtained by the voice recognition processing, so that a user can retrieve the corresponding video material through keywords related to the text information. The technical scheme of the invention sets the voice recognition task according to the electronic program guide and, through intelligent voice recognition, automatically recognizes the voice information of the recorded material and presents it as text information; the material management platform can then quickly locate material according to the voice information, providing powerful support for the post-production platform.

Description

Recording method and recording apparatus
Technical Field
The invention relates to the technical field of media asset management, in particular to a recording method and a recording device.
Background
With the digitization, networking and informatization of the whole program production process, and with the development of the internet and the mobile internet, traditional television is gradually converging with the internet, so that television stations now require "media convergence" services. Meanwhile, to address the new media landscape, content sharing, management and production services must be provided for media platforms including television, radio, websites, mobile phones and internet television.
With the explosive growth in the amount of video material, and in order to make better use of video resources, a method is needed that records program content for a converged media production platform and serves production within the station.
Disclosure of Invention
In view of the above problems, the present invention provides a new recording method and recording apparatus.
One embodiment of the present invention provides a recording method, including:
setting a voice recognition task according to an electronic program guide;
distributing the voice recognition task according to the state of the intelligent processing server;
and the intelligent processing server performs voice recognition processing on the corresponding video material according to the voice recognition task and stores the text information after the voice recognition processing.
In the above-described recording method, the electronic program guide is received from an external system interface.
In the above recording method, the intelligent processing server obtains the video material by accessing an index file.
In the recording method, the intelligent processing server performs a strip splitting process on the video material while performing a voice recognition process on the video material.
In the above recording method, the strip splitting process includes a transition recognition process, a face recognition process, and a subtitle recognition process.
Another embodiment of the present invention provides a recording apparatus including:
the voice recognition task setting module is used for setting a voice recognition task according to the electronic program guide;
the task distribution module distributes the voice recognition task according to the state of the intelligent processing server;
and the intelligent processing server performs voice recognition processing on the corresponding video material according to the voice recognition task and stores the text information after the voice recognition processing.
In the above recording apparatus, the electronic program guide acquisition module receives the electronic program guide from an external system interface.
In the recording apparatus, the intelligent processing server obtains the video material by accessing an index file.
In the recording apparatus, the intelligent processing server may perform a strip splitting process on the video material while performing the voice recognition process on the video material.
In the above recording apparatus, the strip splitting process includes a transition recognition process, a face recognition process, and a subtitle recognition process.
The technical scheme of the invention sets a voice recognition task according to an electronic program guide, provides intelligent voice recognition by constructing a recording system, and automatically recognizes the voice information of recorded material and presents it as text information through intelligent voice recognition; the material management platform can then quickly locate material according to the recognized voice information, providing powerful support for the post-production platform.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of the present invention.
Fig. 1 shows a flowchart of the recording method of embodiment 1.
Fig. 2 is a schematic configuration diagram of the recording apparatus according to embodiment 2.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "includes" or "may include" used in various embodiments of the present invention indicate the presence of the disclosed functions, operations, or elements, and do not limit the addition of one or more further functions, operations, or elements. Furthermore, as used in various embodiments of the present invention, the terms "comprises," "comprising," "includes," "including," "has," "having" and their derivatives indicate the presence of the specified features, numbers, steps, operations, elements, components, or combinations thereof, and should not be construed as excluding the existence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.
In various embodiments of the invention, the expression "at least one of A or/and B" includes any or all combinations of the listed words. For example, the expression "at least one of A or/and B" may include A, may include B, or may include both A and B.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The method is used in cooperation between an intelligent television system and a transcoding center: the recorded files stored in a shared area are obtained and processed by an intelligent voice recognition server, after which they can be used by other systems in the broadcast station.
Example 1
Fig. 1 shows a flowchart of the recording method of embodiment 1.
In step S110, a voice recognition task is set according to an electronic program guide.
An Electronic Program Guide (EPG) contains the daily playlist of every included channel. A voice recognition task is set for a video program requiring voice recognition processing; for example, voice recognition can be performed for the news simulcast broadcast on CCTV1 from 19:00 to 19:30 every day over a period of time.
The electronic program guide may be obtained from an external system interface, such as a smart television system.
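As an illustrative sketch only (the patent does not specify any data structures), the mapping from EPG entries to voice recognition tasks could look like the following; the `EpgEntry` and `RecognitionTask` names and fields are hypothetical:

```python
from dataclasses import dataclass
from datetime import time

@dataclass
class EpgEntry:
    channel: str
    title: str
    start: time
    end: time

@dataclass
class RecognitionTask:
    channel: str
    program: str
    start: time
    end: time

def build_tasks(epg, wanted):
    """Create a voice recognition task for every EPG entry whose
    (channel, title) pair appears in the wanted set."""
    return [RecognitionTask(e.channel, e.title, e.start, e.end)
            for e in epg if (e.channel, e.title) in wanted]

# Example: only the 19:00-19:30 CCTV1 news simulcast needs recognition.
epg = [EpgEntry("CCTV1", "News Simulcast", time(19, 0), time(19, 30)),
       EpgEntry("CCTV1", "Weather", time(19, 30), time(19, 35))]
tasks = build_tasks(epg, {("CCTV1", "News Simulcast")})
```

In practice the EPG would arrive through the external system interface mentioned above rather than being constructed in code.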
In step S120, the speech recognition task is distributed according to the state of the intelligent processing server.
Since a broadcast station has large amounts of information and many tasks to process, a plurality of intelligent processing servers may be provided to handle them. Voice recognition tasks can therefore be distributed according to the current state of each intelligent processing server; for example, more tasks may be assigned to a relatively idle server.
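A minimal sketch of state-based distribution, assuming the only observable "state" is each server's pending task count (the patent leaves the state definition open); the server names and initial loads below are invented for illustration:

```python
import heapq

def distribute(tasks, servers):
    """Greedy least-loaded assignment: each task goes to the server
    with the fewest pending tasks, a simple proxy for its state."""
    heap = [(load, name) for name, load in servers.items()]
    heapq.heapify(heap)
    assignment = {}
    for task in tasks:
        load, name = heapq.heappop(heap)  # currently least-loaded server
        assignment[task] = name
        heapq.heappush(heap, (load + 1, name))  # it now carries one more task
    return assignment

# srv-a is idle, srv-b already has one pending task.
servers = {"srv-a": 0, "srv-b": 1}
plan = distribute(["t1", "t2", "t3"], servers)
```

A production scheduler would also track CPU load and failures, but the idle-server-first behavior described in the text is already visible here.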
In step S130, after the intelligent processing server performs the speech recognition processing on the corresponding video material according to the speech recognition task, the text information after the speech recognition processing is stored.
The intelligent processing server may obtain the video material by accessing an index file. For example, the intelligent processing server reads an m3u8 file from the shared folder to obtain the video material. The slicing server can read the EPG information and generate the m3u8 file. An H.264/AAC encoded MP4 file can also be sliced by ffmpeg into an m3u8 playlist (the index file) and a number of ts segment files, which are then placed under a specified directory. An m3u8 file is thus a file index: a recorded program consisting of many data files can be represented by a single m3u8 index.
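For illustration, an MP4 file can be segmented with an ffmpeg invocation such as `ffmpeg -i in.mp4 -codec copy -hls_time 10 -hls_list_size 0 -f hls index.m3u8`, which emits the m3u8 playlist plus the ts segments. A minimal parser for such an index, handling only the `#EXTINF` tag (a simplification, not a full HLS implementation), might look like:

```python
def parse_m3u8(text):
    """Return (duration_seconds, uri) pairs from a minimal m3u8 index.
    Only the #EXTINF tag is handled; all other tags are skipped."""
    segments, duration = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:10.0," -> 10.0
            duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#"):
            segments.append((duration, line))  # a ts segment URI
            duration = None
    return segments

index = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg000.ts
#EXTINF:8.5,
seg001.ts
#EXT-X-ENDLIST"""
segs = parse_m3u8(index)
```

The intelligent processing server could walk such a list to fetch each ts segment from the shared folder in order.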
And the intelligent processing server performs voice recognition processing on the corresponding video material to obtain corresponding text information.
Speech recognition comprises two stages: training and recognition. For better speech recognition, the intelligent processing server may be pre-trained. In both stages, the input speech must first be pre-processed and its features extracted. The training stage collects a large speech corpus, obtains feature vector parameters after pre-processing and feature extraction, and finally establishes a reference model library of the training speech through feature modeling. The recognition stage compares the feature vector parameters of the input speech against the reference models in the library using a similarity measure, and outputs the label of the reference model with the highest similarity as the recognition result. In this way, voice recognition is achieved.
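The recognition stage described above amounts to nearest-reference-model matching. A toy sketch with hand-made feature vectors and cosine similarity (the actual feature extraction and modeling pipeline is not specified in the text, so the vectors and labels below are invented) could be:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recognize(features, reference_models):
    """Return the label of the reference model most similar to the
    extracted feature vector (the 'recognition stage' in the text)."""
    return max(reference_models,
               key=lambda label: cosine(features, reference_models[label]))

# A two-entry "reference model library" built during a training stage.
models = {"ni": [0.9, 0.1, 0.0], "hao": [0.1, 0.8, 0.3]}
result = recognize([0.85, 0.15, 0.05], models)
```

Real systems compare against acoustic models (e.g. HMMs or neural networks) rather than raw vectors, but the highest-similarity-wins decision rule is the same.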
When the video material is subjected to voice recognition processing, the intelligent processing server can perform strip splitting processing on the video material.
The strip splitting processing is mainly based on transition recognition technology and face recognition technology. Transition recognition identifies the frames where shot changes occur in the video and provides frame-accurate data for subsequent intelligent recognition processing. It is based on a histogram-based automatic cut-shot detection algorithm, which guarantees recognition accuracy.
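A minimal sketch of histogram-based cut detection as described: compute a normalized grayscale histogram per frame and flag a cut wherever consecutive histograms differ by more than a threshold. Frames are flat pixel lists here purely for illustration; the threshold value is an assumption:

```python
def histogram(frame, bins=8):
    """Normalized grayscale histogram; pixel values in 0..255."""
    h = [0] * bins
    for px in frame:
        h[px * bins // 256] += 1
    return [c / len(frame) for c in h]

def detect_cuts(frames, threshold=0.5):
    """Flag frame i as a shot cut when its histogram differs from
    frame i-1 by more than threshold (L1 distance)."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i])) > threshold]

dark = [10] * 100     # an all-dark frame, flattened to a pixel list
bright = [240] * 100  # an all-bright frame
cuts = detect_cuts([dark, dark, bright, bright])
```

The detected indices are exactly the frame-accurate cut points handed to the later splitting steps.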
Transition identification is carried out during ingest of the recorded material: an underlying recognition library automatically extracts the transition frames of the material to assist in quickly locating segment cut points, saving the time otherwise spent seeking through material when splitting a program, and greatly improving the efficiency of splitting news programs in particular. When the strip splitting client reviews material for splitting, the transition frames corresponding to the material can be loaded automatically according to the material information recorded in the database, and the user can operate on the transition frames directly. Editing proceeds while the recorded material is still being read, so transition frames are refreshed and loaded as the material itself is refreshed.
The face recognition technology is mainly applied to accurate positioning of a news host picture and provides basic data for distinguishing the host picture from other pictures in subsequent intelligent processing.
For example, a news video may be decoded and its frames grouped, say every 5 frames; face recognition may be performed using, for example, OpenCV; correlation analysis may be performed using histograms; and key frames may then be selected. Similarity analysis can then be carried out using multi-scale LBP features, histograms, HOG features, hash fingerprints and the like, and the host is finally identified through classification statistics.
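A toy sketch of the grouping and key-frame selection steps, with each frame reduced to a single brightness number; the real pipeline would operate on decoded images, OpenCV face detections, and the feature comparisons listed above:

```python
def group_frames(frames, size=5):
    """Split a decoded frame sequence into fixed-size groups
    (the 'every 5 frames' grouping in the text)."""
    return [frames[i:i + size] for i in range(0, len(frames), size)]

def key_frame(group):
    """Pick the group member closest to the group's mean brightness;
    a stand-in for the histogram-correlation key-frame step."""
    mean = sum(group) / len(group)
    return min(group, key=lambda f: abs(f - mean))

# Frames reduced to one brightness value each, purely for illustration.
groups = group_frames(list(range(12)), size=5)
keys = [key_frame(g) for g in groups]
```

Each selected key frame would then be passed to face recognition and similarity analysis to decide whether it shows the host.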
In addition, the strip splitting processing can also employ subtitle recognition technology. Edited subtitles often appear in the program picture and can be used directly to title the material segments after splitting: the strip splitting system only needs to select the subtitle picture to be recognized, and the system automatically recognizes the picture as subtitle text, which is simple and fast.
In embodiment 1 above, since the Electronic Program Guide (EPG) is used for positioning, it is possible to accurately determine which program a piece of material belongs to and to accurately locate the start position of the program, without the picture recognition techniques generally employed in the prior art; this greatly reduces the amount of data processing, and the positioning is more accurate.
According to embodiment 1, the voice recognition task is set according to the electronic program guide, intelligent voice recognition is provided by constructing the recording system, and the voice information of the recorded material is automatically recognized and presented as text information. Once the video material has been recognized as text, full-text retrieval is supported: a user can locate specific video material by entering the corresponding keywords in a search box. In other words, the material management platform can quickly locate material according to the voice information, providing powerful support for the post-production platform.
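The keyword retrieval enabled by the stored text can be sketched as a small inverted index; the material ids and transcripts below are invented for illustration:

```python
def build_index(transcripts):
    """Inverted index: word -> set of material ids whose recognized
    text contains that word."""
    index = {}
    for mid, text in transcripts.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(mid)
    return index

def search(index, keywords):
    """Materials containing every keyword (full-text AND search)."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()

# Hypothetical recognized text for two recorded clips.
transcripts = {"clip-1": "city council budget vote",
               "clip-2": "weather forecast rain"}
idx = build_index(transcripts)
hits = search(idx, ["budget"])
```

A production material management platform would likely use a full-text engine with tokenization suited to the broadcast language, but the keyword-to-material mapping is the same idea.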
Example 2
Fig. 2 is a schematic diagram showing a configuration of the recording apparatus according to embodiment 2.
The recording apparatus 200 includes a voice recognition task setting module 210 that sets a voice recognition task according to an electronic program guide.
An Electronic Program Guide (EPG) contains the daily playlist of every included channel. A voice recognition task is set for a video program requiring voice recognition processing; for example, voice recognition can be performed for the news simulcast broadcast on CCTV1 from 19:00 to 19:30 every day over a period of time.
The electronic program guide may be obtained from an external system interface, such as a smart television system.
And the task distribution module 220 distributes the voice recognition task according to the state of the intelligent processing server.
Since a broadcast station has large amounts of information and many tasks to process, a plurality of intelligent processing servers may be provided to handle them. Voice recognition tasks can therefore be distributed according to the current state of each intelligent processing server; for example, more tasks may be assigned to a relatively idle server.
And the storage module 230 is configured to store the file after the voice recognition processing is performed on the corresponding video material by the intelligent processing server according to the voice recognition task.
The intelligent processing server may obtain the video material by accessing an index file. For example, the intelligent processing server reads an m3u8 file from the shared folder to obtain the video material. The slicing server can read the EPG information and generate the m3u8 file. An H.264/AAC encoded MP4 file can also be sliced by ffmpeg into an m3u8 playlist (the index file) and a number of ts segment files, which are then placed under a specified directory. An m3u8 file is thus a file index: a recorded program consisting of many data files can be represented by a single m3u8 index.
Speech recognition comprises two stages: training and recognition. For better speech recognition, the intelligent processing server may be pre-trained. In both stages, the input speech must first be pre-processed and its features extracted. The training stage collects a large speech corpus, obtains feature vector parameters after pre-processing and feature extraction, and finally establishes a reference model library of the training speech through feature modeling. The recognition stage compares the feature vector parameters of the input speech against the reference models in the library using a similarity measure, and outputs the label of the reference model with the highest similarity as the recognition result. In this way, voice recognition is achieved.
In addition, while the video material is subjected to voice recognition processing, the intelligent processing server can perform striping processing on the video material. As described above, the strip splitting process is mainly based on transition recognition technology and face recognition technology, and subtitle recognition technology may also be employed.
In embodiment 2 above, since the Electronic Program Guide (EPG) is used for positioning, it is possible to accurately determine which program a piece of material belongs to and to accurately locate the start position of the program, without the picture recognition techniques generally employed in the prior art; this greatly reduces the amount of data processing, and the positioning is more accurate.
According to embodiment 2, the voice recognition task is set according to the electronic program guide, intelligent voice recognition is provided by constructing the recording system, and the voice information of the recorded material is automatically recognized and presented as text information. Once the video material has been recognized as text, full-text retrieval is supported: a user can locate specific video material by entering the corresponding keywords in a search box, and the material management platform can quickly locate material according to the voice information, providing powerful support for the post-production platform.
The recording system may include two parts: a background service part and an application part.
The storage and switching equipment used by the system is standard equipment, and the system adopts a single-network Ethernet architecture, which is convenient to build and maintain. To ensure the safety and stability of the system, all servers can be deployed in a dual-machine hot-standby mode.
Data in the recording system can be downloaded on demand; marked (dotted) content segments can be downloaded, transcoded by calling the service of an offline transcoding center in the broadcast station, and then transferred to the target system.
Background service module: the recording system provides data services, recording services, key frame services, intelligent distribution services, interface services, and intelligent content speech recognition services.
Application service module: the recording system provides B/S application services, including search, preview, marking, download, and other applications.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (for example, a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (8)

1. A recording method, comprising:
setting a voice recognition task aiming at a video program needing voice recognition processing according to an electronic program guide;
distributing the voice recognition task according to the state of the intelligent processing server;
the intelligent processing server performs voice recognition processing on the corresponding video materials according to the voice recognition task and stores the text information after the voice recognition processing, so that a user can search the corresponding video materials through keywords related to the text information;
and the intelligent processing server performs strip splitting processing on the video material while performing the voice recognition processing on the video material.
2. The recording method according to claim 1,
the electronic program guide is received from an external system interface.
3. The recording method according to claim 1,
and the intelligent processing server acquires the video material by accessing the index file.
4. The recording method according to claim 1,
the strip splitting processing comprises transition recognition processing, face recognition processing and subtitle recognition processing.
5. A recording apparatus, comprising:
the voice recognition task setting module is used for setting a voice recognition task aiming at a video program needing voice recognition processing according to an electronic program guide;
the task distribution module distributes the voice recognition task according to the state of the intelligent processing server;
the storage module is used for storing the text information after the voice recognition processing is carried out on the corresponding video material by the intelligent processing server according to the voice recognition task, so that a user can search the corresponding video material through keywords related to the text information;
the intelligent processing server performs strip splitting processing on the video material while performing the voice recognition processing on the video material.
6. The recording apparatus according to claim 5,
and the electronic program guide acquisition module receives the electronic program guide from an external system interface.
7. The recording apparatus according to claim 5,
and the intelligent processing server acquires the video material by accessing the index file.
8. The recording apparatus according to claim 5,
the strip splitting processing comprises transition recognition processing, face recognition processing and subtitle recognition processing.
CN201710665002.6A 2017-08-07 2017-08-07 Recording method and recording apparatus Active CN107369450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710665002.6A CN107369450B (en) 2017-08-07 2017-08-07 Recording method and recording apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710665002.6A CN107369450B (en) 2017-08-07 2017-08-07 Recording method and recording apparatus

Publications (2)

Publication Number Publication Date
CN107369450A CN107369450A (en) 2017-11-21
CN107369450B true CN107369450B (en) 2021-03-12

Family

ID=60309274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710665002.6A Active CN107369450B (en) 2017-08-07 2017-08-07 Recording method and recording apparatus

Country Status (1)

Country Link
CN (1) CN107369450B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105817B (en) * 2018-10-25 2021-08-17 国家广播电视总局广播电视科学研究院 Training data generation method and device for intelligent program production
CN110087361A (en) * 2019-05-10 2019-08-02 中山市金钗运鸿光电科技有限公司 Intelligent lamp control assembly and control method with cloud processing and speech recognition
CN111901696B (en) * 2020-07-31 2022-04-15 杭州当虹科技股份有限公司 Real-time recording and strip-disassembling system based on hls technology by using preloading mode

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021857A (en) * 2006-10-20 2007-08-22 鲍东山 Video searching system based on content analysis
CN101770507A (en) * 2008-12-26 2010-07-07 Sony Corporation Data processing apparatus, data processing method, and program
CN102075695A (en) * 2010-12-30 2011-05-25 Institute of Automation, Chinese Academy of Sciences New generation intelligent cataloging system and method facing large amount of broadcast television programs
CN102377915A (en) * 2011-09-29 2012-03-14 Chengdu Sobey Digital Technology Co., Ltd. Method for fragmentizing multimedia file, fragmentizing server and network station editing system
CN103971687A (en) * 2013-02-01 2014-08-06 Tencent Technology (Shenzhen) Co., Ltd. Method and device for realizing load balance of voice recognition system
CN106331109A (en) * 2016-08-26 2017-01-11 Tianjin Communication and Broadcasting Group Co., Ltd. Method for realizing intelligent recommendation system of visual information in digital television

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7437296B2 (en) * 2003-03-13 2008-10-14 Matsushita Electric Industrial Co., Ltd. Speech recognition dictionary creation apparatus and information search apparatus
CN1713271A (en) * 2004-06-15 2005-12-28 Sanyo Electric Co., Ltd. Remote control system, controller, program product
US20070136752A1 (en) * 2005-12-12 2007-06-14 John Sanders Personalized electronic programming guide
US8000972B2 (en) * 2007-10-26 2011-08-16 Sony Corporation Remote controller with speech recognition
WO2011068170A1 (en) * 2009-12-04 2011-06-09 Sony Corporation Search device, search method, and program
CN106162319A (en) * 2015-04-20 2016-11-23 ZTE Corporation Method and device for voice control of an electronic program guide

Also Published As

Publication number Publication date
CN107369450A (en) 2017-11-21

Similar Documents

Publication Publication Date Title
CN108833973B (en) Video feature extraction method and device and computer equipment
CN113613065B (en) Video editing method and device, electronic equipment and storage medium
CN111274442B (en) Method for determining video tag, server and storage medium
CN112163122A (en) Method and device for determining label of target video, computing equipment and storage medium
US10803348B2 (en) Hybrid-based image clustering method and server for operating the same
CN111368141B (en) Video tag expansion method, device, computer equipment and storage medium
CN107369450B (en) Recording method and recording apparatus
KR101916874B1 (en) Apparatus, method for auto generating a title of video contents, and computer readable recording medium
CN109408672B (en) Article generation method, article generation device, server and storage medium
US20150189402A1 (en) Process for summarising automatically a video content for a user of at least one video service provider in a network
CN111314732A (en) Method for determining video label, server and storage medium
CN107241618B (en) Recording method and recording apparatus
US11756301B2 (en) System and method for automatically detecting and marking logical scenes in media content
CN114286169B (en) Video generation method, device, terminal, server and storage medium
CN110543584A (en) Method, device, processing server and storage medium for establishing a face index
CN112749299A (en) Method and device for determining video type, electronic equipment and readable storage medium
CN114845149A (en) Editing method of video clip, video recommendation method, device, equipment and medium
US11080531B2 (en) Editing multimedia contents based on voice recognition
US8896708B2 (en) Systems and methods for determining, storing, and using metadata for video media content
JP4755122B2 (en) Image dictionary generation method, apparatus, and program
CN112069331B (en) Data processing and searching method, device, equipment and storage medium
KR101749420B1 (en) Apparatus and method for extracting representation image of video contents using closed caption
JP6091552B2 (en) Movie processing apparatus and movie processing system
US20150026147A1 (en) Method and system for searches of digital content
KR102472194B1 (en) System for Analyzing Personal Media Contents using AI and Driving method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant