CN109788232A

CN109788232A - Method, device and system for recording meeting minutes of a video conference

Info

Publication number: CN109788232A
Application number: CN201811550951.0A
Authority: CN
Inventors: 蔡耀; 韩杰; 安君超; 王康桑
Original assignee: Visionvera Information Technology Co Ltd
Current assignee: Visionvera Information Technology Co Ltd
Priority date: 2018-12-18
Filing date: 2018-12-18
Publication date: 2019-05-21

Abstract

Embodiments of the invention provide a method, device and system for recording meeting minutes of a video conference. The method comprises: joining the video conference as a software terminal and receiving the video stream of the video conference sent by a video networking server; extracting one frame of video data from the video stream at every set time interval, performing face recognition on that frame, determining the identity information of the speaker, and determining the speaking time of the speaker; converting the audio data in the video stream into text according to the timestamps of the audio data, the text serving as the speech content; and, according to the speaking time of the speaker and the timestamps of the audio data, writing the identity information of the speaker and the corresponding speech content into a document to form a meeting minutes document. The embodiments record meeting minutes automatically, avoid the errors and omissions of manual recording, and produce minutes that are more accurate and complete.

Description

Method, device and system for recording meeting minutes of a video conference

Technical field

The present invention relates to the field of video networking technology, and in particular to a method, device and system for recording meeting minutes of a video conference.

Background art

With the rapid development of network technology, two-way communication such as video conferencing and video teaching has become widespread in users' daily life, work and study.

In the prior art, when meeting minutes need to be recorded during a video conference, they are still recorded manually by a note-taker attending the meeting. Manual recording inevitably contains omissions or recording errors, and it is also inefficient.

Summary of the invention

In view of the above problems, embodiments of the present invention are proposed to provide a method, device and system for recording meeting minutes of a video conference that overcome the above problems or at least partially solve them.

An embodiment of the invention discloses a method for recording meeting minutes of a video conference. The method is applied in a video network and comprises:

joining the video conference as a software terminal, and receiving the video stream of the video conference sent by a video networking server;

extracting one frame of video data from the video stream at every set time interval, performing face recognition on the frame of video data, determining the identity information of the speaker, and determining the speaking time of the speaker;

converting the audio data in the video stream into text according to the timestamps of the audio data, the text serving as the speech content;

according to the speaking time of the speaker and the timestamps of the audio data, writing the identity information of the speaker and the corresponding speech content into a document to form a meeting minutes document.

Optionally, extracting one frame of video data from the video stream at every set time interval comprises:

extracting the video data of one key frame from the video stream at every set time interval.

Optionally, performing face recognition on the frame of video data and determining the identity information of the speaker comprises:

extracting the face features in the frame of video data;

determining the identity information of the speaker according to the extracted face features and a pre-stored correspondence between face features and personnel identity information.

Optionally, after the meeting minutes document is formed, the method further comprises:

sending the meeting minutes document to an intelligent conference system.

Optionally, the set time interval is one second.

An embodiment of the invention also discloses a device for recording meeting minutes of a video conference. The device is applied in a video network and comprises:

a video stream receiving module, configured to join the video conference as a software terminal and receive the video stream of the video conference sent by a video networking server;

a face recognition module, configured to extract one frame of video data from the video stream at every set time interval, perform face recognition on the frame of video data, determine the identity information of the speaker, and determine the speaking time of the speaker;

a speech recognition module, configured to convert the audio data in the video stream into text according to the timestamps of the audio data, the text serving as the speech content;

a meeting minutes recording module, configured to write, according to the speaking time of the speaker and the timestamps of the audio data, the identity information of the speaker and the corresponding speech content into a document to form a meeting minutes document.

Optionally, the face recognition module comprises:

a video data extraction unit, configured to extract the video data of one key frame from the video stream at every set time interval.

Optionally, the face recognition module comprises:

a face feature extraction unit, configured to extract the face features in the frame of video data;

a face identification unit, configured to determine the identity information of the speaker according to the extracted face features and a pre-stored correspondence between face features and personnel identity information.

Optionally, the device further comprises:

a meeting minutes sending module, configured to send the meeting minutes document to an intelligent conference system.

Optionally, the set time interval is one second.

An embodiment of the invention also discloses a system for recording meeting minutes of a video conference, comprising video networking terminals, a video networking server and a conference scheduling system, and further comprising an AI server, the AI server being configured to execute the method for recording meeting minutes of a video conference described in the first aspect.

Embodiments of the present invention have the following advantages:

The embodiments make use of the characteristics of the video network. By joining the video conference as a software terminal, the video stream of the video conference sent by the video networking server is received; one frame of video data is extracted from the video stream at every set time interval and face recognition is performed on it to determine the identity information of the speaker and the speaker's speaking time; the audio data in the video stream is converted into text according to its timestamps, the text serving as the speech content; and, according to the speaking time of the speaker and the timestamps of the audio data, the identity information of the speaker and the corresponding speech content are written into a document to form a meeting minutes document. Meeting minutes are thus recorded automatically, the errors and omissions of manual recording are avoided, the resulting minutes are more accurate and complete, and recording efficiency is improved.

Brief description of the drawings

Fig. 1 is a schematic diagram of the networking of a video network according to the invention;

Fig. 2 is a schematic diagram of the hardware structure of a node server according to the invention;

Fig. 3 is a schematic diagram of the hardware structure of an access switch according to the invention;

Fig. 4 is a schematic diagram of the hardware structure of an Ethernet protocol conversion gateway according to the invention;

Fig. 5 is a flow chart of the steps of a method for recording meeting minutes of a video conference according to an embodiment of the invention;

Fig. 6 is a structural block diagram of a device for recording meeting minutes of a video conference according to an embodiment of the invention.

Detailed description of the embodiments

In order to make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

The video network is an important milestone in network development. It is a real-time network that can transmit high-definition video in real time, moving many Internet applications toward high-definition video and high-definition face-to-face communication.

The video network uses real-time high-definition video switching technology to integrate dozens of video, voice, picture, text, communication and data services on a single network platform, such as high-definition video conferencing, video surveillance, intelligent monitoring and analysis, emergency command, digital broadcast television, time-shifted television, online teaching, live broadcast, VOD, TV mail, personal video recording (PVR), intranet (self-managed) channels, intelligent video broadcast control and information publishing, and delivers high-definition video playback through a television or computer.

To help those skilled in the art better understand the embodiments of the present invention, the video network is introduced below.

Some of the technologies applied in the video network are described below:

Network technology (Network Technology)

The network technology innovation of the video network improves on traditional Ethernet in order to cope with the potentially huge video traffic on the network. Unlike pure network packet switching (Packet Switching) or network circuit switching (Circuit Switching), the video networking technology uses packet switching to meet streaming requirements. It has the flexibility, simplicity and low cost of packet switching while also providing the quality and security guarantees of circuit switching, realizing seamless connection of network-wide switched virtual circuits and data formats.

Switching technology (Switching Technology)

The video network adopts the two advantages of Ethernet, asynchrony and packet switching, and eliminates the defects of Ethernet while remaining fully compatible with it. It provides end-to-end seamless connection across the whole network, reaches user terminals directly, and carries IP data packets directly. User data requires no format conversion anywhere in the network. The video network is a more advanced form of Ethernet and a real-time switching platform; it can achieve the network-wide, large-scale, real-time transmission of high-definition video that the current Internet cannot, moving many network video applications toward high definition and unification.

Server technology (Server Technology)

The server technology of the video network and the unified video platform differs from that of traditional servers. Its streaming media transmission is connection-oriented, its data processing capacity is independent of traffic and communication time, and a single network layer can carry both signaling and data. For voice and video services, streaming media processing on the video network and the unified video platform is much simpler than data processing, and efficiency is improved by more than a hundred times over traditional servers.

Storage technology (Storage Technology)

To accommodate media content of very large capacity and very high traffic, the ultra-high-speed storage technology of the unified video platform uses an advanced real-time operating system. Program information in a server instruction is mapped to specific hard disk space, and media content no longer passes through the server but is delivered directly to the user terminal almost instantly; the typical user waiting time is less than 0.2 seconds. Optimized sector distribution greatly reduces the mechanical seek movement of the hard disk heads. Resource consumption is only 20% of that of an IP Internet system of the same grade, yet the concurrent traffic generated is more than 3 times that of a traditional disk array, and overall efficiency is improved by more than 10 times.

Network security technology (Network Security Technology)

Through structural measures such as independent permission control for each service and complete isolation of devices and user data, the structural design of the video network eradicates, at the structural level, the network security problems that trouble the Internet. It generally needs no antivirus software or firewall, blocks attacks by hackers and viruses, and provides users with a structurally secure network.

Service innovation technology (Service Innovation Technology)

The unified video platform fuses services with transmission. Whether for a single user, a private network user or an entire network, a connection is established automatically only once. User terminals, set-top boxes or PCs connect directly to the unified video platform to obtain a rich variety of multimedia video services. The unified video platform replaces traditional complex application programming with a "menu-style" table configuration mode, so that complex applications can be implemented with very little code, enabling "endless" new service innovation.

The networking of the video network is described below:

The video network has a centrally controlled network structure. The network may be a tree network, a star network, a ring network or another type, but on this basis a centralized control node is needed to control the whole network.

As shown in Fig. 1, the video network is divided into two parts: the access network and the metropolitan area network.

The devices of the access network part fall into 3 classes: node servers, access switches and terminals (including various set-top boxes, encoding boards, storage devices, etc.). A node server is connected to access switches, and an access switch can be connected to multiple terminals and can also connect to Ethernet.

The node server is the node that performs centralized control in the access network and can control access switches and terminals. A node server can be connected directly to an access switch or directly to a terminal.

Similarly, the devices of the metropolitan area network part also fall into 3 classes: metropolitan area servers, node switches and node servers. A metropolitan area server is connected to node switches, and a node switch can be connected to multiple node servers.

Here, the node server is the node server of the access network part; that is, the node server belongs to both the access network part and the metropolitan area network part.

The metropolitan area server is the node that performs centralized control in the metropolitan area network and can control node switches and node servers. A metropolitan area server can be connected directly to a node switch or directly to a node server.

It can be seen that the video network as a whole has a layered, centrally controlled network structure, while the networks controlled under the node servers and metropolitan area servers can have various structures such as tree, star and ring.

Figuratively speaking, the access network part can form a unified video platform (the part within the dashed circle), and multiple unified video platforms can form the video network; the unified video platforms can be interconnected through metropolitan area and wide area video networks.

Classification of video networking devices

1.1 The devices in the video network of the embodiments of the present invention fall into 3 classes: servers, switches (including Ethernet gateways) and terminals (including various set-top boxes, encoding boards, storage devices, etc.). The video network as a whole can be divided into the metropolitan area network (or national network, global network, etc.) and the access network.

1.2 The devices of the access network part fall into 3 classes: node servers, access switches (including Ethernet gateways) and terminals (including various set-top boxes, encoding boards, storage devices, etc.).

The specific hardware structure of each access network device is as follows:

Node server:

As shown in Fig. 2, the node server mainly comprises a network interface module 201, a switching engine module 202, a CPU module 203 and a disk array module 204;

Packets coming in from the network interface module 201, the CPU module 203 and the disk array module 204 all enter the switching engine module 202. The switching engine module 202 looks up the address table 205 for each incoming packet to obtain the packet's routing information, and stores the packet in the corresponding queue of the packet buffer 206 according to that routing information; if the queue of the packet buffer 206 is nearly full, the packet is discarded. The switching engine module 202 polls all packet buffer queues and forwards from a queue if the following conditions are met: 1) the port's send buffer is not full; 2) the queue's packet counter is greater than zero. The disk array module 204 mainly controls the hard disks, including initialization, reading and writing. The CPU module 203 is mainly responsible for protocol processing with the access switches and terminals (not shown in the figure), for configuring the address table 205 (including the downlink protocol packet address table, the uplink protocol packet address table and the data packet address table), and for configuring the disk array module 204.

Access switch:

As shown in Fig. 3, the access switch mainly comprises network interface modules (a downlink network interface module 301 and an uplink network interface module 302), a switching engine module 303 and a CPU module 304;

Packets (uplink data) coming in from the downlink network interface module 301 enter the packet detection module 305. The packet detection module 305 checks whether the destination address (DA), source address (SA), packet type and packet length of the packet meet the requirements; if so, it allocates a corresponding stream identifier (stream-id) and the packet enters the switching engine module 303; otherwise the packet is discarded. Packets (downlink data) coming in from the uplink network interface module 302 enter the switching engine module 303, and data packets coming in from the CPU module 304 also enter the switching engine module 303. The switching engine module 303 looks up the address table 306 for each incoming packet to obtain the packet's routing information. If a packet entering the switching engine module 303 is going from a downlink network interface to an uplink network interface, it is stored in the corresponding queue of the packet buffer 307 according to its stream identifier (stream-id); if that queue is nearly full, the packet is discarded. If a packet entering the switching engine module 303 is not going from a downlink network interface to an uplink network interface, it is stored in the data packet queue of the corresponding packet buffer 307 according to its routing information; if that queue is nearly full, the packet is discarded.

The switching engine module 303 polls all packet buffer queues. In the embodiments of the present invention there are two cases:

If the queue goes from a downlink network interface to an uplink network interface, it is forwarded when the following conditions are met: 1) the port's send buffer is not full; 2) the queue's packet counter is greater than zero; 3) a token generated by the rate control module has been obtained;

If the queue does not go from a downlink network interface to an uplink network interface, it is forwarded when the following conditions are met: 1) the port's send buffer is not full; 2) the queue's packet counter is greater than zero.

The rate control module 308 is configured by the CPU module 304 and, at programmable intervals, generates tokens for all packet buffer queues going from downlink network interfaces to uplink network interfaces, in order to control the bit rate of uplink forwarding.
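The two forwarding cases of the access switch can be summarized as a single predicate. The following is only a minimal sketch; the `QueueState` class, its field names and the function `may_forward` are hypothetical and are introduced purely for illustration, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    """Hypothetical snapshot of one packet buffer queue (illustrative only)."""
    uplink_bound: bool       # True if the queue goes from a downlink to an uplink interface
    packet_count: int        # value of the queue's packet counter
    send_buffer_full: bool   # whether the port's send buffer is full
    has_token: bool = False  # whether a token from the rate control module was obtained


def may_forward(q: QueueState) -> bool:
    """Return True if the queue meets the forwarding conditions described above."""
    # Conditions 1) and 2) apply to both cases.
    if q.send_buffer_full or q.packet_count <= 0:
        return False
    # Condition 3) applies only to queues heading from downlink to uplink.
    return q.has_token if q.uplink_bound else True
```

Under this sketch, an uplink-bound queue with pending packets is held back until the rate control module issues a token, which is how the bit rate of uplink forwarding ends up bounded.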

The CPU module 304 is mainly responsible for protocol processing with the node server, for configuring the address table 306, and for configuring the rate control module 308.

Ethernet protocol conversion gateway:

As shown in Fig. 4, the gateway mainly comprises network interface modules (a downlink network interface module 401 and an uplink network interface module 402), a switching engine module 403, a CPU module 404, a packet detection module 405, a rate control module 408, an address table 406, a packet buffer 407, a MAC adding module 409 and a MAC removing module 410.

Data packets coming in from the downlink network interface module 401 enter the packet detection module 405. The packet detection module 405 checks whether the Ethernet MAC DA, Ethernet MAC SA, Ethernet length or frame type, video networking destination address DA, video networking source address SA, video networking packet type and packet length of the data packet meet the requirements; if so, it allocates a corresponding stream identifier (stream-id). The MAC removing module 410 then strips the MAC DA, the MAC SA and the length or frame type (2 bytes), and the packet enters the corresponding receive buffer; otherwise the packet is discarded;

The downlink network interface module 401 checks the send buffer of the port. If there is a packet, it looks up the Ethernet MAC DA of the corresponding terminal from the packet's video networking destination address DA, adds the terminal's Ethernet MAC DA, the MAC SA of the Ethernet protocol conversion gateway and the Ethernet length or frame type, and sends the packet.
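The MAC removal and MAC addition steps can be illustrated with a short sketch. It assumes the standard 14-byte Ethernet header layout (6-byte MAC DA, 6-byte MAC SA, 2-byte length or frame type); the function names and the frame type value used in the test are placeholders and not taken from the patent.

```python
import struct

# MAC DA (6 bytes), MAC SA (6 bytes), length or frame type (2 bytes)
ETH_HEADER = struct.Struct("!6s6sH")


def strip_mac(frame: bytes) -> bytes:
    """Remove the Ethernet header, leaving the inner video networking packet."""
    if len(frame) < ETH_HEADER.size:
        raise ValueError("frame shorter than an Ethernet header")
    return frame[ETH_HEADER.size:]


def add_mac(packet: bytes, terminal_mac: bytes, gateway_mac: bytes, eth_type: int) -> bytes:
    """Prepend the terminal's MAC DA, the gateway's MAC SA and the length or frame type."""
    if len(terminal_mac) != 6 or len(gateway_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    return ETH_HEADER.pack(terminal_mac, gateway_mac, eth_type) + packet
```

In this sketch the lookup from the video networking destination address to the terminal's MAC address is left to the caller, for example through a dictionary maintained by the gateway.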

The functions of the other modules in the Ethernet protocol conversion gateway are similar to those of the access switch.

Terminal:

A terminal mainly comprises a network interface module, a service processing module and a CPU module. For example, a set-top box mainly comprises a network interface module, a video and audio encoding and decoding engine module and a CPU module; an encoding board mainly comprises a network interface module, a video encoding engine module and a CPU module; a storage device mainly comprises a network interface module, a CPU module and a disk array module.

1.3 The devices of the metropolitan area network part fall into 3 classes: node servers, node switches and metropolitan area servers. A node switch mainly comprises a network interface module, a switching engine module and a CPU module; a metropolitan area server mainly comprises a network interface module, a switching engine module and a CPU module.

2. Video networking data packet definitions

2.1 Access network data packet definition

A data packet of the access network mainly comprises the following parts: destination address (DA), source address (SA), reserved bytes, payload (PDU) and CRC.

As shown in the table below, a data packet of the access network mainly comprises the following parts:

DA | SA | Reserved | Payload | CRC

Wherein:

The destination address (DA) consists of 8 bytes. The first byte indicates the type of the data packet (for example protocol packet, multicast data packet, unicast data packet, etc.), with at most 256 possibilities; the second to sixth bytes are the metropolitan area network address; the seventh and eighth bytes are the access network address;

The source address (SA) also consists of 8 bytes and is defined in the same way as the destination address (DA);

The reserved bytes consist of 2 bytes;

The payload has a different length depending on the type of the data packet: 64 bytes for the various protocol packets and 32 + 1024 = 1056 bytes for unicast or multicast data packets, although it is of course not limited to these 2 cases;

The CRC consists of 4 bytes and is calculated using the standard Ethernet CRC algorithm.
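The field layout above can be made concrete with a small sketch that builds and parses an access network data packet. The field sizes follow the text; the use of `zlib.crc32` (which implements the CRC-32 polynomial used by Ethernet), the little-endian placement of the CRC, the zero-filled reserved bytes and all function names are assumptions made for illustration.

```python
import struct
import zlib

HEADER = struct.Struct(">8s8s2s")  # DA (8 bytes), SA (8 bytes), reserved (2 bytes)


def build_packet(da: bytes, sa: bytes, payload: bytes) -> bytes:
    """Assemble DA | SA | Reserved | Payload | CRC."""
    if len(da) != 8 or len(sa) != 8:
        raise ValueError("DA and SA must be 8 bytes each")
    body = HEADER.pack(da, sa, b"\x00\x00") + payload
    # Assumption: CRC-32 over everything before the CRC, stored little-endian.
    return body + struct.pack("<I", zlib.crc32(body))


def parse_packet(packet: bytes) -> dict:
    """Check the CRC and split the packet into its fields."""
    if len(packet) < HEADER.size + 4:
        raise ValueError("packet too short")
    body = packet[:-4]
    (crc,) = struct.unpack("<I", packet[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    da, sa, _reserved = HEADER.unpack_from(body)
    return {
        "packet_type": da[0],         # first byte of DA: type of the data packet
        "metro_address": da[1:6],     # bytes 2 to 6: metropolitan area network address
        "access_address": da[6:8],    # bytes 7 and 8: access network address
        "sa": sa,
        "payload": body[HEADER.size:],
    }
```

With a 64-byte protocol packet payload, the total length under this layout is 8 + 8 + 2 + 64 + 4 = 86 bytes.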

2.2 Metropolitan area network data packet definition

The topology of the metropolitan area network is a graph, and there may be 2 or even more connections between two devices; that is, there may be more than 2 connections between a node switch and a node server, or between two node switches. However, the metropolitan area network address of a metropolitan area network device is unique. In order to describe the connection relationships between metropolitan area network devices accurately, the embodiments of the present invention introduce a parameter, the label, to uniquely describe a metropolitan area network device.

The definition of the label in this specification is similar to that of the label in MPLS (Multi-Protocol Label Switching). Assuming there are two connections between device A and device B, a data packet going from device A to device B has 2 possible labels, and a data packet going from device B to device A also has 2 possible labels. Labels are divided into in-labels and out-labels. Assuming the label of a data packet entering device A (the in-label) is 0x0000, the label when the packet leaves device A (the out-label) may become 0x0001. The network entry process of the metropolitan area network is centrally controlled, which means that both address allocation and label allocation in the metropolitan area network are directed by the metropolitan area server, while node switches and node servers only execute passively. This differs from label allocation in MPLS, which is the result of mutual negotiation between switches and servers.

As shown in the table below, a data packet of the metropolitan area network mainly comprises the following parts:

DA | SA | Reserved | Label | Payload | CRC

That is: destination address (DA), source address (SA), reserved bytes (Reserved), label, payload (PDU) and CRC. The format of the label can be defined as follows: the label is 32 bits long, of which the high 16 bits are reserved and only the low 16 bits are used; it is located between the reserved bytes and the payload of the data packet.
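The label format and the in-label to out-label replacement can be sketched as follows. The 32-bit width and the use of only the low 16 bits come from the text; the big-endian byte order, the function names and the dictionary-based label table are assumptions for illustration (in the patent the label allocation is directed by the metropolitan area server).

```python
LABEL_SIZE = 4  # the label is 32 bits long


def pack_label(value: int) -> bytes:
    """Encode a label; the high 16 bits are reserved and left as zero."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("label must fit in the low 16 bits")
    return value.to_bytes(LABEL_SIZE, "big")  # byte order is an assumption


def unpack_label(raw: bytes) -> int:
    """Decode a label, ignoring the reserved high 16 bits."""
    return int.from_bytes(raw[:LABEL_SIZE], "big") & 0xFFFF


def swap_label(raw: bytes, label_table: dict) -> bytes:
    """Replace an in-label with the out-label configured for it on this device."""
    return pack_label(label_table[unpack_label(raw)])
```

For the example in the text, a table of `{0x0000: 0x0001}` on device A turns the in-label 0x0000 into the out-label 0x0001.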

Based on the above characteristics of the video network, one of the core ideas of the embodiments of the present invention is proposed: following the video networking protocol, the identity of the speaker is recognized from the video data during a video conference and the audio data is converted into text, thereby determining the speech content of each speaker.

Referring to Fig. 5, a flow chart of the steps of a method for recording meeting minutes of a video conference according to an embodiment of the present invention is shown. The method can be applied in a video network and executed by an AI (Artificial Intelligence) server, and specifically comprises the following steps:

Step 501: join the video conference as a software terminal, and receive the video stream of the video conference sent by the video networking server.

The AI server is a Linux-based server that joins the video conference as a software terminal of the video conference. As a software terminal, the AI server has a video networking number. When a video conference is held, the Pamir conference scheduling system sets up the conference: according to the video networking numbers of the video networking terminals that are to take part, it pulls those terminals into the video conference, and at the same time it pulls the AI server into the video conference as a software terminal. During the conference, after the video networking server receives, over the video networking protocol, the video stream sent by one video networking terminal in the conference, it sends that video stream over the video networking protocol to the other terminals in the conference; as a software terminal, the AI server likewise receives the video stream of the video conference sent by the video networking server over the video networking protocol.

A video networking terminal is an application device that carries video networking services and can support full-screen services such as video conferencing, surveillance viewing, videophone, live broadcast and on-demand playback, visual command, telemedicine and remote training. In a specific implementation, the video networking terminal may be a set-top box (SetTopBox, STB), a device that connects a television set to an external signal source, converts compressed digital signals into television content and displays it on the television set. In general, a set-top box can be connected to a camera and a microphone to capture multimedia data such as video and audio data, and to a television set to play such multimedia data.

Step 502: extract one frame of video data from the video stream at every set time interval, perform face recognition on that frame, determine the identity information of the speaker, and determine the speaking time of the speaker.

The video stream is decapsulated to obtain video data and audio data. One frame of video data is extracted at every set time interval, and face recognition is performed on the extracted frame to determine the identity information of the speaker in it. The current speaking moment of the speaker is determined from the timestamp of that frame. When successive recognitions yield the same speaker identity, the corresponding speaking moments are accumulated to give the speaking time of that speaker. For example, if a speaker is first recognized at 10:05:06 and 60 consecutive recognitions identify the same speaker, then with a set time interval of one second the speaking time of that speaker is from 10:05:06 to 10:06:06. Face recognition can be performed with the SeetaFace face recognition technology. SeetaFace was developed by the face recognition research group led by researcher Shan Shiguang at the Institute of Computing Technology, Chinese Academy of Sciences; its code is implemented in C++ and does not depend on third-party libraries.
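The accumulation of per-frame recognition results into speaking times can be sketched as follows. This is a minimal illustration that assumes the recognition results are available as a time-ordered list of (timestamp in seconds, identity) pairs; the function name `speaking_intervals` and the convention that an interval ends one sampling interval after its last sample are assumptions, chosen so that the result matches the 10:05:06 to 10:06:06 example in the text.

```python
from itertools import groupby


def speaking_intervals(samples, interval=1.0):
    """Merge consecutive samples with the same identity into (identity, start, end).

    samples: list of (timestamp_seconds, identity) sorted by time; identity is
    None when no speaker was recognized in that frame.
    """
    result = []
    for identity, group in groupby(samples, key=lambda s: s[1]):
        times = [t for t, _ in group]
        if identity is not None:
            # The interval covers the last sample plus one sampling interval.
            result.append((identity, times[0], times[-1] + interval))
    return result
```

For 60 one-second samples starting at 36306 seconds after midnight (10:05:06), the sketch yields one interval from 36306 to 36366 seconds (10:06:06).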

The set time interval may be one second, so that the recognition results align well with the text converted from the audio data, and a mismatch between the identified person and the speech content caused by too long an interval is avoided.

MPEG encoding divides pictures into three types, I, P and B: an I frame is a key frame, a P frame is a forward-predicted frame, and a B frame is a bidirectionally interpolated frame. An I frame is a complete picture, while P and B frames record changes relative to the I frame.

Because a key frame does not reference other frames, extracting a key frame from the video stream allows it to be decoded directly, so that face recognition can be performed on it directly.
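The key frame sampling rule can be sketched as below. The sketch assumes that demultiplexing has already produced (timestamp, picture type) pairs in stream order; that tuple layout and the function name `pick_key_frames` are illustrative assumptions, and decoding of the selected frames is out of scope.

```python
def pick_key_frames(frames, interval=1.0):
    """Return the timestamps of the key frames to analyse.

    frames: iterable of (timestamp_seconds, picture_type) in stream order,
    where picture_type is "I", "P" or "B". At most one I frame is picked
    per `interval` seconds.
    """
    picked = []
    next_due = None
    for ts, picture_type in frames:
        if picture_type != "I":
            continue  # P and B frames depend on other frames, so skip them
        if next_due is None or ts >= next_due:
            picked.append(ts)
            next_due = ts + interval
    return picked
```

Because only I frames are considered, the effective sampling period in this sketch cannot be shorter than the spacing of key frames in the stream.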

Performing face recognition on the frame of video data and determining the identity information of the speaker comprises:

extracting the face features in the frame of video data;

Users can enter a bareheaded ID photo and the corresponding personal information in advance through the intelligent conference system. The AI server obtains face features from the bareheaded photo, binds them to the personal information, and stores the face features and personal information in correspondence in a face recognition information database. When performing face recognition, face detection and feature point localization are first carried out on the frame of video data; after the feature points are located, the face features are extracted and compared with the pre-stored face features to determine the identity information of the speaker.
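The comparison of an extracted face feature with the pre-stored features can be sketched as a nearest-match lookup. The patent does not specify the comparison metric; cosine similarity, the threshold value of 0.6, the in-memory dictionary standing in for the face recognition information database, and the function names are all assumptions for illustration. Feature extraction itself (for example with SeetaFace) is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(feature, database, threshold=0.6):
    """Return the identity whose stored feature is most similar, or None.

    database: dict mapping identity information to a stored feature vector.
    """
    best_name, best_score = None, threshold
    for name, stored in database.items():
        score = cosine(feature, stored)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Returning None when no stored feature reaches the threshold lets the caller treat the frame as having no recognized speaker.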

Step 503: according to the timestamp of the audio data in the video stream, convert the audio data into text to serve as the speech content.

Speech recognition is performed on the audio data obtained by decapsulating the video stream, converting the audio data into text in real time to obtain the speech content, and the speech content is associated with the timestamp. Speech recognition can use a real-time speech transcription technology (such as the iFLYTEK real-time speech transcription technology) to obtain the corresponding text. For example, during speech recognition the voice data stream can be transmitted to iFLYTEK over websocket, so that recognition results are returned in real time.

Step 504: according to the time limit of speech of the spokesman and the timestamp of the audio data, write the identity information of the spokesman and the corresponding speech content into a document to form a meeting summary document.

The identity information of the spokesman is matched to the speech content according to the time limit of speech of the spokesman and the timestamp of the audio data. That is, when the timestamp of the audio data falls within the time limit of speech of a spokesman, the text converted from the audio data at that timestamp is taken as the speech content of that spokesman, and the identity information of the spokesman and the speech content are written into the document in correspondence to form the meeting summary document.
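The matching rule described above, in which a text segment belongs to the spokesman whose time limit of speech contains the segment's timestamp, can be sketched as follows. The function name `build_minutes`, the use of seconds for all times, the half-open interval convention, and the "Unknown" label for unmatched text are assumptions of this sketch.

```python
def build_minutes(periods, segments):
    """Attribute transcribed text to spokesmen and return the minutes as text.

    periods:  list of (identity, start, end) with times in seconds.
    segments: list of (timestamp, text) from speech recognition, in order.
    A segment belongs to the period with start <= timestamp < end.
    """
    lines = []
    for ts, text in segments:
        speaker = next(
            (who for who, start, end in periods if start <= ts < end),
            "Unknown",  # no time limit of speech covers this timestamp
        )
        if lines and lines[-1][0] == speaker:
            # Consecutive text from the same spokesman is merged into one entry.
            lines[-1][1] += " " + text
        else:
            lines.append([speaker, text])
    return "\n".join(f"{who}: {text}" for who, text in lines)


if __name__ == "__main__":
    periods = [("Zhang San", 0, 60), ("Li Si", 60, 90)]
    segments = [(5, "Hello."), (30, "Let us begin."), (61, "Agreed.")]
    print(build_minutes(periods, segments))
```

Writing the returned string to a file would give a simple form of the meeting summary document; the document format itself is not specified by the source.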

After matching the identity information of the spokesman to the speech content, the AI server can also generate a subtitle file from the identity information of the spokesman and the speech content and send the subtitle file to each view networked terminal participating in the video conference. On receiving the subtitle file, a view networked terminal can display the identity information of the spokesman and the speech content as subtitles on the corresponding television set.
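Generating a subtitle file from the matched identities and speech content can be sketched as below. The source does not name a subtitle format, so the SubRip (SRT) layout used here is an assumption, as are the names `srt_time` and `to_srt` and the cue tuple layout.

```python
def srt_time(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start, end, speaker, text) cues, times in seconds."""
    blocks = []
    for index, (start, end, speaker, text) in enumerate(cues, 1):
        blocks.append(
            f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{speaker}: {text}\n"
        )
    # Cues are separated by a blank line, as SRT requires.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0, 2.5, "Zhang San", "Hello."),
                  (2.5, 5, "Li Si", "Agreed.")]))
```

Prefixing each cue with the spokesman's name keeps identity and speech content together on screen, which mirrors the display described above.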

In the technical solution of this embodiment, the AI server participates in the video conference with the identity of a software terminal and receives the video stream of the video conference sent by the view networked server. One frame of video data is extracted from the video stream at every set time interval, and face recognition is performed on that frame to determine the identity information of the spokesman and the time limit of speech of the spokesman. The audio data is converted into text according to the timestamp of the audio data in the video stream to serve as the speech content, and the identity information of the spokesman and the corresponding speech content are written into a document according to the time limit of speech of the spokesman and the timestamp of the audio data, thereby forming the meeting summary document. The meeting summary is thus recorded automatically, which avoids the errors and omissions of manual recording, makes the resulting meeting summary more accurate and complete, improves recording efficiency, and frees the hands of the person who would otherwise take the minutes.

On the basis of the above technical solution, after the meeting summary document is formed, the method further includes:

Sending the meeting summary document to the intelligent meeting system.

A user can download the meeting summary document through the intelligent meeting system, correct any place where the identity information of a spokesman does not match the speech content, and upload the corrected document to the AI server through the intelligent meeting system, so that incorrectly recorded meeting summaries can be corrected.

It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations. However, those skilled in the art should understand that the embodiments of the present invention are not limited by the described sequence of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.

Referring to Fig. 6, a structural block diagram of a meeting summary recording device for a video conference according to an embodiment of the present invention is shown. The device can be applied in view networking and may specifically include the following modules:

A video stream receiving module 601, configured to participate in a video conference with the identity of a software terminal and to receive the video stream of the video conference sent by the view networked server;

A face recognition module 602, configured to extract one frame of video data from the video stream at every set time interval, perform face recognition on the frame of video data, determine the identity information of the spokesman, and determine the time limit of speech of the spokesman;

A speech recognition module 603, configured to convert the audio data into text according to the timestamp of the audio data in the video stream, to serve as the speech content;

A meeting summary recording module 604, configured to write the identity information of the spokesman and the corresponding speech content into a document according to the time limit of speech of the spokesman and the timestamp of the audio data, to form a meeting summary document.

Optionally, the face recognition module includes:

Optionally, further includes:

Optionally, the set time is one second.

Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for the relevant parts, refer to the description of the method embodiment.

An embodiment of the present invention also discloses an AI server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method described above when executing the program.

An embodiment of the present invention also discloses a computer-readable storage medium on which a computer program is stored, wherein the steps of the method described above are implemented when the program is executed by a processor.

An embodiment of the present invention also discloses a meeting summary recording system for a video conference, comprising view networked terminals, a view networked server and a conference dispatching system, and further comprising an AI server configured to execute the meeting summary recording method for a video conference described above. The conference dispatching system may be, for example, the Pamir conference dispatching system. The AI server may be a Linux-based server running the seetaface face recognition library, an audio/video encoding and decoding program, and a face recognition information database. The meeting summary recording system may further include an intelligent meeting system, which is used to collect personnel information and the corresponding face photos and to let users view the recorded meeting summaries; users can also modify a meeting summary through the intelligent meeting system.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments can be referred to one another.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a device or a computer program product. Therefore, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operation steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.

Finally, it should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or terminal device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or terminal device. Without further limitation, an element preceded by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or terminal device that includes the element.

The meeting summary recording method, device and system for a video conference provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, those skilled in the art may, following the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A meeting summary recording method for a video conference, characterized in that the method is applied in view networking and comprises:

Extracting one frame of video data from the video stream at every set time interval, performing face recognition on the frame of video data, determining the identity information of the spokesman, and determining the time limit of speech of the spokesman;

Converting the audio data into text according to the timestamp of the audio data in the video stream, to serve as the speech content;

Writing the identity information of the spokesman and the corresponding speech content into a document according to the time limit of speech of the spokesman and the timestamp of the audio data, to form a meeting summary document.

2. The method according to claim 1, characterized in that extracting one frame of video data from the video stream at every set time interval comprises:

3. The method according to claim 1, characterized in that performing face recognition on the frame of video data and determining the identity information of the spokesman comprises:

Extracting the face features from the frame of video data;

Determining the identity information of the spokesman according to the face features and the pre-stored correspondence between face features and personnel identity information.

4. The method according to claim 1, characterized in that, after the meeting summary document is formed, the method further comprises:

Sending the meeting summary document to an intelligent meeting system.

5. The method according to claim 1, characterized in that the set time is one second.

6. A meeting summary recording device for a video conference, characterized in that the device is applied in view networking and comprises:

A video stream receiving module, configured to participate in a video conference with the identity of a software terminal and to receive the video stream of the video conference sent by the view networked server;

A face recognition module, configured to extract one frame of video data from the video stream at every set time interval, perform face recognition on the frame of video data, determine the identity information of the spokesman, and determine the time limit of speech of the spokesman;

A speech recognition module, configured to convert the audio data into text according to the timestamp of the audio data in the video stream, to serve as the speech content;

A meeting summary recording module, configured to write the identity information of the spokesman and the corresponding speech content into a document according to the time limit of speech of the spokesman and the timestamp of the audio data, to form a meeting summary document.

7. The device according to claim 6, characterized in that the face recognition module comprises:

A video data extraction unit, configured to extract the video data of one key frame from the video stream at every set time interval.

8. The device according to claim 6, characterized in that the face recognition module comprises:

A face identification unit, configured to determine the identity information of the spokesman according to the face features and the pre-stored correspondence between face features and personnel identity information.

9. The device according to claim 6, characterized in that the device further comprises:

10. A meeting summary recording system for a video conference, comprising view networked terminals, a view networked server and a conference dispatching system, characterized in that the system further comprises an artificial intelligence (AI) server, the AI server being configured to execute the meeting summary recording method for a video conference according to any one of claims 1-5.