CN115225849A - Video conference system resource scheduling method based on cloud computing - Google Patents


Info

Publication number
CN115225849A
Authority
CN
China
Prior art keywords: picture, conference, audio, video data, data
Prior art date
Legal status: Pending
Application number
CN202210839054.1A
Other languages
Chinese (zh)
Inventor
牛永伟
吴鑫坤
徐志鹏
周玉宏
张钱泉
Current Assignee
Anhui Spider Information Technology Co ltd
Original Assignee
Anhui Spider Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Anhui Spider Information Technology Co ltd
Priority: CN202210839054.1A
Publication: CN115225849A


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working
    • H04N7/15: Conference systems
    • H04N7/155: Conference systems involving storage of or access to video conference sessions
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/57: Speech or voice analysis techniques specially adapted for comparison or discrimination, for processing of video signals
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/12: Applying verification of the received information
    • H04L63/123: Applying verification of the received information to received data contents, e.g. message integrity
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/06: Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to the technical field of data transmission, and in particular to a video conference system resource scheduling method based on cloud computing, comprising the following steps: establishing a data connection with the participant devices and receiving the audio and video data they upload; generating a preliminary conference summary, randomly selecting one frame of picture, and generating a compiling time node; compiling the preliminary conference summary into the picture to generate an identity verification picture; and receiving a compiling time node, audio and video data and an identity verification picture from external devices, performing picture verification, judging whether the conference summaries are synchronous, and storing the summary if they are. In the invention, each party generates a conference summary from the obtained audio and video data and transmits it, and the superposed pictures are compared to judge whether the summaries obtained by all parties are identical. Tampering with the conference summary can therefore be discovered promptly during the conference, avoiding later losses caused by a tampered summary.

Description

Video conference system resource scheduling method based on cloud computing
Technical Field
The invention belongs to the technical field of data transmission, and particularly relates to a cloud computing-based video conference system resource scheduling method.
Background
Cloud computing is a form of distributed computing in which a huge data-processing task is decomposed over the network into numerous small programs, which are then processed and analyzed by a system of multiple servers; the results are merged and returned to the user. Early cloud computing was simple distributed computing that handled task distribution and merged the computation results.
Video conferencing refers to a conference in which people at two or more locations hold a face-to-face conversation via communication devices and a network. Video conferences can be divided into point-to-point conferences and multipoint conferences according to the number of participating sites. With a video conference system, participants can hear the sound from other sites, see the images, actions and expressions of the participants there, and send electronic presentation content, giving them the feeling of being present in person.
In current video conferences, the proceedings are usually recorded as audio or video and a conference summary is generated through speech recognition. However, once content such as the conference summary is illegally stolen or tampered with, the participants cannot obtain the real conference content, which creates a serious hidden danger.
Disclosure of Invention
An embodiment of the present invention aims to provide a cloud computing-based video conference system resource scheduling method, so as to solve the problems described in the background art.
The embodiment of the invention is realized in such a way that a video conference system resource scheduling method based on cloud computing comprises the following steps:
establishing data connection with the participant equipment, and receiving audio and video data uploaded by the participant equipment, wherein the audio and video data comprises time axis data;
generating a preliminary conference summary according to the audio and video data, randomly selecting a frame of picture from the audio and video data, and generating a compiling time node, wherein the compiling time node is a time point corresponding to the picture;
compiling the preliminary meeting summary into the picture to generate an identity verification picture, and sending out a compiling time node, audio and video data and the identity verification picture;
receiving a compiling time node, audio and video data and an identity verification picture from external equipment, performing picture verification, judging whether the conference summary is synchronous, and if so, storing the preliminary conference summary.
Preferably, the step of generating a preliminary conference summary according to the audio/video data, randomly selecting a frame of picture from the audio/video data, and generating a compiling time node specifically includes:
performing data extraction on the audio and video data to obtain voice data and video data, and identifying a speaker according to the video data;
determining a speech device according to a speaker, extracting and identifying corresponding voice data in the speech device, and generating a preliminary conference summary;
randomly selecting a frame of picture from video data, determining the time corresponding to the picture, and obtaining a compiling time node.
Preferably, the step of compiling the preliminary conference summary into the picture to generate an identity verification picture, and sending out the compiling time node, the audio/video data, and the identity verification picture specifically includes:
respectively converting the data corresponding to the preliminary conference summary and the picture into binary summary data and binary picture data;
superposing the binary summary data and the binary picture data to obtain an identity verification picture;
and sending the compiling time node, the audio and video data and the identity verification picture.
Preferably, the steps of receiving the compiling time node, the audio/video data and the identity verification picture from the external device, performing picture verification, and judging whether the conference summary is synchronous specifically include:
receiving a compiling time node, audio and video data and an identity verification picture from external equipment, and extracting a frame of picture from the audio and video data from the external equipment according to the compiling time node to obtain an active verification picture;
carrying out voice recognition on audio and video data from external equipment to obtain a secondary conference summary;
and fusing the secondary conference summary and the active verification picture in a binary system superposition mode to obtain a picture to be verified, comparing the picture to be verified with the identity verification picture, and judging whether the conference summary is synchronous or not.
Preferably, the compiling time node, the audio and video data and the identity verification picture are all transmitted in an encrypted mode.
Preferably, when the conference summaries are not synchronous, prompt information is sent out to notify all parties to check the conference records.
Preferably, when the conference is finished, a final conference summary is generated according to the preliminary conference summary, and audio and video data generated in the conference process are uploaded to a cloud for storage.
Preferably, the audio and video data stored in the cloud are deleted periodically.
Preferably, the final conference summary is sent to each participant synchronously.
Another objective of an embodiment of the present invention is to provide a cloud computing-based video conference system resource scheduling system, where the system includes:
the device connection module is used for establishing data connection with the participant device and receiving audio and video data uploaded by the participant device, wherein the audio and video data comprises time axis data;
the summary generation module is used for generating a preliminary conference summary according to the audio and video data, randomly selecting a frame of picture from the audio and video data and generating a compiling time node, wherein the compiling time node is a time point corresponding to the picture;
the picture compiling module is used for compiling the preliminary meeting summary into the picture to generate an identity verification picture and sending out a compiling time node, audio and video data and the identity verification picture;
and the summary verification module is used for receiving the compiling time node, the audio and video data and the identity verification picture from the external device, performing picture verification, judging whether the conference summary is synchronous, and storing the preliminary conference summary if it is.
Preferably, the summary generation module includes:
the data extraction unit is used for carrying out data extraction on the audio and video data to obtain voice data and video data and identifying a speaker according to the video data;
the voice recognition unit is used for determining a speaking device according to a speaker, extracting and recognizing corresponding voice data in the speaking device and generating a preliminary conference summary;
and the picture processing unit is used for randomly selecting a frame of picture from the video data, determining the corresponding time of the picture and obtaining a compiling time node.
Preferably, the screen compiling module includes:
the system comprises a binary conversion unit, a first conversion unit and a second conversion unit, wherein the binary conversion unit is used for converting the data corresponding to the elementary meeting summary and the picture into binary summary data and binary picture data respectively;
the data superposition unit is used for carrying out superposition processing on the binary summary data and the binary picture data to obtain an identity verification picture;
and the data sending unit is used for sending the compiling time node, the audio and video data and the identity verification picture.
Preferably, the summary verification module comprises:
the data receiving unit is used for receiving the compiling time node, the audio and video data and the identity verification picture from the external equipment, and extracting a frame of picture from the audio and video data from the external equipment according to the compiling time node to obtain an active verification picture;
the secondary conference recognition unit is used for carrying out voice recognition on audio and video data from external equipment to obtain a secondary conference summary;
and the active verification unit is used for fusing the secondary conference summary and the active verification picture in a binary superposition mode to obtain a picture to be verified, comparing the picture to be verified with the identity verification picture, and judging whether the conference summary is synchronous.
According to the cloud computing-based video conference system resource scheduling method, each party locally generates the conference summary and the picture representing it from the recorded audio and video of the conference participants, and transmits the audio and video data to the other parties. Each party then generates and transmits a conference summary from the received data, and the superposed pictures are compared to judge whether the summaries obtained by all parties are identical. Tampering with the conference summary can therefore be discovered promptly during the conference, avoiding later losses caused by a tampered summary.
Drawings
Fig. 1 is a flowchart of a resource scheduling method for a video conference system based on cloud computing according to an embodiment of the present invention;
fig. 2 is a flowchart of the steps of generating a preliminary conference summary according to audio/video data, randomly selecting a frame of picture from the audio/video data, and generating a compiling time node, according to an embodiment of the present invention;
fig. 3 is a flowchart of steps of compiling a preliminary conference summary into the picture, generating an identity verification picture, and sending out a compiling time node, audio/video data, and the identity verification picture according to the embodiment of the present invention;
fig. 4 is a flowchart of steps of receiving a compile time node, audio/video data, and an identity verification screen from an external device, performing screen verification, and determining whether a conference summary is synchronous according to an embodiment of the present invention;
fig. 5 is an architecture diagram of a resource scheduling system of a cloud computing-based video conference system according to an embodiment of the present invention;
FIG. 6 is an architecture diagram of a summary generation module according to an embodiment of the present invention;
FIG. 7 is a block diagram of a frame compilation module according to an embodiment of the present invention;
fig. 8 is an architecture diagram of a summary verification module according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another. For example, a first xx script may be referred to as a second xx script, and similarly, a second xx script may be referred to as a first xx script, without departing from the scope of the present application.
Video conferencing refers to a conference in which people at two or more locations hold a face-to-face conversation via communication devices and a network. Video conferences can be divided into point-to-point conferences and multipoint conferences according to the number of participating sites. With a video conference system, participants can hear the sound from other sites, see the images, actions and expressions of the participants there, and send electronic presentation content, giving them the feeling of being present in person. In current video conferences, the proceedings are usually recorded as audio or video and a conference summary is generated through speech recognition; however, once content such as the conference summary is illegally stolen or tampered with, the participants cannot obtain the real conference content, which creates a serious hidden danger.
In the invention, each party locally generates the conference summary and the picture representing it from the recorded audio and video of the conference participants, then transmits the audio and video data to the other parties. Each party generates and transmits a conference summary from the received data, and the superposed pictures are compared to judge whether the summaries obtained by all parties are identical. Tampering with the conference summary can therefore be discovered promptly during the conference, avoiding later losses caused by a tampered summary.
As shown in fig. 1, a flowchart of a cloud computing-based video conference system resource scheduling method provided in an embodiment of the present invention is shown, where the method includes:
and S100, establishing data connection with the participant equipment, and receiving audio and video data uploaded by the participant equipment, wherein the audio and video data comprise time axis data.
In this step, data connection is established with the participant devices. A typical application scenario of the invention is a multi-party video conference in which each party has several participants. For example, suppose companies A, B and C hold a video conference and each has several employees attending. Each company designates one employee as its representative, the three representatives establish a network connection through their respective devices, and the other employees of each company connect to the device used by their company's representative. For instance, if six employees of company A (A1 through A6) attend and A1 is designated as the representative, the remaining five connect their devices to A1's device. During the conference, every participant's device records audio and video, and the collected audio and video data are gathered at the representative device of each party.
S200, generating a preliminary conference summary according to the audio and video data, randomly selecting a frame of picture from the audio and video data, and generating a compiling time node, wherein the compiling time node is a time point corresponding to the picture.
In this step, a preliminary conference summary is generated from the audio and video data, and the speech of each participant in each audio and video stream is determined by means of voice recognition.
S300, compiling the preliminary conference summary into the picture to generate an identity verification picture, and sending out a compiling time node, audio and video data and the identity verification picture.
In this step, the preliminary conference summary is compiled into the picture. Specifically, the picture data and the content of the preliminary conference summary can be put into a unified representation through base conversion, and the two are then fused to generate an identity verification picture. The compiling time node, the audio and video data and the identity verification picture are then sent out; specifically, each party's representative sends the compiling time node, audio and video data and identity verification picture it generated to the other parties. The compiling time node, the audio and video data and the identity verification picture are all transmitted in encrypted form.
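The step above only requires that the compiling time node, audio and video data and identity verification picture travel in encrypted form; no concrete scheme is fixed. As a hedged sketch (the field names, the shared key and the use of an HMAC integrity tag are our assumptions, not taken from the patent; a real deployment would also carry the payload over an encrypted channel such as TLS), the sending side could seal the payload like this:

```python
import hashlib
import hmac
import json

def seal_payload(shared_key: bytes, compile_time_node: float,
                 verification_picture: bytes) -> dict:
    # Bundle the compiling time node and identity verification picture,
    # then attach an HMAC-SHA256 tag so the receiver can detect tampering.
    # (Field names and the choice of HMAC are illustrative assumptions.)
    body = json.dumps({
        "compile_time_node": compile_time_node,
        "verification_picture": verification_picture.hex(),
    }, sort_keys=True)
    tag = hmac.new(shared_key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}

def verify_payload(shared_key: bytes, sealed: dict) -> bool:
    # Recompute the tag over the received body and compare in constant time.
    expected = hmac.new(shared_key, sealed["body"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sealed["tag"])
```

Any modification of the body in transit changes the recomputed tag, so the receiving representative device can reject a tampered payload before attempting picture verification.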
S400, receiving a compiling time node, audio and video data and an identity verification picture from external equipment, performing picture verification, judging whether the conference summary is synchronous, and if so, storing a preliminary conference summary.
In this step, the compiling time node, audio and video data and identity verification picture from the external devices are received, and each party's device judges whether the conference summaries obtained by all parties are identical by comparing the identity verification pictures. When the conference summaries are not synchronous, prompt information is sent out to notify all parties to check the conference records. When the conference ends, a final conference summary is generated from the preliminary conference summary, and the audio and video data produced during the conference are uploaded to the cloud for storage; the audio and video data stored in the cloud are deleted periodically. Finally, the final conference summary is sent to every participant synchronously.
As shown in fig. 2, as a preferred embodiment of the present invention, the step of generating a preliminary meeting summary according to audio/video data, randomly selecting a frame of picture from the audio/video data, and generating a compiling time node specifically includes:
s201, data extraction is carried out on the audio and video data to obtain voice data and video data, and a speaker is identified according to the video data.
In this step, data extraction is performed on the audio and video data: the audio tracks are separated and the picture data is stripped out on its own, yielding voice data and video data, and whether a person in each video stream is speaking is determined by portrait recognition. If two participants attend the conference in the same room and sit close together, both of their devices may record the voice when one of them speaks, which would make the generated conference summary inaccurate; identifying the actual speaker avoids this.
S202, determining speaking equipment according to the speaker, extracting and identifying corresponding voice data in the speaking equipment, and generating a preliminary conference summary.
In this step, the speaking device is determined according to the speaker. Once the speaker and the speaking device are determined, voice recognition is completed by a speech recognition engine, and a dialogue-style preliminary conference summary is generated in speaking order to facilitate subsequent verification.
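The patent determines the speaking device from the identified speaker but leaves the selection criterion open. One plausible sketch, entirely an assumption on our part, is to attribute an utterance to the device whose audio frame carries the highest short-time energy, since the speaker's own microphone is normally loudest:

```python
def pick_speaking_device(frames_by_device: dict) -> str:
    # frames_by_device maps a device id to a list of audio samples for
    # the same time window. The device with the highest short-time
    # energy is taken as the speaker's device; this criterion is an
    # illustrative assumption, not specified by the patent.
    def energy(samples):
        return sum(s * s for s in samples) / max(len(samples), 1)
    return max(frames_by_device, key=lambda d: energy(frames_by_device[d]))
```

This addresses the nearby-participants case from the previous step: even when two devices both pick up the same voice, only the loudest recording is attributed to the speaker.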
S203, randomly selecting a frame of picture from the video data, determining the corresponding time of the picture, and obtaining a compiling time node.
In this step, a frame of picture is randomly selected from the video data. Specifically, a time point may be generated with a random function and the picture corresponding to that time determined, so that the compiling time node is obtained together with the picture.
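The random selection above can be sketched as follows; the frame-rate and duration parameters are illustrative assumptions, since the patent only requires a random frame together with its corresponding time point:

```python
import random

def pick_compile_time_node(duration_s: float, fps: float, seed=None):
    # Draw a random time in the recording, snap it to the frame grid,
    # and return both the frame index and the compiling time node
    # (the time point corresponding to that frame).
    rng = random.Random(seed)
    t = rng.uniform(0.0, duration_s)
    frame_index = int(t * fps)
    return frame_index, frame_index / fps
```

Seeding is shown only to make the sketch reproducible; in the scheme itself the selection is meant to be unpredictable.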
As shown in fig. 3, as a preferred embodiment of the present invention, the steps of compiling the preliminary conference summary into the picture, generating an identity verification picture, and sending out the compiling time node, the audio/video data, and the identity verification picture specifically include:
s301, converting the data corresponding to the preliminary conference summary and the picture into binary summary data and binary picture data respectively.
In this step, base conversion is performed: the preliminary conference summary is represented in binary, and the data corresponding to the picture is likewise represented in binary, so that the two share the same representation and are convenient to process.
And S302, overlapping the binary summary data and the binary picture data to obtain an identity verification picture.
In this step, the binary summary data and the binary picture data are superposed. Because the binary summary data is shorter than the binary picture data, the superposition is done digit-wise: the binary picture data is divided into binary strings of the same length as the binary summary data, and the summary data is added to each string, discarding any carry that exceeds the string length. For example, if the binary summary data is 1100 and the binary picture data is 100101010110, the picture data is split into 1001, 0101 and 0110; adding 1001+1100 gives 10101, and discarding the overflowing character gives 0101. Repeating this for each string yields 010100010010, which is converted back into a picture to obtain the identity verification picture.
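The digit-wise superposition rule described in this step can be sketched directly on bit strings (the function name is ours):

```python
def superpose(summary_bits: str, picture_bits: str) -> str:
    # Split the picture bits into strings the same length as the summary
    # bits, add the summary to each string, and discard any carry that
    # overflows the string width, as described in this step.
    n = len(summary_bits)
    if len(picture_bits) % n != 0:
        raise ValueError("picture bits must split evenly into summary-sized strings")
    s = int(summary_bits, 2)
    out = []
    for i in range(0, len(picture_bits), n):
        total = (int(picture_bits[i:i + n], 2) + s) % (1 << n)  # drop overflow
        out.append(format(total, f"0{n}b"))
    return "".join(out)
```

For summary bits 1100 and picture bits 100101010110, the strings 1001, 0101 and 0110 become 0101, 0001 and 0010, giving 010100010010.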
And S303, sending the compiling time node, the audio and video data and the identity verification picture.
In this step, the compiling time node, the audio and video data and the identity verification picture are sent out: each representative device sends the compiling time node, audio and video data and identity verification picture it generated to the representative devices of the other parties, so that those devices can verify them.
As shown in fig. 4, as a preferred embodiment of the present invention, the steps of receiving a compiling time node, audio/video data, and an identity verification picture from an external device, performing picture verification, and determining whether the conference summary is synchronous specifically include:
s401, receiving a compiling time node, audio and video data and an identity verification picture from external equipment, and extracting a frame of picture from the audio and video data from the external equipment according to the compiling time node to obtain an active verification picture.
In this step, the compiling time node, audio and video data and identity verification picture from the external device are received, and a frame of picture is extracted according to the compiling time node. Since the audio and video data was sent by its generator, the extracted picture should, if no tampering has occurred, be identical to the picture the sender selected at the corresponding moment.
S402, carrying out voice recognition on the audio and video data from the external equipment to obtain a secondary conference summary.
In this step, voice recognition is performed on the audio and video data from the external device using the same speech recognition engine, so as to avoid content deviations caused by recognition with different engines, and a secondary conference summary is obtained.
And S403, fusing the secondary conference summary and the active verification picture in a binary system superposition mode to obtain a picture to be verified, comparing the picture to be verified with the identity verification picture, and judging whether the conference summary is synchronous or not.
In this step, the data are converted to binary and superposed in the same way as before, and the result is converted back into the picture to be verified. The picture to be verified is then compared with the identity verification picture pixel by pixel; if the two are identical, the conference contents of all parties are consistent.
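The check in this step can be sketched at bit level (a stand-in for the pixel-by-pixel comparison; the function names are ours): the receiver re-runs the superposition on the frame extracted at the compiling time node, using its own secondary summary, and compares the result with the received identity verification picture.

```python
def _superpose(summary_bits: str, picture_bits: str) -> str:
    # Digit-wise superposition: add the summary to each summary-sized
    # string of picture bits, discarding overflow, as in step S302.
    n = len(summary_bits)
    s = int(summary_bits, 2)
    return "".join(
        format((int(picture_bits[i:i + n], 2) + s) % (1 << n), f"0{n}b")
        for i in range(0, len(picture_bits), n)
    )

def summaries_synchronized(received_verification_bits: str,
                           local_summary_bits: str,
                           frame_bits: str) -> bool:
    # Rebuild the verification picture from the locally recognised
    # (secondary) summary and the extracted frame, then compare with
    # the identity verification picture received from the other party.
    return _superpose(local_summary_bits, frame_bits) == received_verification_bits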
As shown in fig. 5, a cloud computing-based video conference system resource scheduling system provided in an embodiment of the present invention includes:
the device connection module 100 is configured to establish data connection with the participant device, and receive audio and video data uploaded by the participant device, where the audio and video data includes time axis data.
In the system, a device connection module 100 establishes data connection with participant devices, an application scenario of the system is that multiple parties carry out a video conference, and each participant has multiple persons, for example, companies a, B and C need to carry out the video conference, and companies a, B and C each have multiple employees to participate in the conference, then when the video conference is carried out, the companies a, B and C respectively assign an employee as a representative, the representatives of the three companies establish network connection through respective devices, and other employees of the companies a, B and C are connected with devices used by the representatives of the company to which the employees belong, for example, the company a has six employees of A1, A2, A3, A4, A5 and A6 to participate in the conference, A1 is designated as the representative of the company, then the remaining five employees are connected with the devices used by the company a through the devices, in the video process, each participant uses the devices in the respective hands to collect audio and video data to obtain audio and video conference data, and the devices corresponding to the participant devices of the company to collect the audio and transmit the audio and video data used by the company, and the devices of the company to represent the audio and the devices used by the device representatives of the company to carry out the data.
The summary generation module 200 is configured to generate a preliminary conference summary according to the audio/video data, randomly select a frame of picture from the audio/video data, and generate a compiling time node, where the compiling time node is a time point corresponding to the picture.
In the system, the summary generation module 200 generates a preliminary conference summary from the audio/video data, determining the speech of each participant in each audio/video stream by voice recognition. Because the source of each audio/video stream is known, the speech content can be attributed to a specific participant, and the time corresponding to each participant's speech can be determined from the time axis data.
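As a rough illustration of attributing recognized speech to participants and ordering it along the shared time axis (the names, data layout and output format are assumptions, not the patent's specification):

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str   # participant, attributed via the known device source
    start: float   # position on the shared time axis, in seconds
    text: str      # speech-recognition result for this utterance

def build_preliminary_summary(utterances):
    """Order recognized utterances by the time axis and render them
    as a dialogue-style preliminary conference summary."""
    ordered = sorted(utterances, key=lambda u: u.start)
    return "\n".join(f"[{u.start:.1f}s] {u.speaker}: {u.text}" for u in ordered)

talk = [
    Utterance("B1", 12.0, "We agree to the schedule."),
    Utterance("A1", 3.5, "Let's begin the review."),
]
print(build_preliminary_summary(talk))
```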
And the picture compiling module 300 is configured to compile the preliminary conference summary into the picture, generate an identity verification picture, and send out the compiling time node, the audio/video data, and the identity verification picture.
In the system, the picture compiling module 300 compiles the preliminary conference summary into the picture. Specifically, the picture data and the content of the preliminary conference summary can be given a unified representation through base conversion, and the two are then fused to generate the identity verification picture. The compiling time node, the audio/video data and the identity verification picture are then sent out; specifically, each party's representative sends the compiling time node, audio/video data and identity verification picture it generated to the other parties, and all three are transmitted in encrypted form.
The summary check module 400 is configured to receive a compiling time node, audio/video data, and an identity check picture from an external device, perform picture check, determine whether a conference summary is synchronous, and store a preliminary conference summary if the conference summary is synchronous.
In the system, the summary check module 400 receives the compiling time node, the audio/video data and the identity verification picture from the external devices, and each party's device determines whether the conference summaries obtained by the parties are identical by comparing the identity verification pictures.
As shown in fig. 6, as a preferred embodiment of the present invention, the summary generation module 200 includes:
the data extraction unit 201 is configured to perform data extraction on the audio and video data to obtain voice data and video data, and identify a speaker according to the video data.
In this module, the data extraction unit 201 extracts data from the audio/video data: the audio track is separated and the picture data is stripped out, yielding voice data and video data, and whether the person in each video stream is speaking is determined by portrait recognition. If two participants attend the conference in close proximity, both of their devices may record the voice when one of them speaks, so the conference summary could be generated inaccurately; identifying the actual speaker from the video avoids this.
The voice recognition unit 202 is configured to determine a speaking device according to a speaker, extract and recognize corresponding voice data in the speaking device, and generate a preliminary conference summary.
In this module, the voice recognition unit 202 determines the speaking device from the identified speaker. Once the speaker and the speaking device are determined, speech recognition is completed by a speech recognition engine, and a dialogue-style preliminary conference summary is generated in speaking order for subsequent verification.
The picture processing unit 203 is configured to randomly select a frame of picture from the video data, determine the time corresponding to the frame, and obtain a compiling time node.
In this module, the picture processing unit 203 randomly selects a frame of picture from the video data. Specifically, a random function may be used to generate a time, and the picture corresponding to that time is then determined, so the compiling time node is obtained together with the picture.
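A minimal sketch of this selection, assuming a fixed frame rate so that the time produced by the random function can be snapped to an exact frame boundary and every party extracts the same picture (the function name, fps value and seeding are illustrative):

```python
import random

def pick_compile_time_node(duration_s: float, fps: float = 25.0):
    """Generate a random time within the video, then snap it to the
    start of the containing frame; the snapped time serves as the
    compiling time node shared with the other parties."""
    t = random.uniform(0.0, duration_s)
    frame_index = int(t * fps)   # frame that contains time t
    node = frame_index / fps     # compiling time node, in seconds
    return frame_index, node

random.seed(7)  # deterministic here for demonstration only
frame, node = pick_compile_time_node(duration_s=1800.0)
print(frame, node)
```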
As shown in fig. 7, as a preferred embodiment of the present invention, the screen compiling module 300 includes:
a binary conversion unit 301, configured to convert the data corresponding to the preliminary conference summary and the picture into binary summary data and binary picture data, respectively.
In this module, the binary conversion unit 301 performs base conversion: the preliminary conference summary is represented in binary, and the data corresponding to the picture is likewise represented in binary. The two then share the same representation, which is convenient for processing.
The data overlaying unit 302 is configured to perform an overlay process on the binary summary data and the binary picture data to obtain an identity verification picture.
In this module, the data superimposing unit 302 superimposes the binary summary data onto the binary picture data. The binary summary data is necessarily shorter than the binary picture data, so the superposition is performed over fixed-length binary chunks. For example, if the binary summary data is 1100 and the binary picture data is 100101010110, the binary picture data is divided into several binary strings of the same length as the binary summary data, and each is added to it: 1001 + 1100 gives 10101, the bit exceeding the chunk length is discarded, giving 0101. Processing every chunk in this way yields 010100010010, which is converted back into a picture to obtain the identity verification picture.
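The chunked binary superposition with discarded overflow can be sketched as follows (the function name is illustrative; the picture bit string is assumed to be a whole multiple of the summary length, as in the worked example):

```python
def superimpose(summary_bits: str, picture_bits: str) -> str:
    """Superimpose binary summary data onto binary picture data.

    The picture bits are split into chunks the same length as the
    summary bits; each chunk is added to the summary modulo 2**n,
    i.e. any carry past the chunk length is discarded.
    """
    n = len(summary_bits)
    key = int(summary_bits, 2)
    chunks = [picture_bits[i:i + n] for i in range(0, len(picture_bits), n)]
    out = []
    for chunk in chunks:
        total = (int(chunk, 2) + key) % (1 << n)  # discard overflow bit(s)
        out.append(format(total, f"0{n}b"))
    return "".join(out)

# Worked example: summary 1100, picture 100101010110
print(superimpose("1100", "100101010110"))  # -> 010100010010
```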
And the data sending unit 303 is configured to send out the compiling time node, the audio/video data, and the identity verification picture.
In this module, the data sending unit 303 sends out the compiling time node, the audio/video data and the identity verification picture; that is, the representative device that generated them sends them to the representative devices of the other parties so that those devices can perform verification.
As shown in fig. 8, as a preferred embodiment of the present invention, the summary verification module 400 includes:
the data receiving unit 401 is configured to receive a compiling time node, audio/video data, and an identity verification picture from an external device, and extract a frame of picture from the audio/video data from the external device according to the compiling time node to obtain an active verification picture.
In this module, the data receiving unit 401 receives the compiling time node, the audio/video data and the identity verification picture from the external device, and extracts the frame at the compiling time node to obtain the active verification picture.
And a secondary voice recognition unit 402, configured to perform voice recognition on the audio/video data from the external device to obtain a secondary conference summary.
In this module, the secondary voice recognition unit 402 performs speech recognition on the audio/video data from the external device, using the same speech recognition engine so as to avoid content deviations caused by recognizing with different engines, thereby obtaining the secondary conference summary.
And the active verification module 403 is configured to fuse the secondary conference summary and the active verification picture in a binary system superposition manner to obtain a picture to be verified, compare the picture to be verified and the identity verification picture, and determine whether the conference summary is synchronous.
In this module, the active verification module 403 converts the data to binary in the same superposition manner, superposes them, and converts the result into the picture to be verified. The picture to be verified is then compared with the identity verification picture pixel by pixel; if the two are identical, the conference summaries of all parties are consistent.
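A minimal sketch of the final pixel-by-pixel comparison, assuming both pictures have been decoded into equal-sized grids of pixel values (the nested-list representation and function name are illustrative):

```python
def frames_match(picture_a, picture_b) -> bool:
    """Compare two decoded pictures pixel by pixel; pictures are given
    as nested lists of pixel values (e.g. RGB tuples)."""
    if len(picture_a) != len(picture_b):
        return False
    for row_a, row_b in zip(picture_a, picture_b):
        if len(row_a) != len(row_b) or any(p != q for p, q in zip(row_a, row_b)):
            return False
    return True

same = frames_match([[(0, 0, 0), (255, 255, 255)]],
                    [[(0, 0, 0), (255, 255, 255)]])
diff = frames_match([[(0, 0, 0)]], [[(1, 0, 0)]])
print(same, diff)  # -> True False
```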
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not limited to being performed in the exact order illustrated and, unless explicitly stated herein, may be performed in other orders. Moreover, at least a portion of the steps in various embodiments may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least a portion of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention, and these changes and modifications are all within the scope of the invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. A video conference system resource scheduling method based on cloud computing is characterized by comprising the following steps:
establishing data connection with the participant equipment, and receiving audio and video data uploaded by the participant equipment, wherein the audio and video data comprises time axis data;
generating a preliminary conference summary according to the audio and video data, randomly selecting a frame of picture from the audio and video data, and generating a compiling time node, wherein the compiling time node is a time point corresponding to the picture;
compiling the preliminary conference summary into the picture to generate an identity verification picture, and sending out a compiling time node, audio and video data and the identity verification picture;
receiving a compiling time node, audio and video data and an identity verification picture from external equipment, performing picture verification, judging whether the conference summary is synchronous or not, and storing a preliminary conference summary if the conference summary is synchronous.
2. The cloud-computing-based video conference system resource scheduling method of claim 1, wherein the steps of generating a preliminary conference summary according to audio/video data, randomly selecting a frame of picture from the audio/video data, and generating a compiling time node specifically include:
performing data extraction on the audio and video data to obtain voice data and video data, and identifying a speaker according to the video data;
determining speech equipment according to a speaker, extracting and identifying corresponding voice data in the speech equipment, and generating a preliminary conference summary;
randomly selecting a frame of picture from the video data, determining the corresponding time of the picture, and obtaining a compiling time node.
3. The cloud-computing-based video conference system resource scheduling method according to claim 1, wherein the steps of compiling the preliminary conference summary into the picture, generating an identity verification picture, and sending out the compiling time node, the audio/video data, and the identity verification picture specifically include:
respectively converting the data corresponding to the preliminary conference summary and the pictures into binary summary data and binary picture data;
superposing the binary summary data and the binary picture data to obtain an identity verification picture;
and sending the compiling time node, the audio and video data and the identity verification picture.
4. The cloud-computing-based video conference system resource scheduling method according to claim 1, wherein the steps of receiving a compiling time node, audio and video data, and an identity verification picture from an external device, performing picture verification, and determining whether the conference summary is synchronized specifically include:
receiving a compiling time node, audio and video data and an identity verification picture from external equipment, and extracting a frame of picture from the audio and video data from the external equipment according to the compiling time node to obtain an active verification picture;
carrying out voice recognition on audio and video data from external equipment to obtain a secondary conference summary;
and fusing the secondary conference summary and the active verification picture in a binary system superposition mode to obtain a picture to be verified, comparing the picture to be verified with the identity verification picture, and judging whether the conference summary is synchronous or not.
5. The cloud-computing-based video conference system resource scheduling method of claim 1, wherein the compile time node, the audio and video data, and the identity verification picture are transmitted in an encrypted manner.
6. The cloud-computing-based video conference system resource scheduling method of claim 1, wherein when the conference summary is not synchronized, a prompt message is sent to notify each party to check the conference records.
7. The cloud-computing-based video conference system resource scheduling method of claim 1, wherein when a conference is finished, a final conference summary is generated according to a preliminary conference summary, and audio and video data generated in the conference process are uploaded to a cloud for storage.
8. The cloud-computing-based video conference system resource scheduling method of claim 7, wherein the audio and video data stored in the cloud is periodically deleted.
9. The cloud-computing-based video conferencing system resource scheduling method of claim 7, wherein the final conference summary is synchronously sent to each participant.
CN202210839054.1A 2022-07-18 2022-07-18 Video conference system resource scheduling method based on cloud computing Pending CN115225849A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210839054.1A CN115225849A (en) 2022-07-18 2022-07-18 Video conference system resource scheduling method based on cloud computing


Publications (1)

Publication Number Publication Date
CN115225849A true CN115225849A (en) 2022-10-21

Family

ID=83611909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210839054.1A Pending CN115225849A (en) 2022-07-18 2022-07-18 Video conference system resource scheduling method based on cloud computing

Country Status (1)

Country Link
CN (1) CN115225849A (en)

Similar Documents

Publication Publication Date Title
JP4458888B2 (en) Conference support system, minutes generation method, and computer program
CN111986677A (en) Conference summary generation method and device, computer equipment and storage medium
CN111858892B (en) Voice interaction method, device, equipment and medium based on knowledge graph
CN112200697B (en) Remote video room watching method, device, equipment and computer storage medium
WO2022193910A1 (en) Data processing method, apparatus and system, and electronic device and readable storage medium
CN111599359A (en) Man-machine interaction method, server, client and storage medium
WO2024032159A1 (en) Speaking object detection in multi-human-machine interaction scenario
CN110290345B (en) Cross-level conference roll-call speaking method and device, computer equipment and storage medium
CN113873088B (en) Interactive method and device for voice call, computer equipment and storage medium
WO2021159734A1 (en) Data processing method and apparatus, device, and medium
CN115225849A (en) Video conference system resource scheduling method based on cloud computing
US11798126B2 (en) Neural network identification of objects in 360-degree images
KR20190029999A (en) System for generating documents of minutes by using multi-connection and the method thereof
CN111770301B (en) Video conference data processing method and device
CN110996036B (en) Remote online conference management system based on AI intelligent technology
CN115438888A (en) Interactive service quality detection method, electronic device and computer storage medium
CN113420133A (en) Session processing method, device, equipment and storage medium
CN114764690A (en) Method, device and system for intelligently conducting conference summary
CN113517002A (en) Information processing method, device and system, conference terminal and server
CN112383737A (en) Multi-user online content same-screen video processing verification method and device and electronic equipment
CN113536257B (en) Multi-party conference admission method and system based on block chain
CN114499827B (en) Label evaluation method and device and electronic equipment
US20230328203A1 (en) Connection method between different video conference platforms and connecting device executing the method
KR102391589B1 (en) Mediation system for interpretation and mediation method for the same
JP7110669B2 (en) Video conferencing system, video conferencing method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination