CN111193890B - Conference record analyzing device and method and conference record playing system - Google Patents


Info

Publication number
CN111193890B
CN111193890B CN201811353598.7A CN201811353598A
Authority
CN
China
Prior art keywords
image
face
speech
speaking
conference record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811353598.7A
Other languages
Chinese (zh)
Other versions
CN111193890A (en)
Inventor
曹永刚
周文
顾炯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd filed Critical Ricoh Co Ltd
Priority to CN201811353598.7A
Publication of CN111193890A
Application granted
Publication of CN111193890B
Active legal status
Anticipated expiration legal status

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/183Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a single remote source

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides a conference record playback system that includes a conference record analysis device for processing a conference record and a conference record playback device for playing the processed conference record so that viewing users can watch it. The conference record analysis device has a voice conversion part that converts the recorded speech sound into the speech text of each participant, so that text information is formed and a user can conveniently review the spoken content; it also has a text correspondence processing part that associates each piece of speech text with the speaking participant, so that a viewing user knows who spoke when reading the text. The conference record playback device has a playback control part which, when the user clicks a piece of speech text, controls the image playback part and the sound playback part to jump the current playback time of the conference record to the speech time of the clicked text.

Description

Conference record analyzing device and method and conference record playing system
Technical Field
The invention relates to a conference record analysis device, a corresponding conference record analysis method and a conference record playing system comprising the conference record analysis device.
Background
A conference record documents the information discussed by the participants during a meeting. It provides an accurate basis for the meeting content and allows the relevant personnel to review it afterwards, so that the content discussed in the meeting is not lost or forgotten.
In the past, the common way of recording a conference was manual minute-taking, that is, having a designated note-taker record the information of the meeting in writing.
This method lacks detail and intuitiveness, and important information such as blackboard writing or content shown on a projection screen during the meeting is easily lost, so such a conference record usually cannot restore the information of the whole meeting.
In order to overcome the above disadvantages and record conference information more vividly, the prior art introduced video-based conference recording, in which the speech sound and speech images of the participants are recorded to store the conference information. However, with this approach, when a user watching the video record needs to view specific content (for example, the speech of a particular person, or the participants' remarks on a particular topic), the user has to browse the whole video to find it, which costs the viewing user a great deal of time and effort and reduces work efficiency.
Disclosure of Invention
The present invention has been made to solve the above problems, and an object of the present invention is to provide a conference record analysis apparatus that allows the speech of the participants in a recorded video to be identified through text, without requiring a user to browse the entire recording.
To achieve this object, the present invention adopts the following technical solution:
the present invention provides a conference record analysis device for analyzing a conference record including a panoramic image and a speech sound recorded by panoramic imaging to acquire a speech record, the conference record analysis device including: the conference recording storage part stores conference records, the face recognition part analyzes panoramic images in the conference records to obtain different face characteristics of each participant and recognize the face of each participant, the voice conversion part converts speech sounds into corresponding speech words according to time, the word correspondence processing part corresponds the speech words to the corresponding participants according to the speech time of the participants, and the speech record storage part correspondingly stores the speech words, the speech time and the corresponding participants.
The present invention also provides a conference record analysis method for analyzing a conference record, recorded by panoramic imaging and containing a panoramic image and a speech sound, so as to acquire a speech record, the method comprising the steps of: a conference record storage step of storing the conference record; a face recognition step of analyzing the panoramic image in the conference record to acquire the distinct facial features of each participant and recognize each participant's face; a voice conversion step of converting the speech sound into corresponding speech text according to time; a word correspondence processing step of associating the speech text with the corresponding participant according to the participant's speech time; and a speech record storage step of storing the speech text, the speech time, and the corresponding participant in association with each other.
Action and effects of the invention
According to the above conference record analysis device and method, the voice conversion part converts the recorded speech sound into the speech text and speech time of each participant, so that text information is formed and a user can conveniently review the content of the speech; the word correspondence processing part then associates each piece of speech text with the speaking participant, so that the faces of the participants in the conference record correspond to their speech. As a result, in the processed conference record, a viewing user watching the recorded image and the speech sound can directly browse all of the text information in the conference record together with the corresponding speakers, which makes it easy to quickly understand and query the entire conference record; moreover, because the recording is captured by a panoramic camera, the conference scene can be restored to a large extent.
Drawings
Fig. 1 is a schematic diagram of a configuration of a conference record playing system in an embodiment of the present invention;
fig. 2 is a schematic configuration diagram of a conference record analysis apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a word correspondence processing unit according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a configuration of a conference recording and playing apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a record playback screen in an embodiment of the present invention;
FIG. 6 is a flow diagram of a meeting record parsing process in an embodiment of the invention; and
fig. 7 is a flowchart of an automatic adjustment process of a playing angle of view according to an embodiment of the present invention.
Detailed Description
In order to make the technical means, creative features, objects, and effects of the present invention easy to understand, the conference record playback system of the present invention is described in detail below with reference to the embodiments and the accompanying drawings.
The present invention provides a conference record analysis device for analyzing a conference record, recorded by panoramic imaging and containing a panoramic image and a speech sound, so as to acquire a speech record. The conference record analysis device comprises: a conference record storage part that stores the conference record; a face recognition part that analyzes the panoramic image in the conference record to acquire the distinct facial features of each participant and recognize each participant's face; a voice conversion part that converts the speech sound into corresponding speech text according to time; a word correspondence processing part that associates the speech text with the corresponding participant according to the participant's speech time; and a speech record storage part that stores the speech text, the speech time, and the corresponding participant in association with each other.
As a first aspect, the conference record analysis device provided by the present invention may further comprise: an identification information assigning part and a face storage part, wherein the identification information assigning part sets pieces of identification information corresponding in number to the participants recognized by the face recognition part and assigns one piece to each participant, the face storage part stores the identification information in association with the faces of the participants, and the speech record storage part stores the speech text, the speech time, and the corresponding identification information of the participants in association with each other.
In the first aspect, the present invention may further include: an organization personnel management database and a face retrieval part, wherein the organization personnel management database stores at least the identification information and a corresponding face image of each member of the organization, the face retrieval part searches the organization personnel management database using the faces of the participants identified by the face recognition part to obtain the identification information of each participant, and the speech record storage part stores the speech text, the speech time, and the corresponding identification information of the participants in association with each other.
In the first aspect, the word correspondence processing part of the present invention may further comprise: a voice recognition unit that analyzes the speech sound in the conference record to acquire the voiceprint features of each participant, thereby recognizing the voiceprints of the different participants; a voiceprint storage unit that stores each voiceprint in association with the corresponding participant; a speech time division unit that divides each speech segment in the speech sound into different speech sound portions according to the speech pauses in the speech text; and a recognition correspondence unit that sequentially recognizes the voiceprints in the speech sound portions against the voiceprint storage unit and determines the corresponding participant from the voiceprint storage unit according to the speech voiceprint.
In the first aspect, the present invention may further have the following features: the word correspondence processing part includes: a speech image portion dividing unit that divides each speech segment in the panoramic image into different speech image portions according to the speech pauses in the speech text; an in-image-portion face recognition unit that intercepts the face images in the speech image portions and recognizes each face as an in-image-portion face; a mouth shape conversion determination unit that determines the mouth shape conversion of each in-image-portion face in a speech image portion, so as to obtain the number of mouth shape conversions, within the speech time, of the participant corresponding to each in-image-portion face; a speaker determination unit that counts the number of mouth shape conversions of each in-image-portion face in a speech image portion and determines the face with the largest number of mouth shape conversions as the speaking face of that speech image portion; and a recognition correspondence unit that sequentially determines the participant corresponding to each speech time according to the speaking face in each speech image portion.
As a second aspect, the present invention provides a conference record playback system for processing a conference record, recorded by panoramic imaging and containing a panoramic image and a speech sound, and playing back the processed conference record. The system comprises a conference record analysis device for processing the conference record and a conference record playback device for playing the processed conference record. The conference record analysis device has a conference record storage part, a face recognition part, a voice conversion part, a word correspondence processing part, and a speech record storage part: the conference record storage part stores the conference record; the face recognition part analyzes the panoramic image in the conference record to acquire the distinct facial features of each participant and recognize each participant's face; the voice conversion part converts the speech sound into corresponding speech text according to time; the word correspondence processing part associates the speech text with the corresponding participant according to the participant's speech time; and the speech record storage part stores the speech text, the speech time, and the corresponding participant in association with each other. The conference record playback device comprises a screen storage part, a playback control part, an input display part, an image playback part, and a sound playback part: the screen storage part stores a record playback screen having an image display portion and a text display portion; the playback control part controls the input display part to display the record playback screen, controls the image playback part to play the panoramic image in the image display portion and the sound playback part to play the speech sound, and further displays the speech text and the speech time in the text display portion according to the speech time so that a viewing user can click the speech text; once the viewing user clicks a piece of speech text, the playback control part controls the panoramic image and the speech sound played by the image playback part and the sound playback part according to the speech time corresponding to the clicked text, so that the current playback time of the panoramic image and the speech sound corresponds to that speech time.
In the second aspect, the following features may further be provided: the conference record playback device further has a time progress retrieval part and a face coordinate acquisition part, and the image playback part is a panoramic image playback part that plays the panoramic image according to different viewing angles. At every preset refresh time, the playback control part controls the time progress retrieval part to retrieve, according to the current playback time, the speech time and the corresponding participant in the speech record storage part, controls the face coordinate acquisition part to analyze the panoramic image in the conference record according to that participant so as to acquire the coordinate information corresponding to the participant's face, and further controls the panoramic image playback part to adjust the center of the playback viewing angle in the panoramic image to a viewing angle centered on the coordinate information corresponding to the speaking face.
The invention further provides a conference record analysis method for analyzing a conference record, recorded by panoramic imaging and containing a panoramic image and a speech sound, so as to acquire a speech record, the method comprising the steps of: a conference record storage step of storing the conference record; a face recognition step of analyzing the panoramic image in the conference record to acquire the distinct facial features of each participant and recognize each participant's face; a voice conversion step of converting the speech sound into corresponding speech text according to time; a word correspondence processing step of associating the speech text with the corresponding participant according to the participant's speech time; and a speech record storage step of storing the speech text, the speech time, and the corresponding participant in association with each other.
< embodiment >
Fig. 1 is a schematic diagram of a configuration of a conference recording and playing system in an embodiment of the present invention.
As shown in fig. 1, the conference record playback system 100 of the present embodiment includes a conference record analysis device 1, a conference record playback device 2, and a communication network 3.
The conference record analysis apparatus 1 and the conference record playback apparatus 2 are communicatively connected via a communication network 3. The conference record analysis device 1 is configured to process a conference record, and the conference record playing device 2 is configured to play the conference record processed by the conference record analysis device 1.
Fig. 2 is a schematic configuration diagram of a conference record analysis apparatus in an embodiment of the present invention.
As shown in fig. 2, the conference record analysis device 1 includes a conference record storage unit 11, a face recognition unit 12, an organization personnel management database 13, a face retrieval unit 14, a voice conversion unit 15, a word correspondence processing unit 16, an utterance record storage unit 17, an analysis-side communication unit 18, and an analysis-side control unit 19.
Among these, the analysis-side communication unit 18 exchanges data between the respective components of the conference record analysis device 1 and between the conference record analysis device 1 and another device, and the analysis-side control unit 19 controls the operation of the respective components of the conference record analysis device 1.
The conference record storage unit 11 is used to store a conference record, recorded by a panoramic recording device (for example, a panoramic camera), that contains a panoramic image and a speech sound. The panoramic image is a conference video with a 360-degree panoramic field of view that can be flattened and unwrapped by a panoramic video unwrapping algorithm (for example, the prior-art three.js library, which unwraps panoramic video in a three-dimensional scene in a browser); the speech sound is the audio in which the participants' speech during the conference is recorded; and the time axes of the panoramic image and the speech sound correspond to each other.
The face recognition unit 12 is configured to analyze the panoramic image in the meeting record to obtain different face features of each participant, and recognize a face of each participant.
In this embodiment, the face recognition unit 12 acquires the facial features and recognizes the faces in the panoramic image according to those features, thereby identifying each distinct participant. In other embodiments, the face recognition unit 12 may also record the coordinate range of each face, so that face-feature recognition can be combined with position to identify and confirm participants sitting at different seats.
The organization personnel management database 13 stores at least the identification information of each member of the organization and a corresponding face image. In this embodiment, the identification information is the member's employee number; the organization personnel management database 13 also stores the members' names (each associated with an employee number), and the face images are obtained by collecting a face image of each member.
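As a concrete illustration of what such a database might hold, the following minimal sketch stores an employee number, a name, and a face-feature vector per member, and looks a member up by nearest face encoding, which is the kind of query the face retrieval unit 14 described next would issue. The table layout, column names, the use of SQLite, the float64 encoding format, and the 0.6 distance threshold are all assumptions made for this sketch, not details from the patent.

import sqlite3
import numpy as np

def create_personnel_db(path="personnel.db"):
    """Create the (hypothetical) organization personnel table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS personnel ("
        "  employee_no TEXT PRIMARY KEY,"
        "  name        TEXT NOT NULL,"
        "  face_vec    BLOB NOT NULL)"   # serialized float64 face-feature vector
    )
    conn.commit()
    return conn

def add_member(conn, employee_no, name, face_vec):
    conn.execute("INSERT OR REPLACE INTO personnel VALUES (?, ?, ?)",
                 (employee_no, name, np.asarray(face_vec, dtype=np.float64).tobytes()))
    conn.commit()

def lookup_by_face(conn, query_vec, max_distance=0.6):
    """Return (employee_no, name) of the stored face closest to query_vec, or None."""
    query_vec = np.asarray(query_vec, dtype=np.float64)
    best = None
    for emp_no, name, blob in conn.execute("SELECT employee_no, name, face_vec FROM personnel"):
        dist = np.linalg.norm(np.frombuffer(blob, dtype=np.float64) - query_vec)
        if dist <= max_distance and (best is None or dist < best[0]):
            best = (dist, emp_no, name)
    return (best[1], best[2]) if best else None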
The face retrieval unit 14 searches the organization personnel management database 13 using the faces of the participants identified by the face recognition unit 12 to obtain the identification information of each participant. The voice conversion unit 15 converts the speech sound into corresponding speech text according to time.
In this embodiment, the voice conversion unit 15 first acquires the speech sound (i.e., the audio portion of the conference record) from the conference record storage unit 11, then segments the speech sound according to the speech time and the speech pauses in it (e.g., the pauses between sentences spoken by the participants), records the start time and end time of each speech segment as the speech time, and finally converts the speech in each segment into speech text by speech-to-text conversion.
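A minimal sketch of this segment-then-transcribe step is shown below. It assumes the audio track has already been extracted to a WAV file and uses the pydub and speech_recognition packages with Google's free recognizer; those tools, the 700 ms pause length, the -16 dB silence threshold, and the zh-CN language choice are all assumptions of the sketch, not elements of the patent.

from pydub import AudioSegment
from pydub.silence import detect_nonsilent
import speech_recognition as sr

def transcribe_by_pauses(wav_path, min_pause_ms=700):
    """Split speech audio at pauses and transcribe each segment with its start/end time."""
    audio = AudioSegment.from_file(wav_path, format="wav")
    # Regions separated by at least min_pause_ms of relative silence.
    spans = detect_nonsilent(audio, min_silence_len=min_pause_ms,
                             silence_thresh=audio.dBFS - 16)
    recognizer = sr.Recognizer()
    utterances = []
    for start_ms, end_ms in spans:
        audio[start_ms:end_ms].export("_segment.wav", format="wav")
        with sr.AudioFile("_segment.wav") as source:
            clip = recognizer.record(source)
        try:
            text = recognizer.recognize_google(clip, language="zh-CN")
        except sr.UnknownValueError:
            text = ""  # segment could not be recognized
        utterances.append({"start_s": start_ms / 1000.0,
                           "end_s": end_ms / 1000.0,
                           "text": text})
    return utterances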
In another embodiment, the voice conversion unit 15 may also generate the speech text by having the analysis-side communication unit 18 call an external speech-to-text resource (e.g., an open-source speech conversion web service) to convert the speech in each segment.
The word correspondence processing unit 16 associates the speech text with the participants according to the participants' speech times.
FIG. 3 is a block diagram of a word correspondence processing unit according to an embodiment of the present invention.
As shown in fig. 3, the word correspondence processing unit 16 is a processing unit that associates the speech text with the speaking participants based on the panoramic image, and includes an utterance image section dividing unit 161, an in-image-section face recognition unit 162, a mouth shape conversion determination unit 163, a speaker determination unit 164, a recognition correspondence unit 165, and a correspondence control unit 166.
The correspondence control unit 166 controls the operation of each component of the character correspondence processing unit 16.
The utterance image section dividing unit 161 divides each speech segment in the panoramic image into different utterance image sections according to the speech pauses in the speech text; each utterance image section carries time information (i.e., its start and end times) and a panoramic image containing the participants' faces.
The in-image-section face recognition unit 162 cuts out the face images in the utterance image sections divided by the utterance image section dividing unit 161 and recognizes each face as an in-image-section face. In this embodiment, the in-image-section face recognition unit 162 samples image frames from an utterance image section according to the section's time information and a predetermined sampling interval, crops the faces out of those frames, and groups identical faces across all the frames into one group, thereby obtaining several groups of face image frames, each corresponding to one participant. Each face image frame in a group also carries the capture time point of the frame from which it was cropped.
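The sampling-and-grouping logic can be sketched as follows, using OpenCV to grab frames at a fixed interval and the face_recognition package to encode and group faces by identity. Both packages, the 0.5 s sampling interval, and the 0.6 matching tolerance are illustrative assumptions; for brevity the sketch keeps (timestamp, encoding) pairs per group rather than the cropped images themselves.

import cv2
import face_recognition

def group_faces_in_section(video_path, start_s, end_s, sample_interval_s=0.5, tolerance=0.6):
    """Sample frames of one utterance image section and group the detected faces by identity.

    Returns a list of groups; each group is a list of (timestamp_s, face_encoding) pairs.
    """
    cap = cv2.VideoCapture(video_path)
    groups, group_encodings = [], []   # parallel lists: samples per identity, reference encoding
    t = start_s
    while t < end_s:
        cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000.0)
        ok, frame_bgr = cap.read()
        if not ok:
            break
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        locations = face_recognition.face_locations(frame_rgb)
        for encoding in face_recognition.face_encodings(frame_rgb, locations):
            matches = face_recognition.compare_faces(group_encodings, encoding, tolerance)
            if True in matches:                      # same face as an existing group
                groups[matches.index(True)].append((t, encoding))
            else:                                    # first time this face is seen
                group_encodings.append(encoding)
                groups.append([(t, encoding)])
        t += sample_interval_s
    cap.release()
    return groups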
The mouth shape conversion determination unit 163 determines the mouth shape conversion of each group of face image frames in an utterance image section, so as to obtain, for the participant corresponding to each group, the number of mouth shape conversions within the speech time.
In this embodiment, the mouth shape conversion determination unit 163 sequentially analyzes the feature points representing the upper and lower lips in every face image frame of each group using a facial feature detection algorithm (for example, the prior-art Dlib algorithm). If, between two consecutive face image frames in a group, the vertical distance between the upper-lip and lower-lip feature points changes by more than half of the participant's lip height, one mouth opening is counted. The mouth openings in each group of face image frames are then counted in sequence, yielding the number of mouth shape conversions for each group of face image frames.
The speaker determination unit 164 counts the number of mouth shape conversions of each group of face image frames in each utterance image section, and determines the face corresponding to the group with the largest number of mouth shape conversions as the speaking face of that utterance image section.
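Below is a rough sketch of the mouth-opening count and the speaker choice, using Dlib's standard 68-point landmark model (landmark 62 is the inner upper lip, 66 the inner lower lip, 51/57 the outer lip points). The "half the lip height" rule of the embodiment is approximated here by an open/closed threshold on the lip-gap ratio; the threshold value, the model file path, and the assumption that the input frames are pre-cropped grayscale face images are all choices made for this sketch.

import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def count_mouth_openings(gray_face_frames, open_ratio=0.5):
    """Count closed->open mouth transitions across one group of grayscale face frames."""
    openings, was_open = 0, False
    for gray in gray_face_frames:                    # frames in capture-time order
        rects = detector(gray, 1)
        if not rects:
            continue
        pts = predictor(gray, rects[0])
        lip_gap = abs(pts.part(66).y - pts.part(62).y)             # inner lower minus inner upper lip
        lip_height = max(1, abs(pts.part(62).y - pts.part(51).y) +
                            abs(pts.part(57).y - pts.part(66).y))  # upper + lower lip thickness
        is_open = lip_gap > open_ratio * lip_height
        if is_open and not was_open:                 # one mouth-opening event
            openings += 1
        was_open = is_open
    return openings

def pick_speaking_face(face_frame_groups):
    """Return the index of the face group with the most mouth openings (the presumed speaker)."""
    counts = [count_mouth_openings(frames) for frames in face_frame_groups]
    return max(range(len(counts)), key=counts.__getitem__) if counts else None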
The recognition correspondence unit 165 sequentially determines participants corresponding to the respective speaking times based on the faces of the speakers determined by the speaker determination unit 164.
The utterance record storage unit 17 stores the utterance characters, the utterance time, and the identification information of the relevant participant in association with each other.
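The speech record kept by the utterance record storage unit 17 boils down to (speech text, start/end time, participant identification) entries. The following SQLite sketch shows one plausible layout; the table name, column names, and the use of SQLite are invented for illustration and are not prescribed by the patent.

import sqlite3

def create_utterance_store(path="utterances.db"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS utterances ("
        "  start_s     REAL NOT NULL,"     # speech start time within the recording
        "  end_s       REAL NOT NULL,"     # speech end time
        "  text        TEXT NOT NULL,"     # converted speech text
        "  employee_no TEXT NOT NULL)"     # identification information of the speaker
    )
    conn.commit()
    return conn

def store_utterance(conn, start_s, end_s, text, employee_no):
    conn.execute("INSERT INTO utterances VALUES (?, ?, ?, ?)",
                 (start_s, end_s, text, employee_no))
    conn.commit()

def utterance_at(conn, play_time_s):
    """Return the utterance whose time range covers the given playback time, if any."""
    row = conn.execute(
        "SELECT start_s, end_s, text, employee_no FROM utterances "
        "WHERE start_s <= ? AND ? < end_s", (play_time_s, play_time_s)).fetchone()
    return row

The utterance_at() helper is the kind of query the schedule retrieval described later would issue when mapping the current playback time back to a speaker.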
Fig. 4 is a block diagram of a structure of a conference recording and playing apparatus in an embodiment of the present invention.
As shown in fig. 4, the conference record playing apparatus 2 includes a screen storage unit 21, a playing control unit 22, an input display unit 23, an image playing unit 24, a sound playing unit 25, a schedule searching unit 26, a face coordinate acquiring unit 27, a playing side communication unit 28, and a playing side control unit 29.
The broadcast-side communication unit 28 exchanges data between the components of the conference record playback device 2 and between the conference record playback device 2 and another device, and the broadcast-side control unit 29 controls the operations of the components of the conference record playback device 2.
The screen storage unit 21 stores a recording/playback screen. As shown in fig. 5, the record playback screen has an image display portion and a text display portion for displaying a panoramic image of a conference record and all the speech texts in the conference record when a viewing user selects a conference record to view.
The playing control unit 22 is used for controlling the operations of the components related to the recording and playing process in the conference recording and playing device 2, including the operations related to the recording and playing process of the input display unit 23, the image playing unit 24, the sound playing unit 25, the time schedule retrieving unit 26, and the face coordinate obtaining unit 27.
Specifically, when the viewing user selects one conference record to play, the playback control section 22 controls the input display section 23 to display a record playback screen, controls the image playback section 24 to play a panoramic image in the image display section, controls the sound playback section 25 to play a speech sound in synchronization with the played panoramic image, and further displays speech characters and a speech time in the character display section according to the speech time to allow the viewing user to click the speech characters.
In this embodiment, the text display portion is a scrollable text box, and all the spoken texts recorded in the currently played conference are displayed in the text display portion and are correspondingly scrolled along with the recorded playing time, so that the spoken texts at the current time are always located in the middle of the text display portion. The viewing user can browse the speech letters by dragging the scroll bar in the letter display section through the input display section 23.
The text display section displays the speech text, and displays the corresponding speech time, the face image of the speaking participant (acquired from the organization personnel management database 13), and the name of the speaking participant (acquired from the organization personnel management database 13) near each speech text.
When the viewing user clicks a piece of speech text, the playback control unit 22 controls the playback progress of the panoramic image and the speech sound played by the image playback section 24 and the sound playback section 25 based on the speech time corresponding to the clicked text, so that the current playback time of the panoramic image and the speech sound jumps to the speech time of the clicked text.
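The click-to-jump behaviour amounts to looking up the speech time of the clicked text and seeking both the video and the audio to it. A minimal sketch follows; the Player protocol with a seek() method is a hypothetical stand-in for whatever playback back-end is actually used, and the utterance dictionaries follow the structure assumed in the transcription sketch above.

from typing import Protocol

class Player(Protocol):            # hypothetical playback interface
    def seek(self, seconds: float) -> None: ...

def on_utterance_clicked(utterances, clicked_index, video_player: Player, audio_player: Player):
    """Jump the panoramic video and the speech audio to the clicked utterance's start time."""
    start_s = utterances[clicked_index]["start_s"]
    video_player.seek(start_s)     # panoramic image playback section
    audio_player.seek(start_s)     # sound playback section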
In the present embodiment, the image playback section 24 is a panoramic image playback section that plays a panoramic image according to different viewing angles, and the panoramic image playback section is capable of displaying a partial image of the panoramic image according to a playback viewing angle in the image display section.
When the image playback section 24 plays the panoramic image, the playback control section 22, at every preset refresh time, controls the schedule retrieval section 26 to retrieve from the utterance record storage section 17, according to the current playback time, the speech time and the corresponding participant, who is taken as the current speaking participant.
When the schedule retrieval unit 26 retrieves and acquires the speaking participant, the playback control unit 22 controls the face coordinate acquisition unit 27 to analyze the currently played panoramic image based on the acquired speaking participant, thereby acquiring coordinate information of the face of the speaking participant.
When the face coordinate acquiring unit 27 acquires the coordinate information, the playback control unit 22 controls the panoramic image playback unit to adjust the center of the playback angle in the panoramic image to the playback angle centered on the coordinate information, based on the coordinate information corresponding to the face of the speaker.
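If the panoramic image is stored as an equirectangular frame (an assumption; the patent does not fix the projection), the face-centre pixel coordinates acquired by the face coordinate acquisition unit 27 can be converted into a viewing yaw/pitch for the panoramic image playback unit as in the sketch below; the bounding-box input format is likewise an assumption.

def face_center_to_view_angles(face_box, frame_width, frame_height):
    """Map the centre of a face bounding box (x, y, w, h) in an equirectangular frame
    to a (yaw, pitch) view centre in degrees: yaw in [-180, 180], pitch in [-90, 90]."""
    x, y, w, h = face_box
    cx, cy = x + w / 2.0, y + h / 2.0
    yaw = (cx / frame_width - 0.5) * 360.0     # horizontal angle around the camera
    pitch = (0.5 - cy / frame_height) * 180.0  # vertical angle, up is positive
    return yaw, pitch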
In this embodiment, the playback control unit 22 has a timing unit for keeping track of the preset refresh time, which is 0.5 s. While the image playback section 24 plays the panoramic image, the playback control section 22 therefore has the panoramic image playback section adjust the playback angle of view every 0.5 s, so that the speaking participant always stays at the center of the image display portion of the record playback screen.
In other embodiments, the viewing user can also adjust the playback angle of the panoramic image played by the panoramic image playback unit through the input display unit 23 (for example, by dragging with a mouse to point the playback angle directly at the projection screen or the blackboard writing that the user wants to view).
Fig. 6 is a flowchart of a process of parsing a conference record in an embodiment of the present invention.
As shown in fig. 6, the conference record analysis process is a process in which the conference record analysis device 1 analyzes a conference record.
When a new conference record is stored in the conference record storage unit 11, the conference record analysis device 1 analyzes that conference record through the following steps:
step S1-1, the face recognition part 12 obtains the panoramic image stored in the conference record storage part 11, analyzes the panoramic image to recognize different participants according to the face characteristics of different faces, and then enters step S1-2;
step S1-2, the voice converting part 15 obtains the speaking voice stored in the conference recording storing part 11, converts the speaking voice into speaking characters according to time and records the corresponding speaking time, and then the step S1-3 is proceeded;
step S1-3, the word correspondence processing part corresponds the speaking word converted in the step S1-2 to the participant according to the corresponding speaking time, and then the step S1-4 is carried out;
step S1-4, the face retrieval part 14 searches the organization personnel management database 13 according to the faces identified in step S1-1 to obtain the identification information of each participant, and then the process proceeds to step S1-5;
step S1-5, the utterance record storage unit 17 stores the speech text converted in step S1-2, the corresponding speech time, and the identification information of the participant acquired in step S1-4 in association with each other, and the process ends.
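Putting steps S1-1 to S1-5 together, the analysis flow can be sketched as the composition below. The injected callables recognize_participants, transcribe, match_speaker, and lookup_identity stand in for the face recognition unit 12, the voice conversion unit 15, the word correspondence processing unit 16, and the face retrieval unit 14 respectively; their exact signatures and the SpeechRecord structure are assumptions made for this sketch.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SpeechRecord:
    start_s: float
    end_s: float
    text: str
    employee_no: str

def parse_conference_record(
    video_path: str,
    audio_path: str,
    recognize_participants: Callable[[str], list],        # S1-1: video -> participant faces
    transcribe: Callable[[str], List[dict]],               # S1-2: audio -> timed utterances
    match_speaker: Callable[[str, dict, list], int],       # S1-3: utterance -> participant index
    lookup_identity: Callable[[object], str],              # S1-4: face -> identification info
) -> List[SpeechRecord]:
    participants = recognize_participants(video_path)      # S1-1
    utterances = transcribe(audio_path)                    # S1-2
    records = []
    for utt in utterances:
        idx = match_speaker(video_path, utt, participants)      # S1-3
        employee_no = lookup_identity(participants[idx])        # S1-4
        records.append(SpeechRecord(utt["start_s"], utt["end_s"],
                                    utt["text"], employee_no))  # S1-5
    return records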
Fig. 7 is a flowchart of an automatic adjustment process of a playing angle of view according to an embodiment of the present invention.
As shown in fig. 7, in the process of playing the conference record by the conference record playing apparatus 2, the playing control unit 22 controls the panoramic image playing unit to adjust the playing angle according to the preset refresh time, and the steps are as follows:
step S2-1, the timing unit of the playback control part 22 detects that the preset refresh time has elapsed, and then the process proceeds to step S2-2;
step S2-2, the play control section 22 controls the schedule retrieval section 26 to retrieve the utterance time and the corresponding participant in the utterance record storage section 17 according to the current play time, and then proceeds to step S2-3;
step S2-3, the playback control unit 22 controls the face coordinate acquisition unit 27 to analyze the panoramic image in the conference record according to the participant acquired in step S2-2 to acquire coordinate information corresponding to the face of the participant, and then proceeds to step S2-4;
in step S2-4, the playback control unit 22 controls the panoramic image playback unit to adjust the center of the playback angle of view in the panoramic image to the playback angle of view centered on the coordinate information based on the coordinate information acquired in step S2-3, and the process ends.
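The refresh loop S2-1 to S2-4 can be sketched as follows, reusing utterance_at() and face_center_to_view_angles() from the earlier sketches. The get_play_time and get_face_box hooks and the viewer object with a set_view_center() method are hypothetical stand-ins for the playback device's internals; only the 0.5 s interval comes from the embodiment above.

import time

def follow_speaker(conn, get_play_time, get_face_box, viewer,
                   frame_width, frame_height, refresh_s=0.5):
    """Every refresh_s seconds, re-centre the panoramic view on the current speaker."""
    while True:
        time.sleep(refresh_s)                              # S2-1: refresh timer fires
        now = get_play_time()                              # current playback time in seconds
        row = utterance_at(conn, now)                      # S2-2: who is speaking now
        if row is None:
            continue
        face_box = get_face_box(row[3], now)               # S2-3: face coordinates of that participant
        if face_box is None:
            continue
        yaw, pitch = face_center_to_view_angles(face_box, frame_width, frame_height)
        viewer.set_view_center(yaw, pitch)                 # S2-4: adjust the playback view angle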
Action and effects of the embodiment
According to the conference record analysis device and method provided by this embodiment, the voice conversion unit converts the recorded speech sound into the speech text and speech time of each participant, so that text information is formed and a user can conveniently review the content of the speech; the conference record analysis device can thus automatically organize a panoramic video conference record, allowing a viewing user to directly browse all of the text information and the corresponding speakers in the conference record without manual transcription, which makes it easy to quickly browse and filter the content of the conference record; moreover, because the recording is captured by a panoramic camera, the conference scene can be restored to a large extent, giving viewing users a stronger sense of presence while watching.
In this embodiment, since the organization personnel management database is provided, the face retrieval part can search the database to obtain the identification information of each participant, and the name of the speaker is displayed beside the speech text so that a person watching the video can identify who is speaking.
in an embodiment, the word correspondence processing unit can judge the face characteristics of the participants to correspond the speech words to the speakers, the mouth shape conversion judgment unit counts the number of times of mouth shape conversion of all the participants in each speech time period, so that the speaker judgment unit judges the speech face with the largest speech in each speech time period, and the recognition correspondence unit corresponds the participants to the speech words and the speech time period according to the speech face and the corresponding time period.
In the conference record playback system provided by this embodiment, the conference record playback device plays the conference record processed by the conference record analysis device, and the record playback screen has a text display portion in which all the speech text of the current conference record can be displayed, so a user can intuitively grasp the information of the whole conference record through the speech text. Meanwhile, the playback control part can jump the playback progress of the video according to the speech text clicked by the viewing user, so the user can find the corresponding image and audio content through the text, making it quicker and more convenient to browse and filter the conference record.
In this embodiment, the playback control part adjusts the playback angle of view of the panoramic image at regular intervals: the face coordinate acquisition part acquires the coordinate information corresponding to the speaking participant's face, and the playback control part controls the panoramic image playback part to re-center the playback angle of view in the panoramic image on that coordinate information. The viewing angle in the record playback screen is thus adjusted to the speaking participant in real time, so the viewing user can see at a glance which participant is speaking, feels more present, and obtains a better viewing experience of the conference record.
< modification 1 >
Compared with the embodiment, the conference record analysis device 1 of modification 1 does not include the organization personnel management database 13 and the face retrieval unit 14, but instead includes an identification information assigning unit and a face storage unit. In this case, the conference record analysis device 1 establishes the identity information of the participants by means of the identification information assigning unit and the face storage unit, roughly as follows.
The identification information assigning unit sets pieces of identification information corresponding in number to the participants recognized by the face recognition unit 12 and assigns one piece to each participant.
The face storage unit stores the identification information set by the identification information giving unit and the faces of the participants in correspondence.
The utterance record storage unit 17 stores the speech text, the speech time, and the identification information, set by the identification information assigning unit, of the corresponding participant in association with each other.
When the playback control unit 22 controls the text display unit to display the speech text, the text display unit displays the speech text, the speech time corresponding to the speech text, and the face image of the speaking participant (acquired from the face storage unit).
In modification 1, since the identification information assigning unit assigns identification information to each participant according to the number of participants, temporary identification information can be attached to every participant even when no information about the participants is available in advance. The face stored in the face storage unit can then serve as an avatar of the participant: it is displayed beside the speech text of the conference record so that a person watching the video can identify who is speaking.
< modification 2 >
Compared with the embodiment, the word correspondence processing unit 16 of modification 2 associates the speech text with the speaking participants based on a combined analysis of the panoramic image and the voiceprints. In this case, the word correspondence processing unit 16 further includes a voice recognition unit, a voiceprint storage unit, and a speech time division unit, and combines the two analysis methods through a determination control unit, roughly as follows.
The in-image-section face recognition unit 162 cuts out the face images in the utterance image sections divided by the utterance image section dividing unit 161, recognizes each face as an in-image-section face, and also determines whether any face image is partially occluded (that is, whether the corresponding mouth feature points cannot be acquired).
When the in-image-section face recognition unit 162 determines that a face image is occluded, the determination control unit controls the recognition correspondence unit 165 to sequentially recognize the voiceprints in the speech sound portions divided by the speech time division unit against the voiceprint storage unit, and to determine the corresponding participant from the voiceprint storage unit according to the speech voiceprint.
In modification 2, the utterance image section dividing unit 161 and the speech time division unit both divide the utterance image sections and the speech sound portions according to the speech pauses in the speech text, so that the utterance image sections and the speech sound portions correspond to each other in time. When the in-image-section face recognition unit 162 determines that a face in an utterance image section is occluded, the determination control unit has the relevant components analyze the speech sound portion corresponding to that utterance image section instead, thereby still associating the speech text with the speaking participant.
In modification 2, the word correspondence processing unit can associate the speech text with the speaker by evaluating the participants' voiceprint features: the voice recognition unit recognizes each participant's voiceprint features, and the recognition correspondence unit associates the participant with the speech text and the speech time according to those features. When a face in the panoramic image is occluded and cannot be judged, voiceprint recognition still allows the speaker to be determined. Moreover, since the voiceprint features collected by the word correspondence processing unit and the speech being recognized come from the same conference site, the influence of background noise on voiceprint recognition is greatly reduced, enabling highly accurate voiceprint recognition.
< modification 3 >
In modification 2, the word correspondence processing unit 16 associates the speech text with the speaking participants based on a combined analysis of the panoramic image and the voiceprints. In the present invention, however, a word correspondence processing unit 16 based only on voiceprint recognition may also be used. Specifically, such a word correspondence processing unit 16 includes a voice recognition unit, a voiceprint storage unit, a speech time division unit, and the recognition correspondence unit 165.
The voice recognition unit analyzes the utterance voice stored in the conference record storage unit 11 to acquire the voiceprint characteristics of each participant, and recognizes the voiceprints of different participants.
And the voiceprint storage unit correspondingly stores the voiceprint identified by the voice identification unit and the corresponding participant.
The speech time division unit divides each speech segment in the speech sound into different speech sound portions according to the speech pauses in the speech text. Each speech sound portion carries time information and the participant's sound information; the recognition correspondence unit 165 can therefore associate each speech time with a participant by matching the time information against the speech time and comparing the sound information against the participants' voiceprints.
The recognition correspondence unit 165 sequentially recognizes the voiceprints in the speech sound portions divided by the speech time division unit against the voiceprint storage unit, and determines the corresponding participant from the voiceprint storage unit according to the speech voiceprint.
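A very rough stand-in for the voiceprint matching in modifications 2 and 3 is sketched below: each participant's "voiceprint" is approximated by the time-averaged MFCC vector of an enrollment clip, and a speech sound portion is attributed to the participant whose vector is most similar. A real system would use a proper speaker-embedding model; librosa, the 20-coefficient MFCC, and cosine similarity are all assumptions of this sketch. The enrolled clips would come from the same conference recording, which is consistent with the background-noise remark above.

import numpy as np
import librosa

def voiceprint(wav_path, sr=16000, n_mfcc=20):
    """Crude voiceprint: the mean MFCC vector of the clip."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, frames)
    return mfcc.mean(axis=1)

def identify_speaker(segment_wav, enrolled):
    """Return the participant id whose enrolled voiceprint best matches the segment.

    enrolled maps participant id -> voiceprint vector built from an enrollment clip.
    """
    seg = voiceprint(segment_wav)
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return max(enrolled, key=lambda pid: cosine(seg, enrolled[pid]))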
The above-described embodiments and modifications are merely illustrative of specific embodiments of the present invention, and the present invention is not limited to the description of the above-described embodiments.

Claims (7)

1. A conference record analysis apparatus for analyzing a conference record that is recorded by panoramic imaging and that contains a panoramic image and a speech sound to acquire a speech record, comprising:
a conference record storage part, a face recognition part, a voice conversion part, a word correspondence processing part and a speech record storage part,
the conference record storage section stores the conference record,
the face recognition part analyzes the panoramic image in the conference record to obtain different face characteristics of each participant and recognizes the face of each participant,
the voice converting part converts the speaking voice into corresponding speaking characters according to time,
the word correspondence processing section corresponds the utterance word to the corresponding participant according to the utterance time of the participant,
the speaking record storage part correspondingly stores the speaking words, the speaking time and the corresponding participant,
the character correspondence processing unit includes:
a speech image part dividing unit which divides each speech in the panoramic image into different speech image parts according to speech pauses in the speech texts;
the in-image-part face recognition unit intercepts the face image in the speech image part and recognizes each face as the face in the image part;
a mouth shape conversion judging unit which respectively judges mouth shape conversion of the human face in each image part in the speaking image part so as to obtain the mouth shape conversion times of the participant corresponding to the human face in each image part in the speaking time;
a speaker determination unit that counts the number of times of mouth shape conversion of a face in each of the image portions in the speech image portion and determines the face in the image portion having the largest number of times of mouth shape conversion as a speech face in the speech image portion;
and the identification corresponding unit is used for sequentially judging the participants corresponding to the speaking time according to the speaking face in each speaking image part.
2. The apparatus for analyzing a conference record according to claim 1, further comprising:
an identification information adding unit and a face storage unit,
wherein the identification information giving part sets identification information of a corresponding number according to the number of the attendees recognized by the face recognition part and gives the identification information to each attendee,
the face storage part correspondingly stores the identification information and the faces of the participants,
the utterance recording storage unit stores the utterance text, the utterance time, and the identification information of the corresponding participant in association with each other.
3. The apparatus for analyzing a conference record according to claim 1, further comprising:
an organization personnel management database and a human face retrieval part,
wherein the organization personnel management database at least stores the identification information of each organization personnel and the corresponding face image,
the face retrieval part retrieves the organization personnel management database according to the faces of the participants identified by the face identification part to obtain the identification information of each participant,
the utterance recording storage unit stores the utterance text, the utterance time, and the identification information of the corresponding participant in association with each other.
4. The conference record parsing apparatus according to any one of claims 1 to 3, wherein:
wherein, the word correspondence processing part further comprises:
the voice recognition unit analyzes the speaking voice in the conference record to obtain the voiceprint characteristics of each participant, so that different voiceprints of the participants are recognized;
the voiceprint storage unit is used for correspondingly storing the voiceprint and the corresponding participant;
a speech time division unit that divides each speech in the speech sound into different speech sound parts according to the speech pause;
the face recognition unit in the image part also judges whether corresponding mouth feature points cannot be acquired so as to judge whether the face image is partially shielded,
and the identification corresponding unit is used for sequentially identifying the voiceprints in the speech sound parts according to the voiceprint storage unit when the face identification unit in the image part judges that the face image is blocked, and judging the corresponding participant from the voiceprint storage unit according to the speech voiceprint.
5. A conference record playback system for processing a conference record that is recorded by panoramic imaging and that contains a panoramic image and a speech sound, and playing back the processed conference record, comprising:
the conference record analysis device is used for processing the conference record; and
a conference record playing device for playing the processed conference record to be watched by the watching users,
wherein the conference record analysis device comprises a conference record storage part, a face recognition part, a voice conversion part, a word correspondence processing part and a speech record storage part,
the conference record storage section stores the conference record,
the face recognition part analyzes the panoramic image in the conference record to obtain different face characteristics of each participant and recognizes the face of each participant,
the voice converting part converts the speaking voice into corresponding speaking characters according to time,
the word correspondence processing section corresponds the utterance word to the corresponding participant according to the utterance time of the participant,
the speaking record storage part correspondingly stores the speaking characters, the speaking time and the corresponding participants;
the conference recording and playing device comprises a picture storage part, a playing control part, an input display part, an image playing part and a sound playing part,
the picture storage part stores a record playing picture which is provided with an image display part and a character display part,
the playback control unit controls the input display unit to display the recorded playback screen, controls the image playback unit to play the panoramic image in the image display unit, controls the sound playback unit to play the speech sound, and further controls the text presentation unit to display the speech text and the speech time in the text display unit according to the speech time so that the viewing user clicks the speech text,
when the viewing user clicks the utterance text, the playback control unit controls the panoramic image and the utterance sound played by the image playback unit and the sound playback unit according to the utterance time corresponding to the clicked utterance text, so that the current playback time and the utterance time of the panoramic image and the utterance sound correspond to each other,
the conference record analysis apparatus is the conference record analysis apparatus according to any one of claims 1 to 4.
6. The system for playing back a conference recording according to claim 5, wherein:
wherein the conference record playing device is also provided with a time progress retrieval part and a face coordinate acquisition part,
the image playing part is a panoramic image playing part which plays the panoramic image according to different visual angles,
the playing control part controls the time progress retrieval part to retrieve the speaking time and the corresponding attendees in the speaking record storage part according to the current playing time according to the preset refreshing time, controls the face coordinate acquisition part to analyze the panoramic image in the conference record according to the attendees to acquire coordinate information corresponding to the faces of the attendees, and further controls the panoramic image playing part to adjust the playing view angle center in the panoramic image to a view angle taking the coordinate information as the center according to the coordinate information corresponding to the speaking face.
7. A conference record analyzing method for analyzing a conference record which is recorded by panoramic imaging and contains a panoramic image and speech sound, so as to obtain a speech record, comprising the following steps:
a conference record storage step of storing the conference record;
a face recognition step of analyzing the panoramic image in the conference record to extract the distinct facial features of each participant and recognize each participant's face;
a voice conversion step of converting the speech sound into corresponding speech text according to time;
a text correspondence processing step of associating the speech text with the corresponding participant according to that participant's speaking time;
a speech record storage step of storing the speech text, the speaking time and the corresponding participant in association with one another,
wherein the text correspondence processing step includes:
a speech image segment division step of dividing the panoramic image into separate speech image segments, one per utterance, according to the speaking pauses in the speech text;
an in-segment face identification step of cropping the face images appearing in each speech image segment and identifying each of them as an in-segment face;
a mouth-shape change determination step of determining the mouth-shape changes of each in-segment face in the speech image segment, so as to obtain the number of mouth-shape changes, within the speaking time, of the participant corresponding to that face;
a speaker determination step of counting the number of mouth-shape changes of each in-segment face in the speech image segment and determining the in-segment face with the largest number of mouth-shape changes as the speaking face of that speech image segment;
and a recognition correspondence step of determining, in turn, the participant corresponding to each speaking time according to the speaking face of each speech image segment.
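The speaker-determination sub-steps of claim 7 (count mouth-shape changes for each face detected in a speech image segment, then pick the face with the most changes) can be sketched as follows; the per-frame mouth-opening ratios and the 0.35 threshold are assumptions for illustration, not values taken from the patent.

    def count_mouth_shape_changes(mouth_open_ratios, threshold=0.35):
        # Count open/closed transitions across a sequence of per-frame
        # mouth-opening ratios for one in-segment face.
        states = [ratio > threshold for ratio in mouth_open_ratios]
        return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)

    def determine_speaker(segment_faces):
        # segment_faces: participant -> per-frame mouth-opening ratios for the
        # face detected in one speech image segment. The speaking face is the
        # one whose mouth shape changed most often.
        changes = {p: count_mouth_shape_changes(r) for p, r in segment_faces.items()}
        return max(changes, key=changes.get) if changes else None

    # Illustrative segment: the ratios would come from a face/landmark detector
    # run on the frames of that segment of the panoramic image.
    segment = {
        "participant_A": [0.1, 0.5, 0.2, 0.6, 0.1, 0.5],  # frequent open/close changes
        "participant_B": [0.1, 0.1, 0.2, 0.1, 0.1, 0.1],  # mouth mostly still
    }
    print(determine_speaker(segment))  # -> "participant_A"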
CN201811353598.7A 2018-11-14 2018-11-14 Conference record analyzing device and method and conference record playing system Active CN111193890B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811353598.7A CN111193890B (en) 2018-11-14 2018-11-14 Conference record analyzing device and method and conference record playing system

Publications (2)

Publication Number Publication Date
CN111193890A (en) 2020-05-22
CN111193890B (en) 2022-06-17

Family

ID=70708959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811353598.7A Active CN111193890B (en) 2018-11-14 2018-11-14 Conference record analyzing device and method and conference record playing system

Country Status (1)

Country Link
CN (1) CN111193890B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112037791B (en) * 2020-08-12 2023-01-13 广东电力信息科技有限公司 Conference summary transcription method, apparatus and storage medium
CN111967372B (en) * 2020-08-14 2024-03-05 国网四川省电力公司信息通信公司 Image recognition method for conference system
CN112532912A (en) * 2020-11-20 2021-03-19 北京搜狗科技发展有限公司 Video processing method and device and electronic equipment
CN112672095B (en) * 2020-12-25 2022-10-25 联通在线信息科技有限公司 Teleconferencing system
CN112839195B (en) * 2020-12-30 2023-10-10 深圳市皓丽智能科技有限公司 Conference record consulting method and device, computer equipment and storage medium
CN112887659B (en) * 2021-01-29 2023-06-23 深圳前海微众银行股份有限公司 Conference recording method, device, equipment and storage medium
CN113014732B (en) * 2021-02-04 2022-11-11 腾讯科技(深圳)有限公司 Conference record processing method and device, computer equipment and storage medium
CN113822205A (en) * 2021-09-26 2021-12-21 北京市商汤科技开发有限公司 Conference record generation method and device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002099530A (en) * 2000-09-22 2002-04-05 Sharp Corp Minutes production device, method and storage medium using it
US7598975B2 (en) * 2002-06-21 2009-10-06 Microsoft Corporation Automatic face extraction for use in recorded meetings timelines
WO2016126769A1 (en) * 2015-02-03 2016-08-11 Dolby Laboratories Licensing Corporation Conference searching and playback of search results
CN105915798A (en) * 2016-06-02 2016-08-31 北京小米移动软件有限公司 Camera control method in video conference and control device thereof
CN106657865B (en) * 2016-12-16 2020-08-25 联想(北京)有限公司 Conference summary generation method and device and video conference system

Also Published As

Publication number Publication date
CN111193890A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN111193890B (en) Conference record analyzing device and method and conference record playing system
US10733574B2 (en) Systems and methods for logging and reviewing a meeting
CN108305632B (en) Method and system for forming voice abstract of conference
Wellner et al. Browsing recorded meetings with Ferret
WO2018107605A1 (en) System and method for converting audio/video data into written records
CN112672095B (en) Teleconferencing system
JP2006085440A (en) Information processing system, information processing method and computer program
JP2000125274A (en) Method and system to index contents of conference
US20100085363A1 (en) Photo Realistic Talking Head Creation, Content Creation, and Distribution System and Method
JP2007006473A (en) System and method for interpreting digital information, and storage medium to store command for executing the method
JP6304941B2 (en) CONFERENCE INFORMATION RECORDING SYSTEM, INFORMATION PROCESSING DEVICE, CONTROL METHOD, AND COMPUTER PROGRAM
JP2001256335A (en) Conference recording system
JPWO2005027092A1 (en) Document creation and browsing method, document creation and browsing device, document creation and browsing robot, and document creation and browsing program
JP2005267279A (en) Information processing system and information processing method, and computer program
US20150287434A1 (en) Method of capturing and structuring information from a meeting
Caridakis et al. A multimodal corpus for gesture expressivity analysis
JP4572545B2 (en) Information processing system, information processing method, and computer program
WO2023160288A1 (en) Conference summary generation method and apparatus, electronic device, and readable storage medium
US20050131697A1 (en) Speech improving apparatus, system and method
JPH11259501A (en) Speech structure detector/display
Wellner et al. Browsing recordings of multi-party interactions in ambient intelligence environments
KR102291113B1 (en) Apparatus and method for producing conference record
Ronzhin et al. A software system for the audiovisual monitoring of an intelligent meeting room in support of scientific and education activities
US11099811B2 (en) Systems and methods for displaying subjects of an audio portion of content and displaying autocomplete suggestions for a search related to a subject of the audio portion
US20200075025A1 (en) Information processing apparatus and facilitation support method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant