CN110211590A - Method, device, terminal device and storage medium for processing meeting hot spots - Google Patents
Method, device, terminal device and storage medium for processing meeting hot spots
- Publication number
- CN110211590A CN110211590A CN201910549987.5A CN201910549987A CN110211590A CN 110211590 A CN110211590 A CN 110211590A CN 201910549987 A CN201910549987 A CN 201910549987A CN 110211590 A CN110211590 A CN 110211590A
- Authority
- CN
- China
- Prior art keywords
- key
- meeting
- video data
- audio data
- hot
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/57—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8549—Creating video summaries, e.g. movie trailer
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Embodiments of the invention disclose a method, device, terminal device and storage medium for processing meeting hot spots. The method includes: obtaining audio data and/or video data of a meeting; identifying key scenes of the meeting based on the audio data and/or video data; obtaining first audio data within a first time period in which a key scene occurs; obtaining first text information by recognizing the first audio data; and generating hot-spot information of the meeting based on the first text information. The method can extract the key scenes of the meeting and derive the hot-spot information from them without relying on manual extraction, which improves the efficiency of obtaining hot-spot information and allows it to be obtained in real time, increasing the intelligence of the system; the method also improves the accuracy and validity of the hot-spot information obtained.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a method, device, terminal device and storage medium for processing meeting hot spots.
Background technique
Currently, countless meetings of all kinds are held every day. To record the newsworthy points of a meeting, editorial staff usually locate them from the transcripts, speeches, videos and other records of the meeting. However, this approach depends heavily on editors: they must participate in the entire meeting, which is costly in labor; if the meeting belongs to a specialized field, the editors also need strong professional knowledge; and if the meeting is long, important information is easily missed.
Summary of the invention
In view of this, the present invention provides an information processing method, device, terminal device and storage medium to at least partially solve the above problems.
The technical solution of the present invention is implemented as follows:
A processing method for meeting hot spots, the method comprising:
obtaining audio data and/or video data of a meeting;
identifying key scenes of the meeting based on the audio data and/or the video data;
obtaining first audio data within a first time period in which a key scene occurs;
obtaining first text information by recognizing the first audio data;
generating hot-spot information of the meeting based on the first text information.
In the above scheme, the method further includes:
obtaining first video data within the first time period in which the key scene occurs;
generating a hot-spot video of the meeting based on the first text information and the first video data.
In the above scheme, identifying the key scenes of the meeting based on the audio data and/or video data comprises:
performing audio recognition on the audio data to determine first key scenes of the meeting, wherein the first key scenes include at least one of: speech, applause, laughter, song; and/or
extracting key frames from the video data and identifying second key scenes of the meeting based on the key frames, wherein the second key scenes include at least one of: an interview, audience applause, a key person speaking.
In the above scheme, obtaining the first text information by recognizing the first audio data comprises:
recognizing the first audio data to obtain voiceprint information of at least one key person, and obtaining first sub-text information of the key person based on the voiceprint information of the key person.
Generating the hot-spot information of the meeting based on the first text information comprises:
inserting identification information of the key person into the first sub-text information to generate hot-spot information based on the key person.
In the above scheme, the method further includes:
extracting first sub-video data of a key person from the first video data based on the voiceprint information of one of the at least one key person;
generating a hot-spot video of the key person based on the first sub-video data.
In the above scheme, the method further includes:
extracting, from the first video data, second sub-video data corresponding to multiple key persons based on the voiceprint information of the multiple key persons;
extracting the video clips of the corresponding time periods from the corresponding second sub-video data based on the respective weight coefficients of the multiple key persons;
assembling the video clips corresponding to the multiple key persons to generate a hot-spot video of the multiple key persons.
In the above scheme, generating the hot-spot information of the meeting based on the first text information comprises:
extracting key information from the first text information, wherein the key information includes at least one of: keywords indicating key persons, keywords indicating actions, finance-related keywords and/or key sentences, technology-related keywords and/or key sentences, topical-news keywords and/or key sentences;
generating the hot-spot information of the meeting based on the key information.
An embodiment of the invention also provides a processing device for meeting hot spots, the device comprising:
a first obtaining module, configured to obtain audio data and/or video data of a meeting;
a first identification module, configured to identify key scenes of the meeting based on the audio data and/or the video data;
a second obtaining module, configured to obtain first audio data within a first time period in which a key scene occurs;
a second identification module, configured to obtain first text information by recognizing the first audio data;
a generation module, configured to generate hot-spot information of the meeting based on the first text information.
An embodiment of the invention also provides a terminal device comprising a processor and a memory for storing a computer program executable on the processor, wherein the processor, when running the computer program, implements the processing method for meeting hot spots described in any embodiment of the present invention.
An embodiment of the invention also provides a storage medium containing computer-executable instructions, wherein the computer-executable instructions, when executed by a processor, implement the processing method for meeting hot spots described in any embodiment of the present invention.
Embodiments of the invention disclose a processing method for meeting hot spots. By obtaining the audio data and/or video data of a meeting and identifying the key scenes of the meeting based on them, key scenes can be found automatically in the video data and/or audio data. By then obtaining the first audio data within the first time period in which a key scene occurs and recognizing it to obtain first text information, so that the hot-spot information of the meeting is generated based on that text, the audio of a key scene within a certain time range of the meeting can be converted to surface potentially hot information in the meeting. In this way, more comprehensive hot-spot information can be obtained based on the identification of key scenes, which improves the accuracy and validity of the hot-spot information obtained. Moreover, embodiments of the invention do not rely on editorial staff participating in the entire meeting to excerpt important hot-spot information, which saves labor cost and increases the intelligence of the system.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a processing method for meeting hot spots provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another processing method for meeting hot spots provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of yet another processing method for meeting hot spots provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a processing device for meeting hot spots provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the hardware structure of a terminal device provided by an embodiment of the present invention.
Specific embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present invention and not to limit it.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which the invention belongs. The terms used in the specification of the present invention are for the purpose of describing specific embodiments only and are not intended to limit the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
As shown in Fig. 1, an embodiment of the invention provides a processing method for meeting hot spots, comprising:
Step 101: obtaining audio data and/or video data of a meeting;
Step 102: identifying key scenes of the meeting based on the audio data and/or the video data;
Step 103: obtaining first audio data within a first time period in which a key scene occurs;
Step 104: obtaining first text information by recognizing the first audio data;
Step 105: generating hot-spot information of the meeting based on the first text information.
The method described in the embodiments of the present invention is applied to a terminal device, which includes but is not limited to at least one of: a computer, a server, a mobile phone.
In some embodiments, the terminal device may be provided with an audio capture device and/or a video capture device. In this case, obtaining the audio data of the meeting may consist of capturing the audio data of the meeting directly with the audio capture device, and obtaining the video data of the meeting may consist of capturing the video data of the meeting directly with the video capture device.
In other embodiments, the terminal device obtains the audio data and/or video data of the meeting sent by another electronic device.
The meeting may be a working meeting, for example a mobilization meeting, an experience-exchange meeting, a work-arrangement meeting or a summary meeting; a professional meeting, for example a seminar, a forum, a workshop or a review meeting; a business meeting, for example a business negotiation, a celebration gala or a product presentation; an informational meeting, for example a news briefing, a press conference, a public lecture or a consultation meeting; or a decision-making meeting, for example a standing-committee meeting, a party-committee meeting or a council meeting.
In practical applications, if only the audio data of the meeting is obtained, the key scenes of the meeting can be determined based on the audio data; if both the audio data and the video data of the meeting are obtained, the key scenes of the meeting can be determined based on the audio data and the video data together.
The key scenes include but are not limited to at least one of: speech, applause, laughter, song, audience applause, a key person speaking, an interview. There may be one or more key scenes.
Here, the first time period may be several seconds, ten-odd seconds, tens of seconds, a few minutes, and so on. The first time period may be the time point or period at which the key scene occurs, or a period of time that contains the time point or period at which the key scene occurs.
For example, if the key scene "applause" occurs at the 10th to 12th second of the meeting, the first time period may be the 8th to 12th second of the meeting; obtaining the first audio data within the first time period in which the key scene occurs then means obtaining the audio data of the 8th to 12th second of the meeting.
For another example, if the key scene "interview" occurs at the 20th to 25th minute of the meeting, the first time period may be the 20th to 25th minute of the meeting; obtaining the first audio data within the first time period then means obtaining the audio data of the 20th to 25th minute of the meeting.
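As a concrete illustration of this windowing, the following is a minimal sketch; the function name, the 2-second leading pad and the mock 8 kHz recording are assumptions for illustration, not details from the patent.

```python
# Sketch of obtaining the "first audio data": slicing a padded window
# around a detected key scene out of a raw sample buffer.

def extract_audio_window(samples, sample_rate, scene_start_s, scene_end_s,
                         pad_before_s=2.0, pad_after_s=0.0):
    """Return the samples covering [scene_start - pad_before, scene_end + pad_after],
    clamped to the bounds of the recording."""
    start = max(0, int((scene_start_s - pad_before_s) * sample_rate))
    end = min(len(samples), int((scene_end_s + pad_after_s) * sample_rate))
    return samples[start:end]

# "Applause" detected at seconds 10-12 of a mock 60 s recording at 8 kHz;
# with 2 s of leading padding this yields the 8th-12th second, as in the text.
rate = 8000
recording = [0] * (60 * rate)
clip = extract_audio_window(recording, rate, 10, 12)
print(len(clip) / rate)  # 4.0 seconds
```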
One implementation of step 102 is as follows: an audio feature vector is extracted from the audio data, and a video feature vector is extracted from the video data; the audio feature vector is input into a first neural network model to identify the first key scenes of the meeting; and the video feature vector is input into a second neural network model to identify the second key scenes of the meeting.
Here, the key scenes include the first key scenes and the second key scenes. The first key scenes may be entirely different from the second key scenes, or some sub-scenes of the first key scenes may be identical to some sub-scenes of the second key scenes.
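The two-model arrangement of step 102 can be sketched as follows. The patent specifies neural network models; a nearest-centroid classifier stands in here purely to show the parallel audio/video data flow, and all feature vectors and prototypes are invented.

```python
# Stand-in for the two scene classifiers: one operating on audio feature
# vectors, the other on video feature vectors.

def nearest_centroid(feature, prototypes):
    """Return the scene label whose prototype vector is closest (squared L2)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: dist2(feature, prototypes[label]))

audio_prototypes = {"applause": [0.9, 0.1], "laughter": [0.2, 0.8]}
video_prototypes = {"interview": [1.0, 0.0, 0.0],
                    "key_person_speech": [0.0, 1.0, 0.2]}

first_scene = nearest_centroid([0.85, 0.2], audio_prototypes)       # audio path
second_scene = nearest_centroid([0.1, 0.9, 0.3], video_prototypes)  # video path
print(first_scene, second_scene)  # applause key_person_speech
```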
One implementation of step 104 is as follows: a speech analysis component is provided in the terminal device, and speech recognition is performed on the first audio data by the speech analysis component to obtain the first text information corresponding to the first audio data.
One implementation of step 105 is as follows: the hot-spot information of the meeting is generated based on the keywords and/or key sentences in the first text information.
In practical applications, speech conversion may also be performed on the audio data of the entire meeting to obtain candidate text information for the whole meeting; candidate key information is extracted from the candidate text information, and the hot-spot information of the meeting is generated based on the candidate key information together with the key information obtained from the first text information.
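A minimal sketch of keyword-based hot-spot generation as described for step 105; the keyword list, transcript and scoring rule are illustrative assumptions (a real embodiment would at least match on word boundaries rather than substrings).

```python
# Score transcript sentences by how many configured keywords they contain
# and keep the top hits as the meeting's hot-spot information.

def hot_spots(sentences, keywords, top_n=2):
    scored = []
    for s in sentences:
        score = sum(1 for kw in keywords if kw in s.lower())
        if score > 0:
            scored.append((score, s))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first
    return [s for _, s in scored[:top_n]]

transcript = [
    "Welcome everyone to the annual meeting.",
    "Revenue grew 20 percent and profit margins improved.",
    "Our new AI platform will launch next quarter.",
]
print(hot_spots(transcript, ["revenue", "profit", "ai", "launch"]))
```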
In embodiments of the present invention, the key scenes of the meeting can be identified based on the audio data and/or video data, and the first text information corresponding to the first audio data can be obtained from the first audio data within the first time period in which a key scene occurs. That is, the audio data of the important scenes is obtained, so that potential hot-spot information is extracted from the audio data of those scenes, which improves the accuracy and validity of the hot-spot information obtained. There is also no need to rely on editorial staff participating in the entire meeting to excerpt important hot-spot information, which simplifies the processing of hot-spot information, improves the efficiency of obtaining it, saves labor cost and increases the intelligence of the system.
Moreover, embodiments of the present invention can also acquire key scenes from a meeting that is being recorded and obtain hot news based on those key scenes, so that hot news can be obtained more promptly, improving the timeliness of obtaining hot news.
In some embodiments, step 102 includes:
performing audio recognition on the audio data to determine the first key scenes of the meeting, wherein the first key scenes include at least one of: speech, applause, laughter, song; and/or
extracting key frames from the video data and identifying the second key scenes of the meeting based on the key frames, wherein the second key scenes include at least one of: an interview, audience applause, a key person speaking.
Here, there may be one or more first key scenes, each corresponding to a different time period; there may be one or more second key scenes, each corresponding to a different time period. The first key scenes may be different from the second key scenes, or some scenes among the first key scenes may be identical to some scenes among the second key scenes.
In embodiments of the present invention, different key scenes can thus be determined based on the audio data and/or the video data. Since some scenes are easier to detect from audio data, the audio data can be used to obtain first key scenes such as laughter and song; since other scenes are easier to detect from video data, the video data can be used to obtain second key scenes such as an interview or a key person speaking. In this way, information such as the key points and/or points of interest of the meeting can be obtained more comprehensively and accurately, so that more accurate and effective hot-spot information is extracted.
In other embodiments, step 102 comprises:
performing audio recognition on the audio data to determine candidate first key scenes of the meeting; if the duration of a candidate first key scene is determined to be greater than a first threshold, determining that the candidate first key scene is a first key scene; and/or
extracting key frames from the video data; identifying candidate second key scenes of the meeting based on the key frames; if the duration of a candidate second key scene is determined to be greater than the first threshold, determining that the candidate second key scene is a second key scene.
In embodiments of the invention, the duration of a determined candidate key scene can thus be checked: only when the duration of the candidate scene is greater than a certain threshold is the candidate scene determined to be a key scene. In this way, the key points and/or points of interest of the meeting can be determined more reliably, improving the accuracy of the hot-spot information obtained.
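The duration check can be sketched as follows, assuming the scene detector reports hit timestamps that are first merged into contiguous candidate segments; all values below are invented.

```python
# Merge per-frame detection timestamps into candidate segments, then keep
# only candidates whose duration exceeds the first threshold.

def merge_hits(hit_times, max_gap):
    """Merge detection timestamps into [start, end] segments."""
    segments = []
    for t in sorted(hit_times):
        if segments and t - segments[-1][1] <= max_gap:
            segments[-1][1] = t          # extend the current segment
        else:
            segments.append([t, t])      # start a new segment
    return segments

def filter_by_duration(segments, threshold_s):
    return [seg for seg in segments if seg[1] - seg[0] > threshold_s]

hits = [10.0, 10.5, 11.0, 11.5, 12.0, 30.0, 30.2]    # detector fired here
segments = merge_hits(hits, max_gap=1.0)             # [[10.0, 12.0], [30.0, 30.2]]
print(filter_by_duration(segments, threshold_s=1.5)) # [[10.0, 12.0]]
```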
In some embodiments, extracting key frames from the video data includes but is not limited to at least one of the following:
extracting key frames at a preset time interval;
extracting a first number of key frames per unit time;
if it is determined that the displayed image changes significantly, extracting the key frames within a predetermined time period around the change.
The preset time interval may be determined according to the duration of the meeting. For example, if the duration of the meeting is 1 hour, the preset time interval may be set to 1 minute or half a minute. The unit time may be 1 second, 10 seconds, 1 minute, several minutes, and so on.
A significant change in the image includes but is not limited to at least one of: a key person in the image appears or disappears, the position of a key person in the image changes, the behavior or actions of a key person in the image change, or the location shown in the image changes.
In this way, in embodiments of the present invention only some of the video frames need to be analyzed, rather than all the video frames of the meeting. Obtaining and identifying key scenes and/or faces based only on the key frames reduces the amount of data to be processed and improves the efficiency of generating image-and-text news releases and video news.
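Two of the key-frame strategies above, interval sampling and change detection, might look like this; the mean-absolute-difference measure, the threshold and the tiny mock greyscale frames are assumptions for illustration.

```python
# Two key-frame selection strategies: fixed-interval sampling, and
# sampling when consecutive frames differ significantly.

def interval_keyframes(num_frames, fps, interval_s):
    """Indices of frames taken every interval_s seconds."""
    step = int(fps * interval_s)
    return list(range(0, num_frames, step))

def change_keyframes(frames, threshold):
    """Indices where a frame differs significantly from its predecessor
    (mean absolute pixel difference above threshold)."""
    picked = []
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i], frames[i - 1])) / len(frames[i])
        if diff > threshold:
            picked.append(i)
    return picked

print(interval_keyframes(num_frames=100, fps=25, interval_s=1))  # [0, 25, 50, 75]
frames = [[10, 10, 10], [10, 11, 10], [200, 200, 200]]           # tiny mock frames
print(change_keyframes(frames, threshold=50))                    # [2]
```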
In some embodiments, the method further includes:
obtaining first video data within the first time period in which the key scene occurs;
generating a hot-spot video of the meeting based on the first text information and the first video data.
For example, suppose the key scenes are determined to be a speech by key person A, laughter, and an interview by reporter B, where the speech by key person A occurs at the 15th to 20th minute of the meeting, the laughter occurs at the 25th to 26th minute, and the interview by reporter B occurs at the 30th to 33rd minute. The video data of the 15th to 20th minute, the 24th to 26th minute and the 30th to 33rd minute can then be extracted. Here, the video data extracted for the laughter scene may include the video data of a certain time before the laughter scene occurs. The video data of the 15th to 20th minute, the 24th to 26th minute and the 30th to 33rd minute is spliced to generate the hot-spot video of the meeting, and the first text information is matched to the first video data and displayed as subtitles.
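The splicing of the selected time windows can be sketched as interval merging over minute offsets; the helper below reproduces the cut list of the example above (speech at 15-20, padded laughter at 24-26, interview at 30-33), and all names are illustrative.

```python
# Normalise the selected (start, end) time windows, merge any that overlap,
# and produce the cut list for the spliced hot-spot video.

def merge_windows(windows):
    """Merge overlapping (start, end) windows, returned in sorted order."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Speech 15-20, laughter padded from 25-26 back to 24-26, interview 30-33.
cut_list = merge_windows([(15, 20), (24, 26), (30, 33)])
print(cut_list)                         # [(15, 20), (24, 26), (30, 33)]
print(sum(e - s for s, e in cut_list))  # 10 minutes of hot-spot video
```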
In other embodiments, generating the hot-spot video of the meeting based on the first text information and the first video data comprises:
obtaining the key information in the first text information;
extracting the video data corresponding to the key information from the first video data to generate the hot-spot video of the meeting.
For example, suppose the key scene is determined to be a speech by key person A, which occurs at the 15th to 20th minute of the meeting. The first text information of key person A's speech is obtained, and at least one piece of key information is obtained based on the first text information, where the key information corresponds to the 16th to 18th minute of the meeting. The video data corresponding to the 16th to 18th minute then needs to be extracted from the video data of the 15th to 20th minute, and the hot-spot video of the meeting is obtained based on the video data corresponding to the 16th to 18th minute. In this way, the accuracy of the hot-spot video obtained can be further improved, and the extraction of unnecessary video data is greatly reduced.
In some embodiments, step 104 comprises:
recognizing the first audio data to obtain voiceprint information of at least one key person, and obtaining the first sub-text information of the key person based on the voiceprint information of the key person.
Step 105 comprises:
inserting the identification information of the key person into the first sub-text information to generate hot-spot information based on the key person.
Here, a key person may be a person who appears early in the meeting; a person whose speaking time exceeds a first time threshold; a person who delivers important information; a person who appears alone in a certain number of video frames; and so on.
In embodiments of the present invention, the voiceprint information of a key person can be obtained based on voiceprint recognition of the first audio data, so that the first sub-text information corresponding to the key person is obtained and the hot-spot information of the key person is established. Moreover, the identification information of the key person can be inserted at the position of the key person's corresponding speech, which makes it convenient to organize the speeches of different key persons and easy for users to read.
In some embodiments, before step 105, the method further includes:
obtaining the identification information of the key person based on the voiceprint information of the key person.
In practical applications, the terminal device may store a preset relation table; the relation table records the correspondence between a person's voiceprint information and identification information. In this way, when the voiceprint information of a key person is obtained, the relation table can be searched for identification information matching that voiceprint information; if a match is found, the identification information of the key person can be obtained directly.
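A minimal sketch of such a lookup, assuming the relation table stores a fixed-length voiceprint embedding per person (the embeddings, names, and similarity threshold below are invented for illustration):

```python
import math

# Hypothetical preset relation table: voiceprint embeddings mapped
# to identification information. Values are illustrative only.
RELATION_TABLE = {
    "Zhang Wei": [0.9, 0.1, 0.2],
    "Li Na":     [0.1, 0.8, 0.3],
}

def cosine(a, b):
    """Cosine similarity between two voiceprint embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify(voiceprint, threshold=0.9):
    """Return the identification information whose stored voiceprint
    best matches the query, or None if no entry is close enough."""
    best_name, best_sim = None, 0.0
    for name, stored in RELATION_TABLE.items():
        sim = cosine(voiceprint, stored)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None
```

Returning None when no stored voiceprint matches leaves room for the fallback the description implies: the key person simply remains unlabeled.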
In some embodiments, as shown in Fig. 2, the method further includes:
Step 106a: based on the voiceprint information of one key person among the at least one key person, extracting the first sub-video data of the key person from the first video data; and generating the hot video of the key person based on the first sub-video data.
In the embodiments of the present invention, the first sub-video data of the key person can be extracted to establish the hot video of that key person.
In other embodiments, as shown in Fig. 2, the method further includes:
Step 106b: based on the voiceprint information of multiple key persons, extracting second sub-video data corresponding to the multiple key persons from the first video data; based on the respective weight coefficients of the multiple key persons, extracting video clips of corresponding periods from the corresponding second sub-video data; and assembling the video clips corresponding to the multiple key persons to generate the hot video of the multiple key persons.
Here, assembling the video clips corresponding to the multiple key persons may be: splicing the video clips of the multiple key persons together to form one complete video.
In some embodiments, determining the respective weight coefficients of the key persons includes: determining the respective weight coefficient of each key person according to at least one of the key person's speaking time in the meeting, the key person's seat, and the identity information and/or personal information of the key person found through a web search.
For example, if it is determined that a key person speaks near the beginning or the end of the meeting, the weight coefficient of that key person is set larger; if it is determined that a key person speaks near the middle, the weight coefficient of that key person is set smaller.
For another example, if the identity information of a key person found through the network is more important, the weight coefficient of that key person is set larger; if the identity information found through the network is less important, the weight coefficient of that key person is set smaller.
It can be understood that the larger the weight coefficient, the longer the period of the video clip extracted accordingly; the smaller the weight coefficient, the shorter the period of the video clip extracted accordingly.
In the embodiments of the present invention, the second sub-video data of multiple key persons can be extracted to establish a combined hot video that includes the speeches of multiple key persons. Further, hot videos with different speech durations can be established based on the importance of the key persons, so that more important persons are given longer speaking time in the video and less important persons are given shorter speaking time.
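The relationship between weight coefficient and clip length can be sketched as a proportional allocation; the roles and weights below are invented examples, not values from the patent:

```python
# Hypothetical sketch: allocate clip durations in proportion to each
# key person's weight coefficient, so a larger weight coefficient
# yields a longer extracted video clip.
def allocate_clip_durations(weights, total_seconds):
    """weights: dict mapping key person -> weight coefficient.
    Returns dict mapping key person -> clip duration in seconds."""
    total_weight = sum(weights.values())
    return {person: total_seconds * w / total_weight
            for person, w in weights.items()}

durations = allocate_clip_durations(
    {"chair": 3.0, "keynote": 2.0, "panelist": 1.0}, 120)
# The chair's clip is three times as long as the panelist's.
```

Splicing the clips in meeting order then yields the combined hot video of Step 106b.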
In some embodiments, step 105 includes:
extracting the key message in the first text information, where the key message includes at least one of: a keyword indicating a key person, a keyword indicating an action, a finance-related keyword and/or key sentence, a technology-related keyword and/or key sentence, and a topical-news keyword and/or key sentence; and
generating the hot information of the meeting based on the key message.
Here, the key message may also include indicator words related to time, place, and the like.
Here, the keywords may include introductory words such as key persons' names, times, and/or places; the keywords may also include action words such as conclusion, action, order, requirement, and/or execution; the keywords may also include technical terms of a professional field and/or hot words of topical news. For example, the technical terms of the professional field may be 5G, artificial intelligence, neural network, and the like; the hot words of topical news may be Huawei and the like.
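A minimal sketch of matching such preset keyword lists against the first text information (the word lists merely echo the examples above and are not an exhaustive vocabulary):

```python
# Illustrative keyword lists per category; the entries come from the
# examples in the description and are not a complete vocabulary.
KEYWORD_LISTS = {
    "technical": ["5G", "artificial intelligence", "neural network"],
    "action":    ["conclusion", "order", "requirement"],
    "news":      ["Huawei"],
}

def extract_key_messages(text):
    """Return the preset keywords found in the text, per category."""
    found = {}
    for category, words in KEYWORD_LISTS.items():
        hits = [w for w in words if w in text]
        if hits:
            found[category] = hits
    return found

msg = extract_key_messages(
    "The conclusion is that 5G and Huawei dominated the session.")
```

In practice the patent's multi-modal analysis would replace this plain substring match, but the categories of key messages are the same.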
In some scenarios, the embodiments of the present invention may also obtain, based on the audio data of the meeting, the alternative text information corresponding to the audio data; obtain the key message based on the alternative text information; and obtain the hot information of the meeting based on the key message, the video data of the key scenes, and the audio data of the key scenes.
In the embodiments of the present invention, the key scenes in the meeting can be identified, and the news points of the meeting can be extracted based on the keywords, key sentences, and the like in the audio data, so as to obtain the hot information of the meeting. In this way, the accuracy and validity of the obtained hot information can be further improved.
As shown in Fig. 3, an embodiment of the present invention discloses a processing method of a meeting hot spot, and the method includes the following steps:
Step S301: acquiring the content of a meeting.
Optionally, the terminal device acquires the content of the meeting; the content of the meeting includes audio data and video data.
Step S302: separating out the audio data of the meeting.
Optionally, the terminal device separates the audio data from the content of the meeting.
Step S303: extracting key frames of the video data.
Optionally, the terminal device extracts key frames from the video data at a preset time interval.
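Extracting key frames at a preset time interval amounts to sampling frame indices from the frame rate; a sketch of the index computation follows (the actual frame decoding is assumed to be handled by a video library):

```python
# Illustrative sketch: compute which frame indices to sample as key
# frames, given the video frame rate and a preset time interval.
def key_frame_indices(total_frames, fps, interval_seconds):
    """Return frame indices spaced interval_seconds apart."""
    step = int(fps * interval_seconds)
    return list(range(0, total_frames, step))

indices = key_frame_indices(total_frames=300, fps=25,
                            interval_seconds=2)
# Samples one frame every 50 frames of a 25 fps video.
```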
Step S304a: speech recognition.
Optionally, the terminal device performs speech recognition on the audio data to obtain the alternative text information corresponding to the audio data.
Step S304b: audio classification.
Optionally, the terminal device performs audio classification on the audio data, dividing the audio data into scenes of speech, applause, laughter, and song; and extracts from the audio data the first audio data corresponding to the scenes of speech, applause, laughter, and song.
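Assuming a pre-trained audio classifier that assigns one label per second (the classifier itself is outside this sketch), the division into speech, applause, laughter, and song scenes can be obtained by grouping contiguous runs of identical labels:

```python
# Illustrative sketch: group per-second audio-classification labels
# into contiguous labeled scene segments.
def labels_to_segments(labels):
    """labels: list of per-second class labels.
    Returns a list of (label, start_second, end_second) segments,
    with end exclusive."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments

segs = labels_to_segments(
    ["speech", "speech", "applause", "applause", "speech"])
```

Each segment's time span can then be used to cut the corresponding first audio data out of the recording.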
Step S304c: voiceprint recognition.
Optionally, the terminal device performs voiceprint recognition on the audio data to obtain the voiceprint feature information of at least one key person, and extracts from the audio data the second audio data corresponding to the at least one key person.
Step S305a: scene recognition.
Optionally, the terminal device obtains the first image of the key frame; identifies, based on the first image, scenes of interviews, audience applause, and key-person speeches; and obtains from the video data the first video data corresponding to the scenes of interviews, audience applause, and key-person speeches.
Step S305b: face recognition.
Optionally, the terminal device obtains the second image of the key frame, and performs face recognition based on the second image to obtain the second video data of the key persons.
The face recognition may be performed using a neural network model. Specifically, the terminal device obtains an image training set containing training images with faces, where the training images include original images carrying key-feature-point annotation information; an initial convolutional neural network is trained repeatedly on the image training set until the loss meets a convergence condition, so as to obtain a trained neural network model. The first image is then processed with the trained neural network model to identify the faces contained in the first image.
Here, the loss function is also called the cost function and is the objective function of neural network optimization; the process of training or optimizing a neural network is the process of minimizing the loss function. The smaller the value of the loss function, the closer the corresponding prediction is to the true result.
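A toy illustration of this point, using a mean-squared-error loss (a generic example, not the patent's training objective): as the predictions approach the true values, the loss value shrinks.

```python
# Toy mean-squared-error loss: smaller values mean the predictions
# are closer to the true results.
def mse_loss(predictions, targets):
    return sum((p - t) ** 2
               for p, t in zip(predictions, targets)) / len(targets)

targets = [1.0, 0.0, 1.0]
rough = mse_loss([0.5, 0.5, 0.5], targets)   # early in training
better = mse_loss([0.9, 0.1, 0.9], targets)  # after more training
# The loss decreases as the predictions move toward the targets.
```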
Step S306: performing a comprehensive analysis of the multi-modal data.
Optionally, the alternative text information, the first audio data, the first video data, and the second video data are input into a multi-modal data model to carry out the comprehensive analysis of the multi-modal data.
The multi-modal data model may be a neural network model.
Step S307a: extracting the keywords of the meeting.
Optionally, the terminal device obtains the keywords of the meeting based on the comprehensive analysis of the multi-modal data.
The keywords may include introductory words such as key persons' names, times, and/or places; the keywords may also include action words such as conclusion, action, order, requirement, and/or execution; the keywords may also include technical terms of a professional field, hot words of topical news, and/or finance-related hot words. For example, the technical terms of the professional field may be 5G, artificial intelligence, neural network, blockchain, and the like; the hot words of topical news may be Huawei, the G20 summit, the Chinese space station, and the like; the finance-related hot words may be tax reduction, the macro leverage ratio, and the like.
Step S307b: extracting the key sentences of the meeting.
Optionally, the terminal device obtains the key sentences of the meeting based on the comprehensive analysis of the multi-modal data.
The key sentences include, but are not limited to, at least one of: a key sentence indicating an action, a technical sentence of a professional field, a hot sentence of topical news, and a finance-related key sentence.
Step S307c: extracting the potential news points of the meeting.
Optionally, the terminal device obtains scenes of interest in the meeting based on the comprehensive analysis of the multi-modal data, and extracts potential news points based on the scenes of interest.
Here, the scenes of interest may be scenes such as applause and laughter; the news points are the text information corresponding to the scenes of interest.
Step S308: generating a meeting outline.
Optionally, the terminal device generates the outline of the meeting based on the keywords, the key sentences, and the news points.
Here, the meeting outline is the hot information in the above embodiments.
Step S309: generating a picture-and-text news release.
Optionally, the terminal device generates a picture-and-text news release from the meeting outline using a preset format.
Here, the preset format can be used to provide a unified organizational form for each news point; for example, for a news release covering the speeches of multiple key persons, upper limits on the number of speakers, the technical points, and the word count can be set to provide a unified organizational form for the news release.
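A sketch of such a preset format, with invented upper limits on the number of speakers and the word count per news point:

```python
# Hypothetical preset format: cap the number of key persons and the
# word count per news point, then render a uniform release.
MAX_PERSONS = 3   # invented upper limit on speakers
MAX_WORDS = 12    # invented upper limit on words per news point

def render_release(title, speeches):
    """speeches: list of (person, summary) pairs, meeting order."""
    lines = [f"== {title} =="]
    for person, summary in speeches[:MAX_PERSONS]:
        words = summary.split()[:MAX_WORDS]
        lines.append(f"- {person}: {' '.join(words)}")
    return "\n".join(lines)

release = render_release("5G Forum", [
    ("Zhang Wei", "5G rollout will accelerate next year"),
    ("Li Na", "Edge computing complements the new networks"),
])
```

The same template idea applies to the video news release of Step S312, with clip durations in place of word counts.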
Step S310: performing segment cutting on the conference video.
Optionally, the terminal device performs segment cutting on the conference video based on the second audio data, the first video data, and the second video data, so as to obtain the video data corresponding to the key persons and/or key scenes.
Step S311: extracting a highlight video.
Optionally, the terminal device extracts the highlight video based on the video data corresponding to the key persons and/or key scenes and the meeting outline.
Here, the highlight video is the hot video in the above embodiments.
Step S312: generating a video news release.
Optionally, the terminal device generates a video news release from the highlight video using a preset format.
In an alternative embodiment, the terminal device uses the preset format to arrange the corresponding times at which each key person speaks.
In the embodiments of the present invention, at least some key scenes in the meeting, such as song, laughter, applause, and speech, can be determined based on audio classification; other key scenes in the meeting are further determined based on scene recognition of the video data. In this way, more comprehensive key news points of the meeting can be obtained, so that more accurate hot information can be obtained.
Moreover, in the embodiments of the present invention, the voiceprint features of the key persons can be obtained based on voiceprint recognition of the audio data, so as to obtain the audio data of the key persons; in this way, extraction of the hot news of the key persons can be achieved. Further, the video hotspot information of the key persons can also be separated out in the embodiments of the present invention.
Moreover, in the embodiments of the present invention, the picture-and-text news release and/or video news release can be generated based on the preset format, which can make the hot information more standardized and tidy, convenient for users to read or watch, and improve the user experience.
It should be noted that the following description of the processing apparatus of the meeting hot spot is similar to the above description of the processing method of the meeting hot spot, and the description of the beneficial effects of the method is likewise not repeated. For technical details not disclosed in the embodiments of the processing apparatus of the meeting hot spot of the present invention, please refer to the description of the embodiments of the processing method of the meeting hot spot of the present invention.
As shown in Fig. 4, an embodiment of the present invention also provides a processing apparatus of a meeting hot spot, and the apparatus includes:
a first obtaining module 41, configured to obtain audio data and/or video data of a meeting;
a first identification module 42, configured to identify the key scenes of the meeting based on the audio data and/or video data;
a second obtaining module 43, configured to obtain the first audio data within the first time period where the key scenes are located;
a second identification module 44, configured to obtain the first text information by identifying the first audio data; and
a generation module 45, configured to generate the hot information of the meeting based on the first text information.
In some embodiments, the generation module 45 is configured to generate the hot information of the meeting based on the keywords and/or key sentences in the first text information.
In some embodiments, the second obtaining module 43 is configured to obtain the first video data within the first time period where the key scenes are located; and
the generation module 45 is configured to generate the hot video of the meeting based on the first text information and the first video data.
In some embodiments, the generation module 45 is further configured to obtain the key message in the first text information, and extract the video data corresponding to the key message from the first video data to generate the hot video of the meeting.
In some embodiments, the first identification module 42 is configured to perform audio classification on the audio data to determine the first key scene of the meeting, where the first key scene includes at least one of: speech, applause, laughter, and song; and/or
configured to extract key frames from the video data and identify the second key scene of the meeting based on the key frames, where the second key scene includes at least one of: an interview, audience applause, and a key-person speech.
In some implementations, the first identification module 42 is further configured to: if it is determined that the duration of an alternative first key scene is greater than a first threshold, determine that the alternative first key scene is the first key scene; and/or
extract key frames from the video data; identify an alternative second key scene of the meeting based on the key frames; and if it is determined that the duration of the alternative second key scene is greater than the first threshold, determine that the alternative second key scene is the second key scene.
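The duration check for alternative key scenes can be sketched as a simple filter; the threshold value below is an invented example:

```python
# Illustrative sketch: confirm an alternative key scene only when
# its duration exceeds the first threshold, discarding brief ones.
FIRST_THRESHOLD = 5.0  # seconds; an example value, not from the patent

def confirm_key_scenes(candidates):
    """candidates: list of (label, start, end) alternative scenes.
    Returns the scenes whose duration exceeds the threshold."""
    return [(label, start, end) for label, start, end in candidates
            if end - start > FIRST_THRESHOLD]

scenes = confirm_key_scenes([("applause", 0.0, 3.0),
                             ("speech", 10.0, 40.0)])
# Only the 30-second speech scene survives the threshold.
```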
In some embodiments, the second identification module 44 is configured to obtain the voiceprint information of at least one key person by identifying the first audio data, and to obtain the first sub-text information of the key person based on the voiceprint information of the key person; and
the generation module 45 is configured to insert the identification information of the key person into the first sub-text information to generate the hot information based on the key person.
In some embodiments, the second obtaining module 43 is configured to extract the first sub-video data of one key person among the at least one key person from the first video data based on the voiceprint information of that key person; and
the generation module 45 is configured to generate the hot video of the key person based on the first sub-video data.
In some embodiments, the second obtaining module 43 is configured to extract, based on the voiceprint information of multiple key persons, the second sub-video data corresponding to the multiple key persons from the first video data; and to extract, based on the respective weight coefficients of the multiple key persons, the video clips of the corresponding periods from the corresponding second sub-video data; and
the generation module 45 is configured to assemble the video clips corresponding to the multiple key persons to generate the hot video of the multiple key persons.
In some embodiments, the generation module 45 is configured to extract the key message in the first text information, where the key message includes at least one of: a keyword indicating a key person, a keyword indicating an action, a finance-related keyword and/or key sentence, a technology-related keyword and/or key sentence, and a topical-news keyword and/or key sentence; and to generate the hot information of the meeting based on the key message.
As shown in Fig. 5, an embodiment of the present invention also discloses a terminal device. The terminal device includes a processor 51 and a memory 52 for storing a computer program that can run on the processor 51, where the processor 51 is configured to, when running the computer program, implement the processing method of the meeting hot spot applied to the terminal device.
In some embodiments, the memory in the embodiments of the present invention may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which serves as an external cache. By way of exemplary but non-restrictive description, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synclink dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DRRAM). The memories of the systems and methods described herein are intended to include, but are not limited to, these and any other suitable types of memory.
The processor may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The above processor may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in the embodiments of the present invention may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
In some embodiments, the embodiments described herein may be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof. For hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described herein, or combinations thereof.
For software implementation, the techniques described herein may be implemented by modules (such as processes and functions) that perform the functions described herein. Software code may be stored in a memory and executed by a processor. The memory may be implemented in the processor or outside the processor.
A further embodiment of the present invention provides a computer storage medium storing an executable program. When the executable program is executed by a processor, the steps of the processing method of the meeting hot spot applied to the server or terminal device can be implemented, for example, one or more of the methods shown in Figs. 1 to 3.
In some embodiments, the computer storage medium may include various media that can store program code, such as a USB flash disk, a mobile hard disk, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), and a magnetic disk or an optical disk.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method may be implemented in other ways.
It should be understood that the technical solutions recorded in the embodiments of the present invention can be combined arbitrarily in the absence of conflict.
The above descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can be easily conceived by those skilled in the art within the technical scope disclosed by the present invention shall be included within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A processing method of a meeting hot spot, wherein the method comprises:
obtaining audio data and/or video data of a meeting;
identifying key scenes of the meeting based on the audio data and/or video data;
obtaining first audio data within a first time period where the key scenes are located;
obtaining first text information by identifying the first audio data; and
generating hot information of the meeting based on the first text information.
2. The method according to claim 1, wherein the method further comprises:
obtaining first video data within the first time period where the key scenes are located; and
generating a hot video of the meeting based on the first text information and the first video data.
3. The method according to claim 1, wherein identifying the key scenes of the meeting based on the audio data and/or video data comprises:
performing audio classification on the audio data to determine a first key scene of the meeting, wherein the first key scene includes at least one of: speech, applause, laughter, and song;
and/or
extracting key frames from the video data, and identifying a second key scene of the meeting based on the key frames, wherein the second key scene includes at least one of: an interview, audience applause, and a key-person speech.
4. The method according to claim 1, wherein obtaining the first text information by identifying the first audio data comprises:
obtaining voiceprint information of at least one key person by identifying the first audio data, and obtaining first sub-text information of the key person based on the voiceprint information of the key person; and
generating the hot information of the meeting based on the first text information comprises:
inserting identification information of the key person into the first sub-text information to generate the hot information based on the key person.
5. The method according to claim 4, wherein the method further comprises:
extracting first sub-video data of one key person among the at least one key person from first video data based on the voiceprint information of that key person; and
generating a hot video of the key person based on the first sub-video data.
6. The method according to claim 4, wherein the method further comprises:
extracting second sub-video data corresponding to multiple key persons from first video data based on voiceprint information of the multiple key persons;
extracting video clips of corresponding periods from the corresponding second sub-video data based on respective weight coefficients of the multiple key persons; and
assembling the video clips corresponding to the multiple key persons to generate a hot video of the multiple key persons.
7. The method according to claim 1, wherein generating the hot information of the meeting based on the first text information comprises:
extracting a key message in the first text information, wherein the key message includes at least one of: a keyword indicating a key person, a keyword indicating an action, a finance-related keyword and/or key sentence, a technology-related keyword and/or key sentence, and a topical-news keyword and/or key sentence; and
generating the hot information of the meeting based on the key message.
8. A processing apparatus of a meeting hot spot, wherein the apparatus comprises:
a first obtaining module, configured to obtain audio data and/or video data of a meeting;
a first identification module, configured to identify key scenes of the meeting based on the audio data and/or video data;
a second obtaining module, configured to obtain first audio data within a first time period where the key scenes are located;
a second identification module, configured to obtain first text information by identifying the first audio data; and
a generation module, configured to generate hot information of the meeting based on the first text information.
9. A terminal device, wherein the terminal device comprises a processor and a memory for storing a computer program that can run on the processor, wherein the processor is configured to, when running the computer program, implement the processing method of the meeting hot spot according to any one of claims 1 to 7.
10. A storage medium having computer-executable instructions stored therein, wherein the computer-executable instructions are executed by a processor to implement the processing method of the meeting hot spot according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910549987.5A CN110211590B (en) | 2019-06-24 | 2019-06-24 | Conference hotspot processing method and device, terminal equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110211590A true CN110211590A (en) | 2019-09-06 |
CN110211590B CN110211590B (en) | 2021-12-03 |
Family
ID=67794249
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910549987.5A Active CN110211590B (en) | 2019-06-24 | 2019-06-24 | Conference hotspot processing method and device, terminal equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110211590B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111223487A (en) * | 2019-12-31 | 2020-06-02 | 联想(北京)有限公司 | Information processing method and electronic equipment |
CN111400511A (en) * | 2020-03-12 | 2020-07-10 | 北京奇艺世纪科技有限公司 | Multimedia resource interception method and device |
CN111597381A (en) * | 2020-04-16 | 2020-08-28 | 国家广播电视总局广播电视科学研究院 | Content generation method, device and medium |
CN111798870A (en) * | 2020-09-08 | 2020-10-20 | 共道网络科技有限公司 | Session link determining method, device and equipment and storage medium |
CN112231464A (en) * | 2020-11-17 | 2021-01-15 | 安徽鸿程光电有限公司 | Information processing method, device, equipment and storage medium |
CN116074137A (en) * | 2023-01-18 | 2023-05-05 | 京东方科技集团股份有限公司 | Recording method, recording device, electronic equipment and storage medium for meeting summary |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102388379A (en) * | 2009-10-27 | 2012-03-21 | 思科技术公司 | Automated and enhanced note taking for online collaborative computing sessions |
CN103064863A (en) * | 2011-10-24 | 2013-04-24 | 北京百度网讯科技有限公司 | Method and equipment of providing recommend information |
CN103137137A (en) * | 2013-02-27 | 2013-06-05 | 华南理工大学 | Eloquent speaker finding method in conference audio |
US20150067026A1 (en) * | 2013-08-30 | 2015-03-05 | Citrix Systems, Inc. | Acquiring online meeting data relating to an online meeting |
JP2015076673A (en) * | 2013-10-07 | 2015-04-20 | 富士ゼロックス株式会社 | Conference system, server device, client terminal and program |
CN105574182A (en) * | 2015-12-22 | 2016-05-11 | 北京搜狗科技发展有限公司 | News recommendation method and device as well as device for news recommendation |
CN106202427A (en) * | 2016-07-12 | 2016-12-07 | 腾讯科技(深圳)有限公司 | Application processing method and device |
US20160372154A1 (en) * | 2015-06-18 | 2016-12-22 | Orange | Substitution method and device for replacing a part of a video sequence |
CN106599137A (en) * | 2016-12-02 | 2017-04-26 | 北京薇途科技有限公司 | Novel scene content pushing system and device |
CN106612468A (en) * | 2015-10-21 | 2017-05-03 | 上海文广互动电视有限公司 | A video abstract automatic generation system and method |
CN106982344A (en) * | 2016-01-15 | 2017-07-25 | 阿里巴巴集团控股有限公司 | video information processing method and device |
CN107528899A (en) * | 2017-08-23 | 2017-12-29 | 广东欧珀移动通信有限公司 | Information recommendation method, device, mobile terminal and storage medium |
CN108305632A (en) * | 2018-02-02 | 2018-07-20 | 深圳市鹰硕技术有限公司 | A kind of the voice abstract forming method and system of meeting |
CN108346034A (en) * | 2018-02-02 | 2018-07-31 | 深圳市鹰硕技术有限公司 | A kind of meeting intelligent management and system |
CN109388701A (en) * | 2018-08-17 | 2019-02-26 | 深圳壹账通智能科技有限公司 | Minutes generation method, device, equipment and computer storage medium |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102388379A (en) * | 2009-10-27 | 2012-03-21 | Cisco Technology, Inc. | Automated and enhanced note taking for online collaborative computing sessions
CN103064863A (en) * | 2011-10-24 | 2013-04-24 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and device for providing recommendation information
CN103137137A (en) * | 2013-02-27 | 2013-06-05 | South China University of Technology | Eloquent speaker finding method in conference audio
US20150067026A1 (en) * | 2013-08-30 | 2015-03-05 | Citrix Systems, Inc. | Acquiring online meeting data relating to an online meeting
JP2015076673A (en) * | 2013-10-07 | 2015-04-20 | Fuji Xerox Co., Ltd. | Conference system, server device, client terminal and program
US20160372154A1 (en) * | 2015-06-18 | 2016-12-22 | Orange | Substitution method and device for replacing a part of a video sequence
CN106612468A (en) * | 2015-10-21 | 2017-05-03 | Shanghai Interactive TV Co., Ltd. | Automatic video summary generation system and method
CN105574182A (en) * | 2015-12-22 | 2016-05-11 | Beijing Sogou Technology Development Co., Ltd. | News recommendation method and device, and device for news recommendation
CN106982344A (en) * | 2016-01-15 | 2017-07-25 | Alibaba Group Holding Ltd. | Video information processing method and device
CN106202427A (en) * | 2016-07-12 | 2016-12-07 | Tencent Technology (Shenzhen) Co., Ltd. | Application processing method and device
CN106599137A (en) * | 2016-12-02 | 2017-04-26 | Beijing Weitu Technology Co., Ltd. | Novel scene content pushing system and device
CN107528899A (en) * | 2017-08-23 | 2017-12-29 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Information recommendation method, device, mobile terminal and storage medium
CN108305632A (en) * | 2018-02-02 | 2018-07-20 | Shenzhen Yingshuo Technology Co., Ltd. | Speech summary generation method and system for meetings
CN108346034A (en) * | 2018-02-02 | 2018-07-31 | Shenzhen Yingshuo Technology Co., Ltd. | Intelligent meeting management method and system
CN109388701A (en) * | 2018-08-17 | 2019-02-26 | Shenzhen OneConnect Smart Technology Co., Ltd. | Meeting minutes generation method, device, equipment and computer storage medium
Non-Patent Citations (2)
Title |
---|
BING ZHU: "Research and implementation of hot topic detection system based on web", ICCC *
JIN HAI: "Audio Event Detection Based on Deep Neural Networks", China Masters' Theses Full-text Database, Information Science and Technology Series *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111223487A (en) * | 2019-12-31 | 2020-06-02 | Lenovo (Beijing) Ltd. | Information processing method and electronic equipment
CN111400511A (en) * | 2020-03-12 | 2020-07-10 | Beijing QIYI Century Science & Technology Co., Ltd. | Multimedia resource interception method and device
CN111597381A (en) * | 2020-04-16 | 2020-08-28 | Academy of Broadcasting Science, National Radio and Television Administration | Content generation method, device and medium
CN111798870A (en) * | 2020-09-08 | 2020-10-20 | Gongdao Network Technology Co., Ltd. | Session link determining method, device and equipment and storage medium
CN112231464A (en) * | 2020-11-17 | 2021-01-15 | Anhui Hongcheng Optoelectronic Co., Ltd. | Information processing method, device, equipment and storage medium
CN112231464B (en) * | 2020-11-17 | 2023-12-22 | Anhui Hongcheng Optoelectronic Co., Ltd. | Information processing method, device, equipment and storage medium
CN116074137A (en) * | 2023-01-18 | 2023-05-05 | BOE Technology Group Co., Ltd. | Recording method, recording device, electronic equipment and storage medium for meeting summary
Also Published As
Publication number | Publication date |
---|---|
CN110211590B (en) | 2021-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110211590A (en) | Processing method, apparatus, terminal device and storage medium for meeting hot spots | |
CN110517689B (en) | Voice data processing method, device and storage medium | |
CN107562760B (en) | Voice data processing method and device | |
Biel et al. | VlogSense: Conversational behavior and social attention in YouTube | |
Hong et al. | Video accessibility enhancement for hearing-impaired users | |
CN106777257B (en) | Intelligent dialogue model construction system and method based on dialect | |
CN110121116A (en) | Video generation method and device | |
CN106303658A (en) | Interaction method and device applied to live video streaming | |
CN109361825A (en) | Meeting summary recording method, terminal and computer storage medium | |
CN105681920A (en) | Network teaching method and system with voice recognition function | |
CN104463423A (en) | Formative video resume collection method and system | |
CN111028007B (en) | User portrait information prompting method, device and system | |
CN107507620A (en) | Voice broadcast sound setting method and device, mobile terminal and storage medium | |
CN112562677A (en) | Conference voice transcription method, device, equipment and storage medium | |
CN206672635U (en) | Voice interaction device based on a book service robot | |
CN111353439A (en) | Method, device, system and equipment for analyzing teaching behaviors | |
WO2024188276A1 (en) | Text classification method and refrigeration device system | |
CN117609548A (en) | Video multi-mode target element extraction and video abstract synthesis method and system based on pre-training model | |
CN117216206A (en) | Session processing method and device, electronic equipment and storage medium | |
CN111522992A (en) | Method, device and equipment for putting questions into storage and storage medium | |
CN111160051A (en) | Data processing method and device, electronic equipment and storage medium | |
CN111221987A (en) | Hybrid audio tagging method and apparatus | |
CN110351183A (en) | Resource collecting method and device in instant messaging | |
CN114155841A (en) | Voice recognition method, device, equipment and storage medium | |
CN107464196A (en) | Student dropout prediction method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||