CN110347866A - Information processing method, device, storage medium and electronic equipment - Google Patents
- Publication number
- CN110347866A (application No. CN201910606313.4A)
- Authority
- CN
- China
- Prior art keywords
- keyword
- label
- data
- audio data
- target data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/71—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7834—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses an information processing method, an apparatus, a storage medium, and an electronic device. The method comprises: obtaining target data, wherein the target data includes image data and audio data; performing keyword detection on the audio data; obtaining at least one keyword from the audio data according to a preset condition, and forming a corresponding first label based on the keyword; and adding the first label to the image of the target data, so that a first operation can be performed on the target data through the first label. Based on speech recognition, the information processing method of the embodiments detects keywords in the speech of persons in the target data, obtains keywords that characterize the content of the target data, and adds the first label based on those keywords. Labels can thus be added automatically, which improves labeling efficiency and saves labor cost.
Description
Technical field
This application relates to the technical field of electronic devices, and in particular to an information processing method, an information processing apparatus, a storage medium, and an electronic device.
Background
With the development of multimedia technology, it has become increasingly common to record events such as conferences, lessons, and law-enforcement or monitoring processes on video. As a result, the volume of video data keeps growing, and it is inefficient for users to search for specific video content manually. In the prior art, video labels are added at specific video content so that, when playing the video, a user can click a label to jump quickly to that content. However, existing camera devices cannot add video labels directly; labels must be added manually after recording is complete, so labeling efficiency is low.
Summary
This application provides an information processing method, an information processing apparatus, a storage medium, and an electronic device, to solve the technical problem that manually adding video labels is inefficient in the prior art, and to enable video labels to be added automatically.
To solve the above technical problem, the embodiments of this application adopt the following technical solutions:
An information processing method, comprising:
obtaining target data, wherein the target data includes image data and audio data;
performing keyword detection on the audio data;
obtaining at least one keyword from the audio data according to a preset condition, and forming a corresponding first label based on the keyword; and
adding the first label to the image of the target data, so that a first operation can be performed on the target data through the first label.
In some embodiments, obtaining at least one keyword from the audio data according to a preset condition includes:
when a first keyword is detected in the audio data, judging whether the occurrence frequency of the first keyword meets a preset frequency threshold; and
if so, determining that the first keyword is an effective keyword and obtaining the first keyword.
In some embodiments, forming the corresponding first label based on the keyword includes:
obtaining the first-detected first keyword within a specific time period and a corresponding first time point, wherein the specific time period is the period during which the occurrence frequency of the first keyword meets the preset frequency threshold; and
forming the corresponding first label based on the first-detected first keyword and the first time point.
In some embodiments, obtaining at least one keyword from the audio data according to a preset condition and forming the corresponding first label based on the keyword includes:
when a second keyword is detected in the audio data, determining the start time point of the audio data segment containing the second keyword, wherein the audio data segment is continuous audio data captured from the same user; and
forming the corresponding first label based on the second keyword and the start time point.
In some embodiments, performing keyword detection on the audio data includes:
recognizing the text information corresponding to the audio data; and
detecting whether the text information includes a preset keyword from a preset keyword set.
In some embodiments, obtaining the target data includes:
obtaining target data generated during a video call between a first electronic device and a second electronic device; or
obtaining target data from a preset storage area.
In some embodiments, performing keyword detection on the audio data includes:
performing keyword detection on the audio data while the target data is being obtained; or
performing keyword detection on the audio data in response to a first instruction.
An information processing apparatus, comprising:
an acquisition module, configured to obtain target data, wherein the target data includes image data and audio data;
a detection module, configured to perform keyword detection on the audio data;
a formation module, configured to obtain at least one keyword from the audio data according to a preset condition and form a corresponding first label based on the keyword; and
an adding module, configured to add the first label to the image of the target data, so that a first operation can be performed on the target data through the first label.
A storage medium storing a computer program which, when loaded and executed, implements the following steps:
obtaining target data, wherein the target data includes image data and audio data;
performing keyword detection on the audio data;
obtaining at least one keyword from the audio data according to a preset condition, and forming a corresponding first label based on the keyword; and
adding the first label to the image of the target data, so that a first operation can be performed on the target data through the first label.
An electronic device comprising at least a memory and a processor, wherein an executable program is stored on the memory and the processor implements the following steps when executing the executable program:
obtaining target data, wherein the target data includes image data and audio data;
performing keyword detection on the audio data;
obtaining at least one keyword from the audio data according to a preset condition, and forming a corresponding first label based on the keyword; and
adding the first label to the image of the target data, so that a first operation can be performed on the target data through the first label.
The beneficial effects of the embodiments of this application are as follows:
The information processing method of the embodiments performs keyword detection on the audio data in the target data, obtains at least one keyword from the audio data according to a preset condition, forms a corresponding first label based on the obtained keyword, and finally adds the first label to the image of the target data. That is, based on speech recognition, keyword detection is performed on the speech of persons in the target data to obtain keywords that characterize the content of the target data, and the first label is added based on those keywords. Labels can thus be added automatically, which improves labeling efficiency and saves labor cost.
Brief description of the drawings
Fig. 1 is a flowchart of the information processing method of an embodiment of this application;
Fig. 2 is a schematic diagram of a scenario of a specific embodiment of the information processing method of an embodiment of this application;
Fig. 3 is a flowchart of the step of performing keyword detection on the audio data in the information processing method of an embodiment of this application;
Fig. 4 is a flowchart of one embodiment of the step of obtaining a keyword and forming a first label based on the keyword in the information processing method of an embodiment of this application;
Fig. 5 is a flowchart of another embodiment of the step of obtaining a keyword and forming a first label based on the keyword in the information processing method of an embodiment of this application;
Fig. 6 is a structural block diagram of the information processing apparatus of an embodiment of this application;
Fig. 7 is a structural block diagram of the detection module of the information processing apparatus of an embodiment of this application;
Fig. 8 is a structural block diagram of one embodiment of the formation module of the information processing apparatus of an embodiment of this application;
Fig. 9 is a structural block diagram of another embodiment of the formation module of the information processing apparatus of an embodiment of this application;
Fig. 10 is a structural block diagram of the electronic device of an embodiment of this application.
Description of reference numerals:
10 - acquisition module; 20 - detection module; 21 - recognition unit; 22 - detection unit; 30 - formation module; 31 - judgment unit; 32 - third acquisition unit; 33 - fourth acquisition unit; 34 - first formation unit; 35 - determination unit; 36 - second formation unit; 40 - adding module; 901 - memory; 902 - processor.
Detailed description
The various solutions and features of this application are described herein with reference to the accompanying drawings.
It should be understood that various modifications may be made to the embodiments described herein. The foregoing description should therefore not be regarded as limiting, but merely as examples of embodiments. Those skilled in the art will envisage other modifications within the scope and spirit of this application.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of this application and, together with the general description given above and the detailed description of the embodiments given below, serve to explain the principles of this application.
These and other characteristics of this application will become apparent from the following description of preferred forms of embodiment, given as non-limiting examples, with reference to the accompanying drawings.
It should also be understood that, although this application is described with reference to some specific examples, those skilled in the art can certainly realize many other equivalents of this application, which have the features set forth in the claims and therefore all fall within the scope of protection defined thereby.
The above and other aspects, features and advantages of this application will become more readily apparent in view of the following detailed description when read in conjunction with the accompanying drawings.
Specific embodiments of this application are described hereinafter with reference to the accompanying drawings; it should be understood, however, that the described embodiments are merely examples of this application, which may be implemented in various ways. Well-known and/or repeated functions and structures are not described in detail, to avoid obscuring this application with unnecessary or redundant detail. Therefore, the specific structural and functional details disclosed herein are not intended to be limiting, but merely serve as a basis for the claims and as a representative basis for teaching those skilled in the art to variously employ this application in virtually any appropriate detailed structure.
This specification may use the phrases "in one embodiment", "in another embodiment", "in a further embodiment" or "in other embodiments", each of which may refer to one or more of the same or different embodiments in accordance with this application.
An embodiment of this application provides an information processing method, which specifically includes the following steps:
obtaining target data, wherein the target data includes image data and audio data;
performing keyword detection on the audio data;
obtaining at least one keyword from the audio data according to a preset condition, and forming a corresponding first label based on the keyword; and
adding the first label to the image of the target data, so that a first operation can be performed on the target data through the first label.
The information processing method of this embodiment performs keyword detection on the audio data in the target data, obtains at least one keyword from the audio data according to a preset condition, forms a corresponding first label based on the obtained keyword, and finally adds the first label to the image of the target data. That is, based on speech recognition, keyword detection is performed on the speech of persons in the target data to obtain keywords that characterize the content of the target data, and the first label is added based on those keywords. Labels can thus be added automatically, which improves labeling efficiency and saves labor cost.
Preferred embodiments of this application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the information processing method of an embodiment of this application. Referring to Fig. 1, the information processing method of this embodiment includes the following steps:
S100: obtain target data, wherein the target data includes image data and audio data, such as video data. The target data may be data generated in real time, or target data stored in a preset storage area, such as target data stored in a database. In a specific implementation, obtaining the target data may be the process of transferring video data from a camera or computer used to capture it to a target storage location, or the process of loading the target data from the target storage location into main memory or a register.
S200: perform keyword detection on the audio data. The audio data is audio matched with the image data; it may be audio recorded while the image data is captured or produced, or audio recorded and matched to the image data afterwards. A keyword may be one chosen in advance, in which case keyword detection means detecting whether the audio data contains that pre-chosen keyword. A keyword may also be obtained according to a preset detection rule, in which case keyword detection means examining the audio data according to the preset detection rule and extracting the words that meet a preset criterion as keywords, for example taking frequently occurring words as keywords. The detected keywords may be one or more instances of the same keyword, or several different keywords.
S300: obtain at least one keyword from the audio data according to a preset condition, and form a corresponding first label based on the keyword. Many keywords may be obtained from a segment of audio data, but not all of them meet the conditions for adding a label. For example, a speaker may repeat sentences while conversing or giving an explanation or pitch, causing the same word to appear again and again; or a keyword that does not repeat may fail to match the video content in the video data. In a specific implementation, a preset condition for screening keywords can be set, and the keywords are screened according to it to obtain one or more keywords that satisfy the preset condition. A corresponding first label is then formed based on each such keyword, for example by using the keyword as the title of the first label, or as the label content.
S400: add the first label to the image of the target data, so that a first operation can be performed on the target data through the first label. After the first label is formed, it is added at the corresponding position in the image of the target data. The first operation may be a jump operation, a search operation, or another operation. For example, during playback of the target data, the first label may be displayed on the progress bar, in a label bar, or elsewhere, and the user can jump playback to the position corresponding to the first label by selecting it or by pressing direction keys. After labeling is complete, the target data can be uploaded to a database and the first label associated with search terms, so that a user searching by those terms can retrieve the first label and the corresponding target data, obtain the target data, and play it from the corresponding position.
The information processing method of the embodiments of this application performs keyword detection on the audio data based on speech recognition, so that keywords characterizing the content of the target data are obtained automatically and a first label is added to the target data based on those keywords, allowing the user to perform the first operation on the target data through the first label. There is no need for a person to watch the target data and define keywords manually: the first label is added automatically based on the obtained keywords, which improves labeling efficiency, saves labor cost, and is suitable for labeling massive volumes of video data, thereby improving user experience.
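The overall flow described above (speech recognition, keyword detection, label attached at a time point) can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs: the `Label` structure, function names, segment texts, and keyword strings are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Label:
    title: str          # keyword used as the label title
    time_point: float   # position in the video, in seconds

def detect_keywords(text, preset_keywords):
    """Return the preset keywords that appear in one recognized text segment."""
    return [kw for kw in preset_keywords if kw in text]

def add_labels(transcript_segments, preset_keywords):
    """transcript_segments: (start_time_s, text) pairs produced by speech
    recognition of the audio data; one label per keyword hit."""
    labels = []
    for start, text in transcript_segments:
        for kw in detect_keywords(text, preset_keywords):
            labels.append(Label(title=kw, time_point=start))
    return labels

segments = [(0.0, "welcome and course overview"),
            (330.0, "today we discuss CPU classification in depth")]
labels = add_labels(segments, ["CPU classification", "GPU"])
print([(l.title, l.time_point) for l in labels])  # [('CPU classification', 330.0)]
```

A real implementation would take the segments from a speech recognition service rather than hard-coded strings, but the label-forming step is essentially this loop.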
Target data can be obtained in several ways. Referring to Fig. 2, in one embodiment, obtaining the target data may include obtaining target data generated during a video call between a first electronic device and a second electronic device. The first and second electronic devices may be of many types, such as laptops, tablets, and smartphones with a video-call function. In this way, image data and audio data can be captured synchronously during the video call between the first and second electronic devices to form the target data. For example, in remote video teaching, the electronic device on the teacher's side can capture the image data and audio data of the lesson to form the target data; speech recognition is then used to detect keywords in the audio data, a first label is formed based on each keyword and its corresponding time, and the first label is added to the image of the target data. The target data can then be stored in a database for students to watch, which reduces the teacher's burden and simplifies the production of teaching videos. Likewise, in a teleconference, the conference system can capture the video data of the meeting, then perform keyword detection and add labels, so that conference highlight videos can be produced quickly and easily. In another embodiment, the obtained target data may be target data in a preset storage area. The preset storage area may be a specific storage section in the memory of an electronic device, or a specific storage section in, for example, a video service provider's database. The target data may be video data such as films, TV series, documentaries, or short videos. Taking a teaching-service platform as an example, after a teacher finishes recording video data, it can be placed in the preset storage area; the system can then automatically fetch the video data from the preset storage area, add labels, and upload it to the front-end database for users to watch.
In some embodiments, keyword detection on the audio data may be performed automatically as soon as the target data is obtained, or performed in response to a first instruction, i.e., keyword detection on the audio data starts only when the first instruction is received. Taking a teleconference system as an example: in one case, the video data of one or more conference terminals can be captured synchronously while the meeting is in progress, and keyword detection and labeling performed at the same time, so that the entire video data is labeled. In another case, a participant can send the first instruction to the conference system through a conference terminal only when the meeting turns to the relevant content, instructing the conference system to start keyword detection on the audio data in the meeting's video data and to add labels, so that the relevant content can be reviewed after the meeting.
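The two trigger modes just described (detect continuously versus detect only from the first instruction onward) can be sketched as follows. The function and parameter names are illustrative assumptions, not part of the patent:

```python
def detect_in_stream(audio_chunks, detect, mode="auto", instruction_index=None):
    """mode='auto': run detection on every chunk as it is captured.
    mode='on_instruction': run detection only from the chunk at which
    the first instruction arrives (instruction_index) onwards."""
    hits = []
    for i, chunk in enumerate(audio_chunks):
        if mode == "auto" or (instruction_index is not None and i >= instruction_index):
            hits.extend(detect(chunk))
    return hits

chunks = ["CPU classification overview", "budget review", "CPU classification details"]
find_cpu = lambda text: ["CPU classification"] if "CPU classification" in text else []
print(detect_in_stream(chunks, find_cpu, mode="auto"))
# ['CPU classification', 'CPU classification']
print(detect_in_stream(chunks, find_cpu, mode="on_instruction", instruction_index=2))
# ['CPU classification']
```

In the instruction-triggered mode, content before the instruction is simply not scanned, which matches the conference-terminal scenario above.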
Referring to Fig. 3, in some embodiments, performing keyword detection on the audio data includes:
S201: recognize the text information corresponding to the audio data;
S202: detect whether the text information includes a preset keyword from a preset keyword set.
In a specific implementation, speech recognition technology can be used to recognize the audio data and obtain the complete text information corresponding to it. This can be done with a speech recognition model, which can be formed by training an established model framework. The training process includes: preparing a training data set comprising an audio data set and a corresponding text data set; and training the model framework with the audio data set as input and the text data set as output, so as to form feature vectors corresponding to units such as words and phonemes, which are stored in a database. During recognition, the audio data is decomposed into audio segments corresponding to words or phonemes, and these segments are compared for similarity against the feature vectors in the database to obtain the corresponding text information.
The preset keyword set may be a set of preset keywords chosen in advance. Preset keywords can be chosen according to the type, industry, and content of the target data to be recognized. For example, to add keywords to a teaching video for a computer science or mechanical engineering course, preset keywords can be chosen from computer terminology or mechanical engineering vocabulary.
In a preferred embodiment, the preset keyword set may include at least one preset keyword sequence, which may include multiple preset keywords arranged in order and chosen from a preset text. The preset text may be text corresponding to the content of the target data, such as meeting minutes, conference slide (PPT) text, textbooks, a syllabus, teaching PPT text, or a script. After the preset text is obtained, its chapter titles, paragraph headings, key points and similar content can be chosen as preset keywords, arranged into a preset keyword sequence in content order, and stored in the preset keyword set. The recognized text information is then matched against this preset keyword set. Keywords chosen this way accurately characterize the main content and ordering of the target data, making the added labels more accurate. For example, when the video data of a teacher's lesson is obtained, the teaching PPT text is also obtained and a preset keyword set is formed from it; keyword detection is then performed on the audio data in the video data based on that preset keyword set, a first label is generated from each obtained keyword, and the first label is added to the image of the target data. The added first labels then correspond to the main content of both the PPT text and the video data.
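Building an ordered keyword sequence from a preset text can be sketched as below. For simplicity the sketch treats lines beginning with "#" as heading-like content; a real slide deck or syllabus would need format-specific parsing, and the marker convention is an assumption of this example:

```python
def build_keyword_sequence(preset_text_lines):
    """Collect heading-like lines (here: lines starting with '#') as preset
    keywords, preserving document order and skipping duplicates."""
    seq = []
    for line in preset_text_lines:
        if line.startswith("#"):
            kw = line.lstrip("#").strip()
            if kw and kw not in seq:
                seq.append(kw)
    return seq

ppt = ["# CPU overview", "slide body text ...", "# CPU classification", "# Cache hierarchy"]
print(build_keyword_sequence(ppt))
# ['CPU overview', 'CPU classification', 'Cache hierarchy']
```

Because order is preserved, the resulting sequence also reflects the expected ordering of topics in the video, as the embodiment requires.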
Referring to Fig. 4, in some embodiments, obtaining at least one keyword from the audio data according to a preset condition includes:
S301: when a first keyword is detected in the audio data, judge whether the occurrence frequency of the first keyword meets a preset frequency threshold;
S302: if the occurrence frequency of the first keyword meets the preset frequency threshold, determine that the first keyword is an effective keyword and obtain the first keyword.
In a specific implementation, when part of the target data concerns a particular plot point, scene, or teaching chapter, the related keyword will usually appear repeatedly and at high frequency in the audio data during the specific time period corresponding to that content, so such a keyword characterizes the content fairly accurately. Determining a first keyword whose occurrence frequency meets the preset frequency threshold as an effective keyword improves the accuracy of the obtained keywords, and in turn the accuracy of the first labels generated and added. Take a computer teaching video as an example: when the central processing unit (CPU) is being discussed, the graphics processing unit (GPU) may be mentioned because of data transfer or cooperation between the CPU and the GPU, while that part of the video is actually mainly about the CPU. After the two keywords CPU and GPU are obtained, their occurrence frequencies are checked against the preset frequency threshold; if the frequency of CPU meets the threshold while that of GPU does not, CPU is determined to be an effective keyword and is obtained.
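The frequency-threshold screen in S301/S302 amounts to counting hits within a window. A minimal sketch, with the threshold expressed as a minimum hit count (the patent does not fix how the frequency is measured, so this is an assumption):

```python
from collections import Counter

def effective_keywords(detected, min_count):
    """Keep only the keywords whose occurrence count within the examined
    window meets the preset frequency threshold (min_count)."""
    counts = Counter(detected)
    return {kw for kw, n in counts.items() if n >= min_count}

hits = ["CPU", "CPU", "GPU", "CPU", "CPU"]  # detections in one time window
print(effective_keywords(hits, min_count=3))  # {'CPU'}
```

With this screen, GPU (one hit) is discarded as an incidental mention while CPU (four hits) becomes an effective keyword, matching the CPU/GPU example above.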
In some embodiments, forming the corresponding first label based on the keyword includes:
S303: obtain the first-detected first keyword within a specific time period and the corresponding first time point, wherein the specific time period is the period during which the occurrence frequency of the first keyword meets the preset frequency threshold. The first time point may be the time at which the first keyword first appears within the specific time period. For example, suppose that in a computer teaching video, during the period from 5 min 30 s to 20 min 30 s, the occurrence frequency of "CPU classification" meets the preset frequency threshold, and "CPU classification" first appears at 5 min 30 s. Then "CPU classification" is determined to be an effective keyword, and while the first keyword "CPU classification" is obtained, the first time point "5 min 30 s" is obtained as well.
S304: form the corresponding first label based on the first-detected first keyword and the first time point. After the first keyword and the first time point are obtained, the first keyword can be used as the theme or main content of the first label, and the first time point as the label's position on the progress timeline of the target data. Continuing the example in step S303, a label titled "CPU classification" can be formed and added at 5 min 30 s in the video data.
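Steps S303/S304 can be sketched as: within the qualifying window, if the keyword meets the threshold, anchor the label at its first occurrence. Times are in seconds (5 min 30 s = 330 s, 20 min 30 s = 1230 s); the function shape is an illustrative assumption:

```python
def first_label(occurrences, keyword, window, min_count):
    """occurrences: list of (time_s, word) detections. Within window
    (start_s, end_s), if `keyword` occurs at least min_count times,
    return (keyword, time of first occurrence); otherwise None."""
    start, end = window
    times = [t for t, w in occurrences if w == keyword and start <= t <= end]
    if len(times) >= min_count:
        return keyword, min(times)
    return None

occ = [(330, "CPU classification"), (400, "CPU classification"), (900, "CPU classification")]
print(first_label(occ, "CPU classification", (330, 1230), 3))
# ('CPU classification', 330)
```

The returned pair is exactly what S304 needs: the keyword as the label title and 330 s (5 min 30 s) as the label's position on the timeline.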
Referring to Fig. 5, in some embodiments, obtaining at least one keyword in the audio data according to the preset condition and forming the corresponding first label based on the keyword comprises:
S305: in the case where the audio data is detected to contain a second keyword, determining the start time point at which the audio data segment containing the second keyword is played, wherein the audio data segment is continuous audio data collected from the same user;
S306: forming the corresponding first label based on the second keyword and the start time point.
By setting the time point of the first label to the start time point of the same user's speech, the speech content relevant to the second keyword is not missed, and the label can be placed more accurately. Taking a teleconferencing system comprising multiple conference terminals as an example, after the video data of a meeting is obtained, the audio of different speakers can be distinguished by voice characteristics and the audio data divided into multiple audio data segments. If the keyword "CPU classification" is detected in one of the segments, the start time point of the segment containing that keyword, for example 30 minutes 15 seconds, can be further determined; a first label entitled "CPU classification" is then generated and added to the video data at 30 minutes 15 seconds.
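The segment-based anchoring above can be sketched as follows; the segment structure and helper name are hypothetical, assuming diarized segments with known start times:

```python
# Each segment holds one user's continuous speech and its start time
# in the recording (speakers, times, and text are illustrative).
segments = [
    {"speaker": "A", "start_s": 30 * 60 + 15,
     "text": "Let me explain CPU classification in detail"},
    {"speaker": "B", "start_s": 31 * 60 + 2,
     "text": "Thanks, moving on to scheduling"},
]

def label_for_keyword(segments, keyword):
    """Anchor the label at the START of the segment containing the
    keyword, so none of that speaker's related remarks are missed."""
    for seg in segments:
        if keyword in seg["text"]:
            return {"title": keyword, "time_s": seg["start_s"]}
    return None

label = label_for_keyword(segments, "CPU classification")
```

The label lands at 30 min 15 s, the start of speaker A's turn, rather than at the instant the keyword itself was spoken.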
Fig. 6 is a structural block diagram of the information processing apparatus of an embodiment of the present application. As shown in Fig. 6, the apparatus comprises an obtaining module 10, a detection module 20, a forming module 30 and an adding module 40, wherein:
The obtaining module 10 is configured to obtain target data, the target data comprising image data and audio data, such as video-type data. The target data may be data generated in real time, or data stored in a preset storage area, e.g. target data stored in a database. In a specific implementation, obtaining the target data may be the process of transferring video-type data from a camera or computer that captures it to a target storage location, or the process of extracting the target data from the target storage location into internal memory or a register.
The detection module 20 is configured to perform keyword detection on the audio data. The audio data is the audio matching the image data: it may be audio recorded while the image data is captured or produced, or audio recorded after the image data is obtained and then matched to it. The keyword may be chosen in advance, in which case keyword detection on the audio data means detecting whether the audio data contains the pre-chosen keyword. Alternatively, the keyword may be obtained according to a preset detection rule, in which case keyword detection means detecting, according to the rule, words in the audio data that meet a preset criterion and treating those words as keywords, for example taking frequently occurring words as keywords. The detected keywords may be one or more identical keywords, or multiple different keywords.
The forming module 30 is configured to obtain at least one keyword in the audio data according to a preset condition and form the corresponding first label based on the keyword. Many keywords may be obtained from a segment of audio data, but not all of them meet the condition for adding a label. For example, a speaker may repeat sentences in conversation or promotion, causing the same word to recur continuously; or a keyword, though not repeated, may not match the video content of the obtained video data. In a specific implementation, a preset condition for screening keywords may be set, and the keywords screened against it to obtain the one or more keywords that satisfy the preset condition. The corresponding first label is then formed based on the keyword, for example using the keyword as the title of the first label, or as the label content of the label.
The adding module 40 is configured to add the first label to the image of the target data, so that a first operation can be performed on the target data through the first label. After the first label is formed, it is added at the corresponding position in the image of the target data. The first operation may be a jump operation, a search operation or another operation. For example, during playback of the target data, the first label may be displayed on the progress bar or in a tab bar, and the user may adjust the playback progress to the position corresponding to the first label by selecting the label or pressing direction keys. After the label has been added, the target data may be uploaded to a database, and the first label associated with search terms, so that when the user searches with those terms the first label and the corresponding target data are retrieved, making it convenient for the user to obtain the target data and play it from the labeled position.
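A minimal sketch of the jump operation: a label stores a playback position, and selecting it seeks the player there. The `Player` class and label layout below are illustrative, not taken from the patent:

```python
labels = [{"title": "CPU classification", "time_s": 330}]

class Player:
    """Toy stand-in for a video player with a seekable position."""
    def __init__(self):
        self.position_s = 0.0
    def seek(self, t):
        self.position_s = t

def jump_to_label(player, labels, title):
    # First operation: jump playback to the selected label's position.
    for lab in labels:
        if lab["title"] == title:
            player.seek(lab["time_s"])
            return True
    return False

p = Player()
jump_to_label(p, labels, "CPU classification")
```

A search operation would instead index the same `labels` list by title and return the matching target data.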
The information processing apparatus of the embodiments of the present application performs keyword detection on audio data based on speech recognition. It can automatically obtain keywords that characterize the content of the target data, and add first labels to the target data based on the obtained keywords, enabling the user to perform the first operation on the target data through the first label. There is no need for a person to watch the target data and define keywords manually: the first label is added automatically based on the obtained keywords, which improves labeling efficiency, saves labor cost, and is suitable for adding labels to massive amounts of video data, thereby improving user experience.
The obtaining module 10 may take many specific forms. In one embodiment, the obtaining module 10 comprises a first obtaining unit configured to obtain target data generated during a video call between a first electronic device and a second electronic device. The first and second electronic devices may be of many types, such as laptops, tablets, smartphones and other electronic devices with a video call function. In this way, during the video call between the first and second electronic devices, image data and audio data can be collected synchronously to form the target data. For example, in remote video teaching, the image data and audio data of the teaching process can be collected by the teacher's electronic device to form the target data; after labels are added to the target data, it can be shared with students or posted online for viewing, reducing the teacher's workload and simplifying the production of teaching videos. Similarly, in a teleconference, the video data of the meeting can be collected by the conference system, keyword detection performed on the video data and labels added, so that a meeting video can be produced quickly and conveniently. In another embodiment, the obtaining module 10 may comprise a second obtaining unit configured to obtain the target data in a preset storage area. The preset storage area may be a specific storage section in the memory of an electronic device, or a specific section of a database, for example that of a video service provider. The target data may be video data such as films, TV series, documentaries or short videos. Taking a teaching service platform as an example, a teacher may place a recorded video in the preset storage area; the system then automatically obtains the video data from the preset storage area, adds labels, and uploads it to the front-end database for users to watch.
In some embodiments, the detection module 20 may perform keyword detection on the audio data while the target data is being obtained; alternatively, it may perform keyword detection on the audio data according to a first instruction, i.e. start keyword detection on the audio data when the first instruction is received. Taking a teleconferencing system as an example, in one case the video data of one or more conference terminals is collected synchronously during the meeting, and keyword detection and labeling are performed automatically at the same time, so that labels are added to the entire video data. In another case, a participant sends the first instruction to the conference system through a conference terminal only when the meeting turns to content relevant to them, instructing the conference system to start keyword detection on the audio data in the meeting video and add labels, so that the relevant content can be reviewed after the meeting.
Referring to Fig. 7, in some embodiments the detection module 20 comprises a recognition unit 21 and a detection unit 22, wherein the recognition unit 21 is configured to recognize the text information corresponding to the audio data, and the detection unit 22 is configured to detect whether the text information contains a preset keyword from a preset keyword set.
In a specific implementation, the recognition unit 21 may use speech recognition technology to recognize the audio data and obtain the complete text information corresponding to it. The recognition unit 21 may accomplish this with a speech recognition model, which can be formed by training an established model framework. The training process includes: preparing a training data set comprising an audio data set and a corresponding text data set; and training the model framework with the audio data set as input data and the text data set as output data, thereby forming feature vectors corresponding to units such as words and phonemes, which are stored in a database. In the speech recognition process, the audio data is decomposed into several audio segments corresponding to words or phonemes, and these segments are compared for similarity with the feature vectors in the database, thereby obtaining the corresponding text information.
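The similarity comparison between audio-segment feature vectors and stored word vectors might be sketched as below. The two-dimensional vectors and the toy "database" are made-up stand-ins for a trained model's features; a real system would use high-dimensional acoustic features:

```python
import math

# Toy feature "database": one reference vector per word.
db = {"cpu": [0.9, 0.1], "gpu": [0.1, 0.9]}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recognize(feature_vectors):
    """Map each audio segment's feature vector to the most similar
    word in the database, yielding the transcript word list."""
    return [max(db, key=lambda w: cosine(db[w], v)) for v in feature_vectors]

text = recognize([[0.85, 0.2], [0.15, 0.8]])
```

Here the first segment is closest to "cpu" and the second to "gpu", so the recovered text is `["cpu", "gpu"]`.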
The preset keyword set may be a word set containing pre-chosen preset keywords. Preset keywords may be chosen according to the type, field and content of the target data to be recognized. For example, to add keywords to a teaching video for a computer science or mechanical engineering major, preset keywords may be chosen from computer terminology or mechanical engineering vocabulary.
In a preferred embodiment, the preset keyword set may include at least one preset keyword sequence, which may comprise multiple preset keywords chosen from a preset text and arranged in order. The preset text may be a text corresponding to the content of the target data, such as meeting minutes, meeting slides (PPT), textbooks, syllabi, teaching slides or scripts. After the preset text is obtained, content such as chapter titles, paragraph headings and key points may be chosen as preset keywords, arranged into a preset keyword sequence according to the order of the content, and stored in the preset keyword set. The text information obtained from the audio is then recognized against this preset keyword set. Keywords chosen in this way accurately characterize the main content and ordering of the target data, making the added labels more accurate. For example, when the video data of a teacher's lecture is obtained, the teaching PPT text is also obtained; the preset keyword set is formed from the PPT text, keyword detection is performed on the audio data in the video data based on that set, and the first label is generated from the obtained keyword and added to the image of the target data. The added first label then corresponds to the main content of both the PPT text and the video data.
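Building a preset keyword sequence from an outline-like preset text and matching it against a transcript could look roughly like this; the outline, transcript and heading-parsing rule are illustrative assumptions, not from the patent:

```python
# Headings of a hypothetical teaching PPT, in document order.
outline = [
    "1. CPU classification",
    "2. Instruction pipelines",
    "3. Cache hierarchy",
]

# The preset keyword sequence mirrors the teaching material's order.
preset_sequence = [line.split(". ", 1)[1] for line in outline]

transcript = "today we start with cpu classification before pipelines"

# Keep only preset keywords actually spoken, preserving outline order.
detected = [kw for kw in preset_sequence if kw.lower() in transcript.lower()]
```

Because the sequence preserves the source order, labels generated from `detected` follow the structure of the teaching material.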
Referring to Fig. 8, in some embodiments the forming module 30 comprises a judging unit 31 and a third obtaining unit 32. The judging unit 31 is configured to judge, in the case where the audio data is detected to contain a first keyword, whether the occurrence frequency of the first keyword meets a preset frequency threshold; the third obtaining unit 32 is configured to determine, in the case where the occurrence frequency of the first keyword meets the preset frequency threshold, that the first keyword is an effective keyword, and to obtain the first keyword.
In a specific implementation, when part of the target data concerns a certain plot, scene or teaching chapter, the audio data will usually repeat the relevant keyword at high frequency within the time period corresponding to that part, and such a keyword characterizes that part relatively accurately. By determining first keywords whose occurrence frequency meets the preset frequency threshold as effective keywords, the accuracy of the obtained keywords is improved, and in turn the accuracy of the first labels generated and added. Taking a computer teaching video as an example, a lecture on the central processing unit (CPU) may mention the GPU because data transfer or cooperation between the CPU and the graphics processing unit (GPU) comes up, while that part of the video is actually mainly about the CPU. After the two keywords CPU and GPU are obtained, it is judged whether their occurrence frequencies meet the preset frequency threshold. If the occurrence frequency of CPU meets the threshold and that of GPU does not, CPU is determined to be the effective keyword, and the keyword CPU is obtained.
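The frequency screening can be sketched as a simple count-and-threshold pass over transcript tokens; the token list and the threshold value below are illustrative:

```python
from collections import Counter

# Toy transcript tokens: "cpu" spoken 12 times, "gpu" 3 times.
words = ("cpu " * 12 + "gpu " * 3).split()
threshold = 10  # illustrative preset frequency threshold

counts = Counter(words)
# Only keywords whose occurrence frequency meets the threshold
# are kept as effective keywords.
effective = [w for w, n in counts.items() if n >= threshold]
```

Here "cpu" (12 occurrences) passes the threshold and "gpu" (3 occurrences) is filtered out, matching the CPU/GPU example above.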
In some embodiments, the forming module 30 further comprises a fourth obtaining unit 33 and a first forming unit 34. The fourth obtaining unit 33 is configured to obtain the first occurrence of the first keyword detected within a specific time period and the corresponding first time point, wherein the specific time period is the period during which the occurrence frequency of the first keyword meets the preset frequency threshold; the first forming unit 34 is configured to form the corresponding first label based on the first occurrence of the first keyword and the first time point. The first time point may be the time at which the first keyword first occurs within the specific time period. After the first keyword and the first time point are obtained, the first keyword may serve as the subject or main content of the first label, and the first time point as the progress time point of the first label in the target data. For example, if in a computer teaching video the occurrence frequency of "CPU classification" meets the preset frequency threshold within the period from 5 minutes 30 seconds to 20 minutes 30 seconds, and "CPU classification" first occurs at 5 minutes 30 seconds, then "CPU classification" is determined to be an effective keyword, and while the first keyword "CPU classification" is obtained, the first time point "5 minutes 30 seconds" is obtained as well. A label entitled "CPU classification" can then be formed and added to the video data at 5 minutes 30 seconds.
Referring to Fig. 9, in some embodiments the forming module 30 comprises a determination unit 35 and a second forming unit 36. The determination unit 35 is configured to determine, in the case where the audio data is detected to contain a second keyword, the start time point at which the audio data segment containing the second keyword is played, wherein the audio data segment is continuous audio data collected from the same user; the second forming unit 36 is configured to form the corresponding first label based on the second keyword and the start time point. By setting the time point of the first label to the start time point of the same user's speech, the speech content relevant to the second keyword is not missed, and the label can be placed more accurately. Taking a teleconferencing system comprising multiple conference terminals as an example, after the video data of a meeting is obtained, the audio of different speakers can be distinguished by voice characteristics and the audio data divided into multiple audio data segments. If the keyword "CPU classification" is detected in one of the segments, the start time point of the segment containing that keyword, for example 30 minutes 15 seconds, can be further determined; a first label entitled "CPU classification" is then generated and added to the video data at 30 minutes 15 seconds.
Embodiments of the present application also provide a storage medium storing a computer program which, when executed, implements the method provided by any embodiment of the application. Illustratively, the method comprises the following steps:
S100: obtaining target data, wherein the target data comprises image data and audio data;
S200: performing keyword detection on the audio data;
S300: obtaining at least one keyword in the audio data according to a preset condition, and forming a corresponding first label based on the keyword;
S400: adding the first label to the image of the target data, so that a first operation can be performed on the target data through the first label.
When the computer program's step of obtaining at least one keyword in the audio data according to the preset condition is executed, the processor specifically executes the following steps: in the case where the audio data is detected to contain a first keyword, judging whether the occurrence frequency of the first keyword meets a preset frequency threshold; if so, determining that the first keyword is an effective keyword, and obtaining the first keyword.
When the computer program's step of forming the corresponding first label based on the keyword is executed, the processor specifically executes the following steps: obtaining the first occurrence of the first keyword detected within a specific time period and the corresponding first time point, wherein the specific time period is the period during which the occurrence frequency of the first keyword meets the preset frequency threshold; and forming the corresponding first label based on the first occurrence of the first keyword and the first time point.
When the computer program's step of obtaining at least one keyword in the audio data according to the preset condition and forming the corresponding first label based on the keyword is executed, the processor specifically executes the following steps: in the case where the audio data is detected to contain a second keyword, determining the start time point at which the audio data segment containing the second keyword is played, wherein the audio data segment is continuous audio data collected from the same user; and forming the corresponding first label based on the second keyword and the start time point.
When the computer program's step of performing keyword detection on the audio data is executed, the processor specifically executes the following steps: recognizing the text information corresponding to the audio data; and detecting whether the text information contains a preset keyword from a preset keyword set.
When the computer program's step of obtaining the target data is executed, the processor specifically executes the following steps: obtaining target data generated during a video call between a first electronic device and a second electronic device; or obtaining target data in a preset storage area.
When the computer program's step of performing keyword detection on the audio data is executed, the processor specifically executes the following steps: performing keyword detection on the audio data while obtaining the target data; or performing keyword detection on the audio data according to a first instruction.
The above storage medium may be provided in an electronic device comprising at least a memory and a processor, existing in the form of the memory; its specific implementation is not described again here.
Embodiments of the present application also provide an electronic device. Referring to Fig. 10, the electronic device comprises at least a memory 901 and a processor 902. An executable program is stored on the memory 901, and the processor 902 implements the method provided by any embodiment of the application when executing the executable program on the memory 901. Illustratively, the steps of the executable program are as follows:
S100: obtaining target data, wherein the target data comprises image data and audio data;
S200: performing keyword detection on the audio data;
S300: obtaining at least one keyword in the audio data according to a preset condition, and forming a corresponding first label based on the keyword;
S400: adding the first label to the image of the target data, so that a first operation can be performed on the target data through the first label.
When executing the executable program on the memory 901 for obtaining at least one keyword in the audio data according to the preset condition, the processor 902 specifically implements the following steps: in the case where the audio data is detected to contain a first keyword, judging whether the occurrence frequency of the first keyword meets a preset frequency threshold; if so, determining that the first keyword is an effective keyword, and obtaining the first keyword.
When executing the executable program on the memory 901 for forming the corresponding first label based on the keyword, the processor 902 specifically implements the following steps: obtaining the first occurrence of the first keyword detected within a specific time period and the corresponding first time point, wherein the specific time period is the period during which the occurrence frequency of the first keyword meets the preset frequency threshold; and forming the corresponding first label based on the first occurrence of the first keyword and the first time point.
When executing the executable program on the memory 901 for obtaining at least one keyword in the audio data according to the preset condition and forming the corresponding first label based on the keyword, the processor 902 specifically implements the following steps: in the case where the audio data is detected to contain a second keyword, determining the start time point at which the audio data segment containing the second keyword is played, wherein the audio data segment is continuous audio data collected from the same user; and forming the corresponding first label based on the second keyword and the start time point.
When executing the executable program on the memory 901 for performing keyword detection on the audio data, the processor 902 specifically implements the following steps: recognizing the text information corresponding to the audio data; and detecting whether the text information contains a preset keyword from a preset keyword set.
When executing the executable program on the memory 901 for obtaining the target data, the processor 902 specifically implements the following steps: obtaining target data generated during a video call between a first electronic device and a second electronic device; or obtaining target data in a preset storage area.
When executing the executable program on the memory 901 for performing keyword detection on the audio data, the processor 902 specifically implements the following steps: performing keyword detection on the audio data while obtaining the target data; or performing keyword detection on the audio data according to a first instruction.
The above embodiments are merely exemplary embodiments of the present application and are not intended to limit it; the protection scope of the application is defined by the claims. Those skilled in the art may make various modifications or equivalent substitutions to the application within its essence and protection scope, and such modifications or equivalent substitutions shall also be regarded as falling within the protection scope of the application.
Claims (10)
1. An information processing method, comprising:
obtaining target data, wherein the target data comprises image data and audio data;
performing keyword detection on the audio data;
obtaining at least one keyword in the audio data according to a preset condition, and forming a corresponding first label based on the keyword;
adding the first label to the image of the target data, so that a first operation can be performed on the target data through the first label.
2. The information processing method according to claim 1, wherein obtaining at least one keyword in the audio data according to the preset condition comprises:
in the case where the audio data is detected to contain a first keyword, judging whether the occurrence frequency of the first keyword meets a preset frequency threshold;
if so, determining that the first keyword is an effective keyword, and obtaining the first keyword.
3. The information processing method according to claim 2, wherein forming the corresponding first label based on the keyword comprises:
obtaining the first occurrence of the first keyword detected within a specific time period and the corresponding first time point, wherein the specific time period is the period during which the occurrence frequency of the first keyword meets the preset frequency threshold;
forming the corresponding first label based on the first occurrence of the first keyword and the first time point.
4. The information processing method according to claim 1, wherein obtaining at least one keyword in the audio data according to the preset condition and forming the corresponding first label based on the keyword comprises:
in the case where the audio data is detected to contain a second keyword, determining the start time point at which the audio data segment containing the second keyword is played, wherein the audio data segment is continuous audio data collected from the same user;
forming the corresponding first label based on the second keyword and the start time point.
5. The information processing method according to claim 1, wherein performing keyword detection on the audio data comprises:
recognizing text information corresponding to the audio data;
detecting whether the text information contains a preset keyword from a preset keyword set.
6. The information processing method according to claim 1, wherein obtaining the target data comprises:
obtaining target data generated during a video call between a first electronic device and a second electronic device; or
obtaining target data in a preset storage area.
7. The information processing method according to claim 1, wherein performing keyword detection on the audio data comprises:
performing keyword detection on the audio data while obtaining the target data; or
performing keyword detection on the audio data according to a first instruction.
8. An information processing apparatus, comprising:
an obtaining module, configured to obtain target data, wherein the target data comprises image data and audio data;
a detection module, configured to perform keyword detection on the audio data;
a forming module, configured to obtain at least one keyword in the audio data according to a preset condition, and form a corresponding first label based on the keyword;
an adding module, configured to add the first label to the image of the target data, so that a first operation can be performed on the target data through the first label.
9. A storage medium storing a computer program which, when loaded and executed, implements the following steps:
obtaining target data, wherein the target data comprises image data and audio data;
performing keyword detection on the audio data;
obtaining at least one keyword in the audio data according to a preset condition, and forming a corresponding first label based on the keyword;
adding the first label to the image of the target data, so that a first operation can be performed on the target data through the first label.
10. An electronic device comprising at least a memory and a processor, wherein an executable program is stored on the memory, and the processor implements the following steps when executing the executable program on the memory:
obtaining target data, wherein the target data comprises image data and audio data;
performing keyword detection on the audio data;
obtaining at least one keyword in the audio data according to a preset condition, and forming a corresponding first label based on the keyword;
adding the first label to the image of the target data, so that a first operation can be performed on the target data through the first label.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910606313.4A CN110347866B (en) | 2019-07-05 | 2019-07-05 | Information processing method, information processing device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110347866A true CN110347866A (en) | 2019-10-18 |
CN110347866B CN110347866B (en) | 2023-06-23 |
Family
ID=68177929
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910606313.4A Active CN110347866B (en) | 2019-07-05 | 2019-07-05 | Information processing method, information processing device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110347866B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111212317A (en) * | 2020-01-15 | 2020-05-29 | 清华大学 | Skip navigation method for video playing |
CN111640453A (en) * | 2020-05-13 | 2020-09-08 | 广州国音智能科技有限公司 | Speech spectrum matching method, device and equipment and computer readable storage medium |
CN111723816A (en) * | 2020-06-28 | 2020-09-29 | 北京联想软件有限公司 | Teaching note acquisition method and electronic equipment |
CN112115301A (en) * | 2020-08-31 | 2020-12-22 | 湖北美和易思教育科技有限公司 | Video annotation method and system based on classroom notes |
CN114449333A (en) * | 2020-10-30 | 2022-05-06 | 华为终端有限公司 | Video note generation method and electronic equipment |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005352782A (en) * | 2004-06-10 | 2005-12-22 | Canon Inc | Device and method for image retrieval |
CN1977264A (en) * | 2004-06-28 | 2007-06-06 | 松下电器产业株式会社 | Video/audio stream processing device and video/audio stream processing method |
US20070288237A1 (en) * | 2006-06-07 | 2007-12-13 | Chung-Hsien Wu | Method And Apparatus For Multimedia Data Management |
US20100179972A1 (en) * | 2009-01-09 | 2010-07-15 | Yasuharu Asano | Data processing apparatus, data processing method, and program |
CN104391960A (en) * | 2014-11-28 | 2015-03-04 | 北京奇艺世纪科技有限公司 | Video annotation method and system |
CN105824930A (en) * | 2016-03-17 | 2016-08-03 | 深圳市金立通信设备有限公司 | Voice message processing method and terminal |
US20170221523A1 (en) * | 2012-04-24 | 2017-08-03 | Liveclips Llc | Annotating media content for automatic content understanding |
CN107679227A (en) * | 2017-10-23 | 2018-02-09 | 柴建华 | Video index label setting method, device and server |
CN108763366A (en) * | 2018-05-17 | 2018-11-06 | 惠州学院 | The grasping means of video image emphasis picture, device, storage medium and electronic equipment |
CN109561339A (en) * | 2017-09-27 | 2019-04-02 | 北京国双科技有限公司 | The treating method and apparatus of video file |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005352782A (en) * | 2004-06-10 | 2005-12-22 | Canon Inc | Device and method for image retrieval |
CN1977264A (en) * | 2004-06-28 | 2007-06-06 | 松下电器产业株式会社 | Video/audio stream processing device and video/audio stream processing method |
US20070288237A1 (en) * | 2006-06-07 | 2007-12-13 | Chung-Hsien Wu | Method And Apparatus For Multimedia Data Management |
US20100179972A1 (en) * | 2009-01-09 | 2010-07-15 | Yasuharu Asano | Data processing apparatus, data processing method, and program |
US20170221523A1 (en) * | 2012-04-24 | 2017-08-03 | Liveclips Llc | Annotating media content for automatic content understanding |
CN104391960A (en) * | 2014-11-28 | 2015-03-04 | Beijing QIYI Century Science and Technology Co., Ltd. | Video annotation method and system |
CN105824930A (en) * | 2016-03-17 | 2016-08-03 | Shenzhen Gionee Communication Equipment Co., Ltd. | Voice message processing method and terminal |
CN109561339A (en) * | 2017-09-27 | 2019-04-02 | Beijing Gridsum Technology Co., Ltd. | Video file processing method and apparatus |
CN107679227A (en) * | 2017-10-23 | 2018-02-09 | Chai Jianhua | Video index label setting method, device and server |
CN108763366A (en) * | 2018-05-17 | 2018-11-06 | Huizhou University | Method and device for capturing key frames of video images, storage medium and electronic equipment |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111212317A (en) * | 2020-01-15 | 2020-05-29 | 清华大学 | Skip navigation method for video playing |
CN111640453A (en) * | 2020-05-13 | 2020-09-08 | 广州国音智能科技有限公司 | Speech spectrum matching method, device, equipment and computer-readable storage medium |
CN111723816A (en) * | 2020-06-28 | 2020-09-29 | 北京联想软件有限公司 | Teaching note acquisition method and electronic equipment |
CN111723816B (en) * | 2020-06-28 | 2023-10-27 | 北京联想软件有限公司 | Acquisition method of teaching notes and electronic equipment |
CN112115301A (en) * | 2020-08-31 | 2020-12-22 | 湖北美和易思教育科技有限公司 | Video annotation method and system based on classroom notes |
CN112115301B (en) * | 2020-08-31 | 2023-09-19 | 武汉美和易思数字科技有限公司 | Video annotation method and system based on classroom notes |
CN114449333A (en) * | 2020-10-30 | 2022-05-06 | 华为终端有限公司 | Video note generation method and electronic equipment |
CN114449333B (en) * | 2020-10-30 | 2023-09-01 | 华为终端有限公司 | Video note generation method and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN110347866B (en) | 2023-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110347866A (en) | Information processing method, device, storage medium and electronic equipment | |
CN109801193B (en) | Follow-up teaching system with voice evaluation function | |
CN109359215B (en) | Video intelligent pushing method and system | |
CN110991645B (en) | Self-adaptive learning method, system and storage medium based on knowledge model | |
CN106021496A (en) | Video search method and video search device | |
CN110516791B (en) | Visual question-answering method and system based on multiple attention | |
US20050114357A1 (en) | Collaborative media indexing system and method | |
CN110148400A (en) | Pronunciation type recognition method, model training method, device and equipment |
CN111695422B (en) | Video tag acquisition method and device, storage medium and server | |
CN106355429A (en) | Image material recommendation method and device | |
CN110162599A (en) | Personnel recruitment and interview method, apparatus and computer readable storage medium | |
CN111563158B (en) | Text ranking method, ranking apparatus, server and computer-readable storage medium | |
CN108491463A (en) | Label determination method and device |
CN113395578A (en) | Method, device and equipment for extracting video theme text and storage medium | |
CN107408238A (en) | Automatic capture of information from voice data and computer operation context |
CN114501064B (en) | Video generation method, device, equipment, medium and product | |
CN111538852B (en) | Multimedia resource processing method, device, storage medium and equipment | |
CN111681678B (en) | Method, system, device and storage medium for automatically generating sound effects and matching videos | |
CN111026786B (en) | Dictation list generation method and home education equipment | |
CN112528049B (en) | Video synthesis method, device, electronic equipment and computer readable storage medium | |
CN114493944A (en) | Method, device and equipment for determining learning path and storage medium | |
CN113992973A (en) | Video abstract generation method and device, electronic equipment and storage medium | |
CN113052025A (en) | Training method of image fusion model, image fusion method and electronic equipment | |
CN116487012A (en) | Intelligent practice teaching method, system, medium and equipment for clinical medical staff | |
CN111859970B (en) | Method, apparatus, device and medium for processing information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||