CN111935552A

CN111935552A - Information labeling method, device, equipment and medium

Info

Publication number: CN111935552A
Application number: CN202010751320.6A
Authority: CN
Inventors: Wang Yungang (王云刚)
Original assignee: Anhui Hongcheng Opto Electronics Co Ltd
Current assignee: Anhui Hongcheng Opto Electronics Co Ltd
Priority date: 2020-07-30
Filing date: 2020-07-30
Publication date: 2020-11-13

Abstract

The application discloses an information labeling method, device, equipment and medium, and belongs to the technical field of information processing. The method comprises the following steps: acquiring a first file to be processed, the first file to be processed being an audio file or a video file; identifying target content information in the first file to be processed, and recording a target playing progress corresponding to the target content information; querying a first preset information base for key information corresponding to the target content information to obtain first key information; and labeling the first key information at the target playing progress to obtain a first target file. Because no manual processing is required, the method simplifies the workers' operations, reduces the labeling workload and makes information labeling more convenient.

Description

Information labeling method, device, equipment and medium

Technical Field

The application belongs to the technical field of information processing, and particularly relates to an information labeling method, device, equipment and medium.

Background

With the popularization of electronic devices, people record and watch audio and video files in more and more scenarios, for example recordings of classroom content.

Because such files often play for a long time, the key information in them usually needs to be marked so that a user can quickly find the parts of interest.

However, most related technologies label the key information in a file manually. Manual labeling imposes a heavy workload on the worker, involves complicated operations, requires a high level of professional expertise, and is therefore not convenient enough.

Disclosure of Invention

An object of the embodiments of the present application is to provide an information labeling method, apparatus, device, and medium, which can solve the problems of large workload and insufficient convenience when labeling a file.

In order to solve the technical problem, the present application is implemented as follows:

in a first aspect, an embodiment of the present application provides an information labeling method, where the method includes:

acquiring a first file to be processed; the first file to be processed is an audio file or a video file;

identifying target content information in the first file to be processed, and recording a target playing progress corresponding to the target content information;

querying key information corresponding to the target content information in a first preset information base to obtain first key information;

and marking the first key information at the target playing progress to obtain a first target file.

In a second aspect, an embodiment of the present application provides an apparatus for information annotation, where the apparatus includes:

the acquisition module is used for acquiring a first file to be processed; the first file to be processed is an audio file or a video file;

the identification module is used for identifying the target content information in the first file to be processed and recording the target playing progress corresponding to the target content information;

the first query module is used for querying key information corresponding to the target content information in a first preset information base to obtain first key information;

and the marking module is used for marking the first key information at the target playing progress to obtain a first target file.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.

In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.

In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.

In the embodiments of the application, while the first file to be processed is played, the target content information in it is identified and the corresponding target playing progress is recorded; the first key information corresponding to the target content information is queried in the first preset information base; and the first key information is then labeled at the target playing progress corresponding to the target content information. The first key information can thus be identified and queried automatically from the content information in the file, and each part of the first file to be processed is labeled automatically. Because no manual processing is needed, the workers' operations are simplified, the labeling workload is reduced, and information labeling becomes more convenient.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application and are not to be construed as limiting the application.

Fig. 1 is a schematic flowchart of an information annotation method according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of another information annotation method according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of an information annotation device according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of an information annotation system according to an embodiment of the present application;

Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application are capable of operation in sequences other than those illustrated or described herein. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.

As noted in the Background, audio and video are mainly labeled manually in the related art; for example, knowledge points are labeled on a recorded classroom video by hand. This approach requires a person to determine the key information to be labeled and to carry out the labeling actions, which results in a large workload and complicated operation.

To solve the above technical problem, the embodiments of the present application provide an information labeling method, which is described in detail below through specific embodiments and their application scenarios with reference to the accompanying drawings. Referring to Fig. 1, Fig. 1 is a schematic flowchart of an information labeling method provided in an embodiment of the present application. The method can be applied to an electronic device, such as a mobile phone, a computer or a wearable device; the application is not limited thereto. The method comprises the following steps:

S110, acquiring a first file to be processed.

The first file to be processed may be an audio file or a video file. It may be acquired, for example, by acquiring a file currently being recorded or a file prestored in the electronic device.

S120, identifying the target content information in the first file to be processed, and recording the target playing progress corresponding to the target content information.

The target content information here refers to the entire content identified from the target portion (e.g., the target video frame or a piece of audio) in the first file to be processed. The target content information may include textual information and/or voice information. The target playing progress here refers to a corresponding position of the content information on the playing progress bar, and may also be understood as a coordinate on the playing time axis of the first file to be processed.

The target content information and its corresponding target playing progress may be identified and recorded while the first file to be processed is being played, or the background data of the first file to be processed may be acquired directly and the target content information and its corresponding target playing progress identified from that data. The application does not limit which approach is adopted.

The specific implementation of this step will be described in detail later.

S130, querying key information corresponding to the target content information in the first preset information base to obtain first key information.

The first preset information base stores, in advance, a set of key information together with the content information corresponding to each piece of key information. The query operation may specifically include: comparing the target content information identified in the first file to be processed with the content information corresponding to each piece of key information in the first preset information base, finding the content information that matches the target content information, and taking the key information associated with that content information as the first key information.

For example, the first preset information base may store the chapter names of all chapters of a language textbook for a given grade together with the chapter content corresponding to each chapter name; the chapter names are the key information, and the chapter contents are the content information corresponding to the key information. The target content information identified in the first file to be processed is compared with the chapter contents in the first preset information base to find the chapter content that matches it, and the chapter name associated with that chapter content is the first key information corresponding to the target content information.
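The comparison in S130 can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes the first preset information base is a dictionary mapping each piece of key information (e.g. a chapter name) to its content information (e.g. the chapter text), and it approximates "matches" with a text-similarity ratio from Python's standard `difflib`. The function name `query_key_info` and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def query_key_info(target_content: str,
                   info_base: dict[str, str],
                   threshold: float = 0.6) -> Optional[str]:
    """Return the key information whose content best matches target_content.

    info_base maps key information (e.g. a chapter name) to its content
    information (e.g. the chapter text). Returns None when no entry reaches
    the (hypothetical) similarity threshold.
    """
    best_key, best_score = None, 0.0
    for key, content in info_base.items():
        # ratio() is 1.0 for identical strings and 0.0 for no common characters
        score = SequenceMatcher(None, target_content, content).ratio()
        if score > best_score:
            best_key, best_score = key, score
    return best_key if best_score >= threshold else None
```

For example, with a base of `{"Chapter 1: Spring": "the spring wind wakes the willows", ...}`, the recognized text `"spring wind wakes the willows"` maps to `"Chapter 1: Spring"`. A production system would more likely use a search index or semantic similarity; the linear scan here only keeps the sketch short.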

In addition, in some embodiments, part of the content information and its corresponding key information may also be stored locally. The local store is then queried first for the first key information corresponding to the target content information, and the first preset information base is queried only when the first key information is not found locally.
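The local-first lookup can be sketched as a simple fallback. This assumes, hypothetically, that the local store is a dictionary keyed by exact content text and that the remote query against the first preset information base is supplied as a callable; `lookup_local_first` is an illustrative name only.

```python
from typing import Callable, Optional


def lookup_local_first(target_content: str,
                       local_store: dict[str, str],
                       query_base: Callable[[str], Optional[str]]) -> Optional[str]:
    """Query the local store first; fall back to the preset information base."""
    # 1. Locally stored content -> key information pairs
    if target_content in local_store:
        return local_store[target_content]
    # 2. Only when nothing is found locally, query the first preset information base
    return query_base(target_content)
```

Injecting `query_base` keeps the sketch independent of how the information base is actually reached (local database, HTTP service, and so on).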

S140, marking the first key information at the target playing progress to obtain a first target file.

Labeling here means adding the first key information to the first file to be processed at the target playing progress, so that when the first target file is played later, the user can see or hear the first key information at that position of the playing progress bar.

Optionally, the label may be a marker point (dotting): a marker is set at the target playing progress on the progress bar (i.e., the time axis) of the first file to be processed and associated with the first key information, and when the user places the mouse over the marker, the associated first key information is displayed. Alternatively, the labeling may add text to, or insert voice information into, a particular frame image of the first file to be processed.
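The marker-point option can be illustrated with a small data structure: markers kept sorted by time, and a hover lookup that returns the first key information of the nearest marker. This is a sketch under assumptions: the class name `ProgressBarMarkers` and the 1-second hover tolerance are hypothetical, and rendering the progress bar itself is out of scope.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Marker:
    time_s: float   # target playing progress (time-axis coordinate)
    key_info: str   # first key information shown on hover


class ProgressBarMarkers:
    """Marker points ("dotting") on a file's progress bar, sorted by time."""

    def __init__(self) -> None:
        self._times: list[float] = []
        self._markers: list[Marker] = []

    def add(self, time_s: float, key_info: str) -> None:
        # Insert in time order so hover lookups can use binary search
        i = bisect.bisect_left(self._times, time_s)
        self._times.insert(i, time_s)
        self._markers.insert(i, Marker(time_s, key_info))

    def hover(self, time_s: float, tolerance_s: float = 1.0) -> Optional[str]:
        """Return the key info of the nearest marker within tolerance_s, else None."""
        i = bisect.bisect_left(self._times, time_s)
        # The nearest marker is either just before or just at/after time_s
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self._times)]
        if not candidates:
            return None
        j = min(candidates, key=lambda k: abs(self._times[k] - time_s))
        if abs(self._times[j] - time_s) <= tolerance_s:
            return self._markers[j].key_info
        return None
```

Usage: after `bar.add(12.0, "Knowledge point A")`, hovering at 12.4 s returns `"Knowledge point A"`, while hovering far from any marker returns `None`.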

In some embodiments, in the case that the first file to be processed is a video file, S120 may include:

S1211, acquiring a video frame from the first file to be processed at every preset time interval.

The preset time interval may be set by the user; for example, it may be set to 5 s, in which case one video frame is acquired every 5 s. Optionally, the acquired video frame may be the original image of the video frame or a screenshot of it.

In addition, since the number of video frames played per second is fixed, one video frame may instead be acquired after every preset number of video frames (for example, every 30 frames). Acquiring video frames periodically reduces the amount of content information that has to be recognized while still avoiding, as far as possible, the omission of key information.
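Both sampling modes reduce to choosing a frame step. The sketch below assumes the total frame count and the frame rate are already known (for example from the container metadata); decoding the frames themselves is left out. `sample_frames` is a hypothetical helper name.

```python
from typing import Optional


def sample_frames(total_frames: int, fps: float,
                  interval_s: Optional[float] = None,
                  every_n_frames: Optional[int] = None) -> list[tuple[int, float]]:
    """Return (frame_index, time_s) pairs for periodically sampled frames.

    Either a preset time interval (e.g. 5 s) or a preset frame count
    (e.g. every 30 frames) can be given.
    """
    if every_n_frames is None:
        if interval_s is None:
            raise ValueError("give interval_s or every_n_frames")
        # Convert the time interval into a frame step at this frame rate
        every_n_frames = max(1, round(interval_s * fps))
    # time_s is the playing progress of each sampled frame
    return [(i, i / fps) for i in range(0, total_frames, every_n_frames)]
```

At 30 frames per second, a 5 s interval is the same as sampling every 150 frames, so the two modes in the text are interchangeable once the frame rate is fixed.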

S1213, identifying the target text information in the video frame, and taking the target text information as the target content information.

Specifically, the recognition may be performed with an image recognition technique. Optionally, the image recognition technique may be Optical Character Recognition (OCR). OCR converts an image into an editable document: the electronic device determines the shapes of characters in the image by detecting patterns of dark and light, and then translates those shapes into computer characters with a character recognition method. Other techniques for recognizing text in an image may of course also be used, and the present application is not limited thereto.

S1215, recording the playing progress corresponding to the video frame as the target playing progress corresponding to the target content information.

In this embodiment, image recognition is used to recognize the target text information in each acquired video frame. This determines what the video frame displays and hence which first key information belongs at the corresponding playing position, for example which knowledge point is presented there. The first file to be processed is then labeled to obtain a first target file in which the first key information is marked at the target playing progress. When a user later watches the first target file, the marked first key information shows what is played in each part, so the user can quickly find the parts of interest.
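Steps S1213 and S1215 together pair each sampled frame's recognized text with that frame's playing progress. In the sketch below the OCR engine is passed in as any image-to-text callable (for instance a wrapper around `pytesseract.image_to_string`, if Tesseract is used); skipping frames with no recognized text is an assumption of this sketch, not something the text specifies.

```python
from typing import Callable, Iterable


def recognize_frame_text(frames: Iterable[tuple[float, object]],
                         ocr: Callable[[object], str]) -> list[tuple[float, str]]:
    """Run OCR on sampled frames and pair each text with its frame time.

    frames yields (time_s, image) pairs; ocr is any image -> text function.
    """
    results = []
    for time_s, image in frames:
        text = ocr(image).strip()        # S1213: target text information
        if text:                          # assumption: ignore frames without text
            results.append((time_s, text))  # S1215: frame time = target playing progress
    return results
```

Keeping the OCR engine behind a callable makes the pairing logic testable without any image library installed.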

Optionally, each acquired video frame may be stored, and the stored video frame file may be named according to its relative time. For example, if one video frame is acquired every 5 s, i.e., the frames at 0 s, 5 s, 10 s, 15 s, 20 s, 25 s, ... are acquired, the video frame files may be named 0.jpeg, 5.jpeg, 10.jpeg, 15.jpeg, 20.jpeg, 25.jpeg, ... respectively. The recognized target text information may then be stored as well, with the file name of each stored item bound to the time of its video frame, for example 0_pic_key, 5_pic_key, 10_pic_key, 15_pic_key, 20_pic_key, 25_pic_key, and so on. The leading number is the target playing progress corresponding to the target text information, i.e., the time-axis coordinate of the video frame from which it was recognized. This is only one possible naming scheme, and the present application does not limit how the video frame files and the stored target text information are named.
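The example naming scheme can be expressed as a pair of formatting helpers plus a parser that recovers the playing progress from a stored name. The helper names are hypothetical; only the name patterns (`<t>.jpeg`, `<t>_pic_key`) come from the text.

```python
def frame_file_name(time_s: int) -> str:
    """Name of the stored video frame captured at time_s seconds."""
    return f"{time_s}.jpeg"


def frame_text_file_name(time_s: int) -> str:
    """Name of the stored target text information recognized at time_s."""
    return f"{time_s}_pic_key"


def time_from_text_file_name(name: str) -> int:
    """Recover the target playing progress (leading number) from a stored name."""
    return int(name.split("_", 1)[0])
```

Because the playing progress is encoded in the name, the binding between text and time survives even if the files are moved, as long as the names are kept.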

In other embodiments, S120 may include:

S1221, recognizing the audio information in the first file to be processed to obtain a target sentence.

Specifically, the recognition can be performed with speech recognition technology, also known as Automatic Speech Recognition (ASR), which converts the lexical content of human speech into computer-readable input.

S1223, performing sentence analysis on the target sentence to obtain target voice information serving as the target content information.

The purpose of sentence analysis is to identify the valid information in the target sentence. It may include extracting feature words from the target sentence and removing the parts of the target sentence that contribute little to its meaning; it may also include performing semantic analysis on the target sentence, classifying and counting the words in it, and the like.
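A very small version of this sentence analysis is stop-word removal followed by word counting, keeping the most frequent remaining words as feature words. The sketch assumes whitespace-delimited English text and uses a hypothetical stop-word list; Chinese transcripts would first need word segmentation, and the semantic-analysis option in the text is not covered.

```python
import re
from collections import Counter

# Hypothetical stop-word list: words that contribute little to the meaning
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "at",
              "so", "um", "uh", "we", "now"}


def analyze_sentence(sentence: str, top_k: int = 3) -> list[str]:
    """Return up to top_k feature words of a recognized target sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    content_words = [w for w in words if w not in STOP_WORDS]
    # Counter.most_common orders ties by first occurrence
    return [w for w, _ in Counter(content_words).most_common(top_k)]
```

For the ASR output "Um, so now we look at the derivative, the derivative of a function", the filler words are dropped and "derivative" comes out as the leading feature word.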

S1225, recording the playing progress corresponding to the target sentence as the target playing progress corresponding to the target content information.

Since the target sentence in the audio information is analyzed in this embodiment, the playing progress corresponding to the target sentence may be a time axis coordinate corresponding to the starting time of the target sentence.

In this embodiment, the first file to be processed may be a video file or an audio file. Because both audio and video files contain voice information, the voice information can be recognized with speech technology to determine the content of the first file to be processed, such as the knowledge points being taught. During speech recognition, different sentences are recognized, and each sentence corresponds to a different playing progress.

Optionally, as in the foregoing embodiment, the recognized target voice information may also be stored, and during storage it may be bound to its target playing progress, with the binding reflected in the file name of the stored target voice information. For example, the saved target voice information may be named key_0_5, key_1_9, key_2_12, key_3_20, key_4_21, key_5_22, .... The first number in the file name is the label or id of the target voice information and distinguishes different pieces of target voice information; the second number is the time-axis coordinate of the start time of the target voice information, i.e., the second at which it starts playing in the first file to be processed. For example, key_0_5 means that the target voice information labeled 0 starts playing at 5 s. This is only one possible naming scheme, and the present application does not limit how the stored target voice information is named.

Optionally, after the first key information is queried, it may also be recorded and stored. When it is stored, the first key information needs to be associated with its corresponding target playing progress; for example, the target playing progress may be included in the stored file name. For instance, if the video frames at 0 s, 5 s, 10 s, 15 s, 20 s, 25 s, ... are captured to obtain the corresponding first key information, the first key information of each video frame may be recorded as 0_pic_result, 5_pic_result, 10_pic_result, 15_pic_result, 20_pic_result, 25_pic_result, ..., where the leading number is the target playing progress (time-axis coordinate) of the first key information; for example, 5_pic_result is the first key information of the video frame played at 5 s. Alternatively, the first key information may be recorded as key_1_9_result, key_2_12_result, key_3_20_result, key_4_21_result, key_5_22_result, ..., where the first number is the label or id that distinguishes different pieces of first key information, and the second number is the time-axis coordinate of its start time, i.e., the second at which it starts playing in the first file to be processed; for example, key_1_9_result means that the first key information labeled 1 starts playing at 9 s. This is only one possible naming scheme, and the present application does not limit how the stored first key information is named.
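When the stored items are read back, the example names (`<t>_pic_key`, `<t>_pic_result`, `key_<id>_<t>`, `key_<id>_<t>_result`) can be parsed with two regular expressions to recover the id and the playing progress. This is a sketch assuming exactly these example patterns; `parse_saved_name` and `ParsedName` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    item_id: Optional[int]  # label/id, if the scheme has one
    time_s: int             # target playing progress in seconds
    kind: str               # "pic_key", "pic_result", "key" or "key_result"


_FRAME_PATTERN = re.compile(r"^(\d+)_pic_(key|result)$")
_SPEECH_PATTERN = re.compile(r"^key_(\d+)_(\d+)(_result)?$")


def parse_saved_name(name: str) -> ParsedName:
    m = _FRAME_PATTERN.match(name)
    if m:  # video-frame scheme: the leading number is the time
        return ParsedName(None, int(m.group(1)), "pic_" + m.group(2))
    m = _SPEECH_PATTERN.match(name)
    if m:  # speech scheme: first number is the id, second is the start time
        kind = "key_result" if m.group(3) else "key"
        return ParsedName(int(m.group(1)), int(m.group(2)), kind)
    raise ValueError(f"unrecognized file name: {name!r}")
```

For example, `parse_saved_name("key_1_9_result")` yields id 1 and a playing progress of 9 s.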

In addition, the recorded first key information can be saved in a folder with the same name as the first file to be processed.

To reduce the size of the final target file and make it contain only the content corresponding to the first key information, in further embodiments of the present application, the method may further include, after S140:

S151, determining a first start-stop time corresponding to the first key information according to the target playing progress corresponding to the first key information.

Because one piece of first key information corresponds to a stretch of played content (for example, when the first key information is a knowledge point, the knowledge point corresponds to a period of explanation), the start-stop time of each piece of first key information can be determined from the first key information corresponding to each target playing progress; for example, the start-stop time of knowledge point A may be 12 min 12 s to 13 min 12 s.

In some embodiments, the time axis coordinate corresponding to the end time of the target sentence may be determined by any one of the following manners: one way is to take the coordinates after a preset time period after the start time of the target sentence as the time axis coordinates corresponding to the end time of the target sentence, and for example, assuming that each sentence is approximately 30s, the position of 30s after the start time is taken as the end time of the target sentence. Another method is to recognize an end word in the target sentence and use the position of the end word as the time axis coordinate of the end time of the target sentence. Alternatively, the time axis coordinate corresponding to the end time of the target sentence may be determined in another manner. And according to the time axis coordinates corresponding to the starting time and the ending time, the playing progress section corresponding to the target sentence can be obtained.
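The two manners of finding a sentence's end time can be combined into one helper: use the position of an end word if one is found, otherwise add a preset length to the start time. The sketch assumes, hypothetically, that the ASR output provides per-word timestamps as `(word, time_s)` pairs and that the set of end words is supplied by the caller; the 30 s default follows the example in the text.

```python
from typing import Optional


def sentence_interval(start_s: float,
                      word_times: Optional[list[tuple[str, float]]] = None,
                      end_words: frozenset[str] = frozenset(),
                      default_len_s: float = 30.0) -> tuple[float, float]:
    """Return the playing-progress interval (start_s, end_s) of a target sentence."""
    if word_times:
        for word, t in word_times:
            # Second manner: the position of an end word marks the end time
            if t >= start_s and word.lower() in end_words:
                return start_s, t
    # First manner: a preset length after the start time
    return start_s, start_s + default_len_s
```

With no end word detected, a sentence starting at 12 s is assumed to end at 42 s.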

S153, cutting out the data segment corresponding to the first start-stop time from the first target file to obtain a second target file. When the first target file is an audio file, the data segment is an audio segment; when the first target file is a video file, the data segment is a video segment.

After the first start-stop time is determined, the first target file can be cut accordingly: the data segment corresponding to each piece of first key information is cut out, and the cut data segments are spliced in chronological order to obtain the second target file.
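Before any media is actually cut, the splice can be planned: order the cut segments by start time and compute where each one lands in the second target file. This sketch assumes the segments do not overlap and leaves the actual audio or video cutting (for example with a media tool) out of scope; `splice_plan` is a hypothetical name.

```python
def splice_plan(segments: list[tuple[float, float, str]]) -> list[dict]:
    """Plan S153: order cut segments by time and compute their positions
    in the spliced second target file.

    segments holds (start_s, end_s, key_info) in the first target file;
    they are assumed not to overlap.
    """
    plan, cursor = [], 0.0
    for start, end, key in sorted(segments):  # chronological order
        length = end - start
        plan.append({"key": key,
                     "src": (start, end),                # position in first target file
                     "dst": (cursor, cursor + length)})  # position in second target file
        cursor += length
    return plan
```

The `dst` positions are also where the first key information would be labeled in the second target file.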

In this embodiment, the first target file is cut, i.e., the data segments corresponding to the first key information are cut out and spliced, so that the second target file contains only those data segments. For example, for courseware recorded in a class, only the content related to knowledge points may be cut out and spliced, so that the resulting second target file contains only the knowledge-point portions, which makes later viewing more efficient. Specifically, the first start-stop time is determined from the first key information corresponding to each target playing progress: for example, if the first key information corresponding to 3 consecutive target playing progresses is knowledge point A, the start time of knowledge point A is the first of these target playing progresses and its end time is the third.
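The rule in the example (consecutive target playing progresses with the same first key information form one interval, from the first to the last of them) maps directly onto `itertools.groupby`. The input format `(time_s, key_info)` is an assumption of this sketch, and a key that reappears later simply starts a new interval.

```python
from itertools import groupby


def start_stop_times(labels: list[tuple[float, str]]) -> list[tuple[str, float, float]]:
    """Group consecutive (target playing progress, key info) records that share
    the same key information into (key_info, start_s, end_s) intervals."""
    intervals = []
    # Sort by playing progress, then group runs of identical key information
    for key, run in groupby(sorted(labels), key=lambda rec: rec[1]):
        times = [t for t, _ in run]
        # Start = first progress of the run, end = last progress of the run
        intervals.append((key, times[0], times[-1]))
    return intervals
```

For records at 0 s, 5 s and 10 s labeled A followed by 15 s and 20 s labeled B, this yields `("A", 0, 10)` and `("B", 15, 20)`.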

In other embodiments, after S140, the method may further include:

S161, uploading the first target file and the corresponding first key information to a preset audio and video platform, so that the preset audio and video platform stores the first target file in the group corresponding to the first key information.

In this embodiment, the first target file is uploaded to a preset audio and video platform for storage, and a group corresponding to each piece of first key information is preset in the platform. For example, if the first key information is knowledge point information, groups corresponding to different knowledge points are set in the platform; after the first target file and its first key information are uploaded, the platform can bind the id of the first target file to the id of the corresponding knowledge point group, so that the first target file is stored in that group. The user can then choose the type of file to view by group name, which makes finding files more convenient.

Optionally, the process when the user searches for the file may include:

Receiving a first input from the user on a target group in the preset audio and video platform; the first input may be, for example, a click on the target group, which is not limited in the present application.

In response to the first input, displaying a list of the files saved in the target group.

Receiving a second input from the user on a third file in the file list; the second input may be, for example, a click on the third file, which is not limited in the present application.

In response to the second input, displaying the third file returned by the preset audio and video platform.

In this embodiment, the user may first select a type of file that the user wants to view according to the name of each group, and then select a file that the user wants to view according to the file list stored in the group, which is highly convenient for the user to search for the file.
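The grouping on the platform side and the two-step browse flow can be sketched with an in-memory stand-in. `GroupedStore` and its methods are hypothetical; a real audio and video platform would persist files and expose this through its own API.

```python
from collections import defaultdict


class GroupedStore:
    """In-memory stand-in for the preset audio and video platform."""

    def __init__(self) -> None:
        self._groups: dict[str, list[str]] = defaultdict(list)
        self._files: dict[str, bytes] = {}

    def upload(self, file_id: str, data: bytes, key_infos: list[str]) -> None:
        """S161: store a first target file and bind it to each knowledge-point group."""
        self._files[file_id] = data
        for key in key_infos:
            if file_id not in self._groups[key]:
                self._groups[key].append(file_id)

    def list_group(self, group: str) -> list[str]:
        """First input on a target group: return the list of files in it."""
        return list(self._groups.get(group, []))

    def fetch(self, file_id: str) -> bytes:
        """Second input on a file in the list: return that file."""
        return self._files[file_id]
```

A file covering several knowledge points is bound to each corresponding group, so it appears in every relevant list.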

In other embodiments, after S140, the method may further include:

S171, querying the second preset information base for first feature information corresponding to the target content information.

The second preset information base and the first preset information base can be the same information base or different information bases. In the second preset information base, feature information corresponding to each piece of content information is recorded, and the feature information is information for representing important features in the content information. For example, the content information is all contents included in the knowledge point a, and the feature information is important features corresponding to the knowledge point a, such as a functional relation, a definition, and the like. The query operation may compare the target content information with the content information recorded in the second preset information base, find content information identical to the target content information, and obtain first feature information associated with the content information.

S173, uploading the first target file and the corresponding first feature information to a preset audio and video platform, so that the preset audio and video platform stores the first target file in association with the corresponding first feature information.

In this embodiment, the content information and feature information corresponding to each piece of key information are entered in advance in the second preset information base. The corresponding first feature information can therefore be found from the target content information and uploaded to the preset audio and video platform as an attribute of the first target file, so that subsequent users can find the first target file easily.

In a further embodiment, the process of the user finding the file may include:

Receiving target feature information input by the user.

Uploading the target feature information to the preset audio and video platform.

Displaying a fourth file returned by the preset audio and video platform.

The fourth file is a file stored in the preset audio and video platform whose corresponding feature information matches the target feature information.

In this embodiment, the user can search for files precisely by entering target feature information, which makes file search more convenient and accurate.
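On the platform side, the feature search amounts to returning the files whose stored feature information matches the user's input. The sketch assumes each file id maps to a set of feature strings and treats "matches" as a case-insensitive exact match; both are assumptions, and `find_by_feature` is a hypothetical name.

```python
def find_by_feature(target_feature: str,
                    file_features: dict[str, set[str]]) -> list[str]:
    """Return ids of files whose associated feature information matches the target."""
    needle = target_feature.strip().lower()
    return sorted(fid for fid, feats in file_features.items()
                  if any(needle == f.lower() for f in feats))
```

For example, entering the functional relation "y = kx + b" returns the lessons stored with that feature, such as one on linear functions.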

Some files are long or contain a large amount of invalid information. For example, if the first file to be processed is a recorded classroom video, it may include waiting time before the class starts. Such invalid information is unimportant yet occupies considerable storage space, and because it would also be processed during subsequent recognition, it leads to wasted recognition operations.

Based on this, further embodiments of the present application are described with reference to Fig. 2, which is a schematic flowchart of another information annotation method provided in the embodiments of the present application. The method may comprise the following steps:

S210, acquiring a first file to be processed.

S210 is similar to S110 in Fig. 1 and is not described here again.

S220, determining a target data segment with the volume value smaller than a preset volume threshold value in the first file to be processed;

The purpose of this step is to find the data segments of the first file to be processed that have low volume values. Because these data segments contain no valid speech information, deleting them shortens the first file to be processed and reduces the content information to be recognized. For example, a data segment whose volume value is below 20 dB may be taken as a target data segment. The specific value of the preset volume threshold is not limited in the present application.

S230, deleting the target data segment from the first file to be processed.

The first file to be processed can be cut at the start and end positions of the target data segment, and the cut-out target data segment can then be deleted.

S240, splicing the residual data segments in the first file to be processed according to the time sequence to obtain a second file to be processed.

After the target data segments are deleted, the remainder of the first file to be processed is broken into several separate data segments. These scattered segments must be merged while preserving their time order for later playback, so they are spliced in time order to obtain a complete file.
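Steps S230 and S240 can be sketched on an abstract sample list. This is a minimal illustration under stated assumptions: the media is represented as a hypothetical list with one entry per sample or frame, and target data segments are half-open index ranges. Real audio or video would be cut and concatenated with a media library instead.

```python
def remove_and_splice(samples, segments):
    """Delete target data segments and splice the remaining pieces in time order.

    `samples` is a hypothetical stand-in for the decoded first file (one entry
    per sample or frame); `segments` holds the target data segments as
    half-open (start, end) index ranges. Returns the second file's samples.
    """
    pieces = []
    cursor = 0
    # Walk the deletions in time order so the kept pieces stay in sequence.
    for start, end in sorted(segments):
        if start > cursor:
            pieces.append(samples[cursor:start])  # keep the part before the cut
        cursor = max(cursor, end)  # skip the deleted part (overlaps tolerated)
    pieces.append(samples[cursor:])  # keep whatever follows the last cut
    # Splicing = concatenating the kept pieces in their original order.
    return [s for piece in pieces for s in piece]


print(remove_and_splice(list(range(10)), [(6, 8), (2, 4)]))  # [0, 1, 4, 5, 8, 9]
```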

S250, identifying the content information in the second file to be processed and taking it as the target content information.

S260, determining the corresponding playing progress of the target content information in the first file to be processed, and recording the playing progress as the target playing progress.

When the first file to be processed is a video file, a target data segment may still carry valid content even though it contains no valid speech information. For example, in a recorded classroom video, the target data segment may show the teacher writing on the blackboard, and that part of the video is also valid. Therefore, although the target data segments are removed while the first key information is being identified, the first key information must later be labeled at the target playing progress in the first file to be processed. Accordingly, the target playing progress refers to the playing progress of the target content information in the first file to be processed.

In some embodiments, determining the playing progress of the target content information in the first file to be processed may include the following. First, record the start and stop time axis coordinates of each target data segment in the first file to be processed. Then, once the target content information is identified, convert its second time axis coordinate in the second file to be processed back into a first time axis coordinate in the first file to be processed. The first time axis coordinate is the playing progress of the target content information in the first file to be processed. Both coordinates may be taken at the start time of the target content information.
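The coordinate conversion described above can be sketched as follows. This is a minimal illustration that assumes times in seconds and non-overlapping deleted segments. Each deleted segment that lies before the content shifts the content later by that segment's length.

```python
def to_original_time(t_second, deleted):
    """Convert a time axis coordinate in the second file into the first file.

    `deleted` lists the recorded (start, stop) coordinates, in seconds, of the
    target data segments removed from the first file; they are assumed not to
    overlap. `t_second` is the content's start time in the second file.
    """
    t = t_second
    for start, stop in sorted(deleted):
        # Every deleted segment that begins at or before the (already shifted)
        # position pushes the content later by that segment's length.
        if start <= t:
            t += stop - start
        else:
            break  # later segments lie after the content and do not affect it
    return t


# Segments 10-20 s and 30-40 s were deleted; 15 s in the second file is 25 s originally.
print(to_original_time(15, [(10, 20), (30, 40)]))  # 25
```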

S270, querying the first preset information base for the key information corresponding to the target content information to obtain first key information.

S280, labeling the first key information at the target playing progress in the first file to be processed to obtain a first target file.

S270 to S280 are similar to S130 to S140 in fig. 1 and are not described here again.

In this embodiment, the target data segments with low volume values in the first file to be processed are identified and deleted. These segments contain no valid speech information, so deleting them reduces the content information that must be identified later and simplifies the information labeling process. Deleting them also compresses the first file to be processed, which reduces the size of the first target file.

In other embodiments, the first file to be processed is an audio file, so all of its information is sound. A target data segment's volume is too low for any speech it contains to be recognized, so the segment is invalid. In this case, the target data segments are deleted to obtain the second file to be processed. The playing progress of the target content information in the second file to be processed can then be recorded directly as the target playing progress, and the first key information is labeled in the second file to be processed.

In some embodiments, S220 specifically includes:

S2201, acquiring the second start-stop time corresponding to each volume value in the first file to be processed.

The manner of obtaining the second start-stop time corresponding to each volume value here may include:

Determining the first data segments whose volume value lies within a target volume value range and taking their start-stop times as the second start-stop time, where each target volume value range corresponds to a volume mean value. The volume value corresponding to the second start-stop time in S2201 may therefore be this volume mean value.

For example, a target volume value range may be [20 dB, 29 dB]. The start-stop times of the data segments whose volume value falls within this range are taken as the second start-stop time. The volume mean value corresponding to this range may be 25 dB, and this value is used when the average volume value of the first file to be processed is calculated later. The second start-stop time may include several time periods. That is, the second start-stop time corresponding to a target volume value range is the set of all time periods in the first file to be processed whose volume value lies in that range. The total length of time covered by the second start-stop time can then be determined. For example, the second start-stop times may be recorded as DB_25_0, DB_35_5, DB_35_10 …, where the first number is the volume mean value in decibels and the second number is the length of time covered by the second start-stop time. The decibel (dB) is a logarithmic unit that expresses the ratio of two quantities of the same kind and is commonly used to measure sound intensity.
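The grouping described above can be sketched as follows. This is a minimal illustration that assumes one volume reading per second and 10 dB-wide ranges starting at multiples of 10. Each range's volume mean value is taken as its lower bound plus 5 dB, so [20, 30) maps to 25 dB as in the example. The `DB_<mean>_<seconds>` label format follows the text.

```python
from collections import defaultdict


def bucket_volumes(volumes_db, width=10):
    """Group per-second volume readings into target volume value ranges.

    Returns {volume_mean_value: [(start_s, end_s), ...]}: for each range, the
    time periods (the second start-stop time) whose volume falls in it.
    Assumptions: one reading per second; ranges of `width` dB starting at
    multiples of `width`; each range's mean value is its lower bound + width/2.
    """
    periods = defaultdict(list)
    for second, db in enumerate(volumes_db):
        low = int(db // width) * width
        mean = low + width // 2
        spans = periods[mean]
        # Extend the previous period if it ends exactly where this second starts.
        if spans and spans[-1][1] == second:
            spans[-1] = (spans[-1][0], second + 1)
        else:
            spans.append((second, second + 1))
    return dict(periods)


def labels(periods):
    # Record each range as DB_<volume mean value>_<total seconds covered>.
    return sorted(
        f"DB_{mean}_{sum(end - start for start, end in spans)}"
        for mean, spans in periods.items()
    )


print(labels(bucket_volumes([22, 25, 31, 12, 28])))  # ['DB_15_1', 'DB_25_3', 'DB_35_1']
```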

S2203, calculating an average volume value in the first file to be processed according to the second start-stop time and the volume value corresponding to the second start-stop time.

The specific calculation may include three steps. First, multiply the volume mean value of each target volume value range by the length of time covered by its second start-stop time. Then sum these products. Finally, divide the sum by the second total duration of the first file to be processed to obtain the average volume value.
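The duration-weighted average described above can be sketched as follows. This is a minimal illustration: `periods` is assumed to map each range's volume mean value to its list of (start, end) periods in seconds. The decibel values are averaged arithmetically, as the text describes, rather than by averaging acoustic energy.

```python
def average_volume(periods, total_seconds):
    """Duration-weighted average volume of the first file (S2203).

    `periods` maps each target volume value range's mean value (dB) to its
    (start_s, end_s) periods; `total_seconds` is the file's total duration.
    """
    # Sum of (volume mean value x time covered by its second start-stop time).
    weighted = sum(
        mean * sum(end - start for start, end in spans)
        for mean, spans in periods.items()
    )
    return weighted / total_seconds


# (25 dB x 3 s + 35 dB x 1 s + 15 dB x 1 s) / 5 s = 25 dB
print(average_volume({25: [(0, 2), (4, 5)], 35: [(2, 3)], 15: [(3, 4)]}, 5))  # 25.0
```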

S2205, taking each time period of the first file to be processed in which the volume value is smaller than the preset volume threshold as a blank time period, where the preset volume threshold is a preset proportion of the average volume value.

For example, suppose the preset volume threshold is 1/2 of the average volume value. If the average volume value is 20 dB, the preset volume threshold is 10 dB, and every time period below 10 dB is taken as a blank time period.

S2207, taking the data segment corresponding to each blank time period in the first file to be processed whose length exceeds a preset duration threshold as a target data segment.

For example, if the preset duration threshold is 5 minutes, the data segment corresponding to each blank time period longer than 5 minutes is taken as a target data segment and deleted.
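Steps S2205 and S2207 together can be sketched as follows. This is a minimal illustration under stated assumptions: there is one volume reading per second, and the average volume value is the plain mean of these readings (with one-second readings this equals the time-weighted average). The 1/2 proportion and the 5-minute threshold from the examples are used as default parameters.

```python
def find_target_segments(volumes_db, ratio=0.5, min_seconds=300):
    """Return (start_s, end_s) blank time periods long enough to delete.

    Assumptions: one volume reading per second; the preset volume threshold
    is `ratio` x the average of the readings; a blank time period becomes a
    target data segment only if it is strictly longer than `min_seconds`.
    """
    if not volumes_db:
        return []
    threshold = ratio * (sum(volumes_db) / len(volumes_db))
    segments = []
    start = None
    # A trailing +inf sentinel closes any blank period that runs to the end.
    for second, db in enumerate(list(volumes_db) + [float("inf")]):
        if db < threshold and start is None:
            start = second  # a blank time period begins
        elif db >= threshold and start is not None:
            if second - start > min_seconds:
                segments.append((start, second))
            start = None
    return segments


# 10 min of speech, ~6.7 min of near-silence, 10 min of speech, 100 s of near-silence.
volumes = [40] * 600 + [2] * 400 + [40] * 600 + [3] * 100
print(find_target_segments(volumes))  # [(600, 1000)]
```

The short 100-second quiet stretch at the end is kept because it does not exceed the duration threshold. This matches the intent of deleting only long blank periods.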

In this embodiment, the average volume value of the first file to be processed is determined first. Then the long time periods whose volume is below the preset volume threshold, a fraction of the average volume value, are deleted. Valid speech information can probably not be extracted from these periods, so removing them compresses the first file to be processed. The resulting second file to be processed still contains all sound portions that the user can clearly hear.

Once the second target file is obtained, the same operations performed on the first target file in the foregoing embodiments may be applied to it, for example, uploading the second target file to the preset audio and video platform. For details of these operations, refer to the operations on the first target file described above; they are not repeated here.

Optionally, the preset audio and video platform may record which files have been queried, that is, store the user's query records.

In addition, the first preset information base and the second preset information base can correct wrong contents according to prompt information fed back by a user. Specifically, the first preset information base and the second preset information base may perform the following operations, which are described below by taking the first preset information base as an example:

This involves three steps. First, receive prompt information fed back by a user about target content stored in the first preset information base. Second, learn from the prompt information through a deep learning algorithm. Third, correct the target content stored in the first preset information base according to the learning result. In this way, the content of the first preset information base is continuously improved and its accuracy is maintained.

It should be noted that the information labeling method provided in the embodiments of the present application may be executed by an information labeling apparatus, or by a control module in the information labeling apparatus that executes the information labeling method. In the embodiments of the present application, the information labeling method is described by taking an information labeling apparatus that executes it as an example.

Based on the foregoing method embodiments, an embodiment of the present application further provides an information labeling apparatus. Fig. 3 is a schematic structural diagram of the information labeling apparatus provided in an embodiment of the present application. The apparatus includes:

an obtaining module 310, configured to obtain a first file to be processed; the first file to be processed is an audio file or a video file;

the identification module 320 is configured to identify target content information in the first file to be processed, and record a target playing progress corresponding to the target content information;

the first query module 330 is configured to query key information corresponding to the target content information in a first preset information base to obtain first key information;

the labeling module 340 is configured to label the first key information at the target playing progress to obtain a first target file.
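The query and labeling steps performed by modules 330 and 340 can be sketched as follows. This is a minimal illustration, not the patented implementation. The identification module's output is assumed to be a list of (playing progress, content) pairs, and the first preset information base is modeled as a plain dictionary. `Marker`, `TargetFile`, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Marker:
    progress_s: float  # target playing progress, in seconds
    key_info: str      # first key information labeled at that progress


@dataclass
class TargetFile:
    source: str                                  # the first file to be processed
    markers: list = field(default_factory=list)  # labels added to it


def annotate(source, recognized, first_info_base):
    """Label key information at the playing progress where its content appears.

    `recognized`: hypothetical identification output, a list of
    (target_playing_progress_s, target_content_information) pairs.
    `first_info_base`: hypothetical first preset information base mapping
    content information to key information.
    """
    target = TargetFile(source)
    for progress, content in recognized:
        key = first_info_base.get(content)  # query step (module 330)
        if key is not None:  # content without an entry is left unlabeled
            target.markers.append(Marker(progress, key))  # labeling (module 340)
    return target


info_base = {"first chapter": "ancient poems", "second chapter": "fables"}
recognized = [(12.0, "first chapter"), (95.0, "roll call"), (1810.0, "second chapter")]
result = annotate("lesson.mp4", recognized, info_base)
```

Here `result.markers` holds labels at 12.0 s ("ancient poems") and 1810.0 s ("fables"). The unmatched "roll call" content produces no label.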

In some embodiments, in the case that the first file to be processed is a video file, the identifying module 320 may include:

the video frame acquisition unit is used for acquiring a video frame from the first file to be processed at intervals of a preset duration;

the first identification unit is used for identifying target character information in the video frame and taking the target character information as target content information; and recording the playing progress corresponding to the video frame as a target playing progress corresponding to the target content information.
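The frame sampling and recognition performed by these two units can be sketched as follows. This is a minimal illustration: `read_text` is a hypothetical callback assumed to decode the frame at a given time and run character recognition (for example OCR) on it. No specific video or OCR library is implied.

```python
def sample_frames(duration_s, interval_s, read_text):
    """Sample the video every `interval_s` seconds and recognize on-screen text.

    `read_text(t)` is a hypothetical callback that decodes the frame at time t
    and runs character recognition on it, returning None or "" when no
    characters are found. Returns (target_playing_progress, content) pairs.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    results = []
    step = 0
    while step * interval_s <= duration_s:
        t = step * interval_s  # multiply rather than accumulate to avoid float drift
        text = read_text(t)
        if text:  # frames without recognizable characters are skipped
            results.append((t, text))
        step += 1
    return results


slides = {0: "first chapter", 20: "second chapter"}  # fake recognizer output
print(sample_frames(25, 10, slides.get))  # [(0, 'first chapter'), (20, 'second chapter')]
```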

In other embodiments, the identification module 320 may include:

the second identification unit is used for identifying the audio information in the first file to be processed to obtain a target sentence;

the analysis unit is used for performing sentence analysis on the target sentence to obtain target voice information as the target content information, and recording the playing progress corresponding to the target sentence as the target playing progress corresponding to the target content information.

In some embodiments, the apparatus may further comprise:

the time determining module is used for determining a first start-stop time corresponding to the first key information according to the target playing progress corresponding to the first key information;

the cutting module is used for cutting out a data segment corresponding to the first start-stop time in the first target file to obtain a second target file; and the data segment is an audio segment under the condition that the first target file is an audio file, and the data segment is a video segment under the condition that the first target file is a video file.

In this embodiment, the first target file is cut. The data segments corresponding to the first key information are cut out and spliced, so that the second target file includes only those data segments. For example, from courseware recorded in class, only the content related to the knowledge points can be cut out and spliced. The resulting second target file then contains only those knowledge points, which makes later viewing more efficient for users.
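The cutting described above can be sketched as follows. This is a minimal illustration under an assumption the text does not state: each key information item is taken to run from its own target playing progress until the next label, or until the end of the file. The media is again a hypothetical list with `rate` entries per second.

```python
def key_info_spans(markers, duration_s):
    """Derive a first start-stop time for each labeled key information item.

    Assumption: an item runs from its own target playing progress until the
    next marker (or the end of the file). `markers` is a list of
    (progress_s, key_info) pairs sorted by progress.
    """
    spans = []
    for i, (start, key) in enumerate(markers):
        end = markers[i + 1][0] if i + 1 < len(markers) else duration_s
        spans.append((key, start, end))
    return spans


def cut_second_target(samples, spans, wanted, rate=1):
    """Keep only the data segments whose key information is in `wanted`.

    `samples` stands in for the first target file at `rate` entries per second;
    the kept segments are spliced in time order to form the second target file.
    """
    out = []
    for key, start, end in spans:
        if key in wanted:
            out.extend(samples[int(start * rate):int(end * rate)])
    return out


markers = [(0, "intro"), (3, "ancient poems"), (6, "break"), (8, "fables")]
spans = key_info_spans(markers, 10)
print(cut_second_target(list(range(10)), spans, {"ancient poems", "fables"}))  # [3, 4, 5, 8, 9]
```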

In some embodiments, the identifying module 320 may specifically include:

the data segment determining unit is used for determining a target data segment of the first file to be processed whose volume value is smaller than the preset volume threshold;

the deleting unit is used for deleting the target data segment from the first file to be processed;

the splicing unit is used for splicing the remaining data segments of the first file to be processed in time order to obtain a second file to be processed;

The identification unit is used for identifying the content information in the second file to be processed and taking the content information as target content information;

and the progress determining unit is used for determining the corresponding playing progress of the target content information in the first file to be processed and recording the playing progress as the target playing progress.

In some embodiments, the data segment determining unit may include:

the time acquisition unit is used for acquiring second start-stop time corresponding to each volume value in the first file to be processed;

the average value calculating unit is used for calculating an average volume value in the first file to be processed according to the second start-stop time and the volume value corresponding to the second start-stop time;

the time period determining unit is used for taking the time period of the first file to be processed, of which the volume value is smaller than the preset volume threshold value, as a blank time period; the preset volume threshold is a volume value which occupies a preset proportion of the average volume value;

and the data segment determining unit is used for taking the data segment corresponding to the blank time segment of which the time length exceeds the preset time length threshold value in the first file to be processed as the target data segment.

In some embodiments, the apparatus may further comprise:

the first uploading module is used for uploading the first target file and the corresponding first key information to the preset audio and video platform, so that the preset audio and video platform stores the first target file in the group corresponding to the first key information.

In this case, the user can easily choose the category of files to view by group name, which makes finding files more convenient.
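The group-based storage described above can be sketched as follows. This is a minimal illustration: `GroupedPlatform` is a hypothetical in-memory stand-in for the preset audio and video platform. A file labeled with several key information items is assumed to appear in each corresponding group.

```python
from collections import defaultdict


class GroupedPlatform:
    """Hypothetical platform that files each upload under its key information."""

    def __init__(self):
        self.groups = defaultdict(list)  # key information -> list of file names

    def upload(self, file_name, key_infos):
        # De-duplicate while keeping order, then add the file to each group.
        for key in dict.fromkeys(key_infos):
            self.groups[key].append(file_name)

    def group_names(self):
        # Group names the user can browse to pick a category of files.
        return sorted(self.groups)


p = GroupedPlatform()
p.upload("lesson_01.mp4", ["ancient poems", "ancient poems"])
p.upload("lesson_02.mp4", ["fables", "ancient poems"])
print(p.group_names())  # ['ancient poems', 'fables']
```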

In other embodiments, the apparatus may further include:

the second query module is used for querying, in a second preset information base, first characteristic information corresponding to the target content information;

the second uploading module is used for uploading the first target file and the corresponding first characteristic information to a preset audio and video platform, so that the preset audio and video platform stores the first target file in association with the corresponding first characteristic information.

The information labeling apparatus in the embodiments of the present application may be a standalone device, or a component, integrated circuit, or chip in a terminal. The device may be a mobile or non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA). The non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), an automated teller machine, or a self-service kiosk. The embodiments of the present application impose no specific limitation.

The information labeling apparatus in the embodiments of the present application may be a device having an operating system. The operating system may be Android, iOS, or another possible operating system, which is not specifically limited in the embodiments of the present application.

The information labeling apparatus provided in the embodiments of the present application can implement each process of the information labeling method in the method embodiments of fig. 1 to fig. 2. To avoid repetition, these processes are not described here again.

To solve the above technical problem, an embodiment of the present application further provides an information labeling system. Fig. 4 is a schematic structural diagram of the information labeling system provided in an embodiment of the present application. The system may include an electronic device 410 and a first preset information base 420.

The electronic device 410 is configured to execute each process in the information labeling method in the foregoing method embodiments in fig. 1 to fig. 2, and complete a labeling action on a file, which is not described herein again to avoid repetition.

Some key information and content information corresponding to each key information are preset in the first preset information base 420, and the subsequent electronic device 410 may query the corresponding first key information in the first preset information base 420 according to the target content information in the first file to be processed, so as to label the first file to be processed according to the first key information. The first preset information library 420 may be implemented by a database server or distributed storage or other similar server or device, or may be provided within the electronic device 410.

For ease of understanding, consider the following example. The first file to be processed is recorded classroom video courseware of a first-grade Chinese language class. Each chapter of the first-grade language course (i.e., the key information) and the contents of each chapter are stored in the first preset information base 420. The electronic device 410 recognizes that the target content information in the first file to be processed is the first chapter. It then queries the first preset information base 420 with this target content information and determines that the chapter name corresponding to the first chapter is "Ancient Poems". Finally, it labels the chapter name "Ancient Poems" in the first file to be processed to obtain the first target file.

Of course, the above is only a specific example, the first file to be processed may also be an audio file, or other types of video files, and the content information and the key information stored in the first preset information library 420 are related to the type and content of the first file to be processed. For example, the first file to be processed may be an audio file composed of a plurality of pieces of music, the target content information is the recognized lyric information and/or melody information, etc., the song name corresponding to the lyric information and/or melody information (i.e., the song name is the key information) may be stored in the first preset information library 420, and then the song name corresponding to the target content information may be marked in the first file to be processed.

In some embodiments, the system may further include a preset audio and video platform 430, where the preset audio and video platform 430 is configured to store the first target file or the second target file after the annotation is completed. The preset audio/video platform 430 may be implemented by a cloud server or a web server or other similar servers.

In addition, in other embodiments, the system may further include a second preset information base 440, where some content information and feature information corresponding to the content information are preset in the second preset information base 440, and the subsequent electronic device 410 may query the second preset information base 440 for corresponding first feature information according to the target content information in the first file to be processed, so as to upload the first feature information and the first target file to the preset audio/video platform 430 together. The second preset information library 440 may be implemented by a database server or distributed storage or other similar server or device, or the second preset information library may also be provided in the electronic device 410.

In order to solve the foregoing technical problem, an embodiment of the present application further provides an electronic device, which includes a processor, a memory, and a program or an instruction stored in the memory and executable on the processor, where the program or the instruction is executed by the processor to implement each process of the above information labeling method embodiment, and can achieve the same technical effect, and no further description is provided here to avoid repetition.

It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.

Referring to fig. 5, fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.

The electronic device 500 includes, but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, and the like.

Those skilled in the art will appreciate that the electronic device 500 may further include a power supply (e.g., a battery) that supplies power to the components. The power supply may be logically connected to the processor 510 through a power management system, which manages charging, discharging, and power consumption. The input unit 504 may include a graphics processor, a microphone, and the like. The display unit 506 may include a display panel. The user input unit 507 may include a touch panel and other input devices. The memory 509 may store application programs, an operating system, and the like. The structure shown in fig. 5 does not limit the electronic device. The electronic device may include more or fewer components than shown, combine some components, or arrange the components differently, which is not described here again.

The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the process of the embodiment of the information labeling method is implemented, and the same technical effect can be achieved, and in order to avoid repetition, details are not repeated here.

The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the above-mentioned information labeling method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here.

It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An information labeling method, comprising:

2. The method according to claim 1, wherein, in a case that the first file to be processed is a video file, the identifying target content information in the first file to be processed and recording a target playing progress corresponding to the target content information includes:

acquiring a video frame from the first file to be processed at intervals of a preset duration;

identifying target character information in the video frame, and taking the target character information as the target content information;

and recording the playing progress corresponding to the video frame as a target playing progress corresponding to the target content information.

3. The method according to claim 1, wherein the identifying the target content information in the first file to be processed and recording the target playing progress corresponding to the target content information comprises:

identifying audio information in the first file to be processed to obtain a target sentence;

performing sentence analysis on the target sentence to obtain target voice information as the target content information;

and recording the playing progress corresponding to the target sentence as the target playing progress corresponding to the target content information.

4. The method according to any one of claims 1-3, further comprising, after obtaining the first target file:

determining a first start-stop time corresponding to the first key information according to a target playing progress corresponding to the first key information;

cutting out a data segment corresponding to the first start-stop time in the first target file to obtain a second target file; wherein, under the condition that the first target file is an audio file, the data segment is an audio segment, and under the condition that the first target file is a video file, the data segment is a video segment.

5. The method according to claim 1, wherein the identifying the target content information in the first file to be processed and recording the target playing progress corresponding to the target content information comprises:

determining a target data segment of the first file to be processed, wherein the volume value of the target data segment is smaller than a preset volume threshold;

deleting the target data segment in the first file to be processed;

splicing the residual data segments in the first file to be processed according to a time sequence to obtain a second file to be processed;

identifying content information in the second file to be processed, and taking the content information as the target content information;

and determining the corresponding playing progress of the target content information in the first file to be processed, and recording the playing progress as the target playing progress.

6. The method of claim 1, further comprising, after obtaining the first target file:

uploading the first target file and the corresponding first key information to a preset audio and video platform, so that the preset audio and video platform stores the first target file in a group corresponding to the first key information.

7. The method according to claim 1, wherein after identifying the target content information in the first file to be processed, the method further comprises:

inquiring first characteristic information corresponding to the target content information in a second preset information base;

uploading the first target file and the corresponding first characteristic information to a preset audio and video platform; and the preset audio and video platform stores the first target file and the corresponding first characteristic information in an associated manner.

8. An information labeling apparatus, comprising:

9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the information annotation method according to any one of claims 1 to 7.

10. A readable storage medium, on which a program or instructions are stored, which when executed by a processor, carry out the steps of the information annotation method according to any one of claims 1 to 7.