WO2017211206A1 - Video marking method and device, and video monitoring method and system

Publication number
WO2017211206A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
video
audio
marking
video file
Prior art date
Application number
PCT/CN2017/086325
Other languages
French (fr)
Chinese (zh)
Inventor
韦薇
王启贵
谢思远
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording

Definitions

  • This application relates to, but is not limited to, the field of communication technology.
  • the process of tagging a video file requires manual viewing and a determination as to whether or not to mark it, and where to mark the video file.
  • This type of marking is not only inefficient; both the decision whether to mark and the choice of marking position are also subject to human judgment, which may result in poor marking accuracy.
  • the present invention provides a video tagging method, device, and video monitoring method and system, which solve the problem of low efficiency and poor accuracy when the video tagging is manually implemented in the related art.
  • a video marking method including:
  • the extracted sound features are matched to each audio event in the audio event library; each of the audio events is established based on a sound characteristic of the audio signal generated at the time of the event;
  • an event flag is generated for the audio event that occurs at a corresponding location in the video file.
  • the extracting the sound characteristics of the audio signal in the video file includes:
  • the sound characteristics of the audio signal in the video file are extracted during video recording.
  • the extracting the sound characteristics of the audio signal includes:
  • the matching the extracted sound features with each of the audio events comprises:
  • the corresponding location in the video file performs event marking on the audio event that occurs, including performing one or more of the following:
  • the acquiring the severity level corresponding to the audio event that occurs includes:
  • the severity level corresponding to the audio event is determined according to one or more of recording location information of the video file and duration after the audio event occurs.
  • the event marking the occurrence of the audio event in the corresponding position in the video file includes:
  • the marking format corresponding to the severity level of the audio event is selected according to the correspondence table of the severity level and the marking format.
  • the extracting the sound feature of the audio signal in the video file includes: extracting a sound feature of the audio signal in the video file according to a preset detection period;
  • the method further includes:
  • the start time of the audio event in the previous detection period is taken as the start time of the audio event in the current detection period
  • the end time of the audio event in the previous detection period and the start time of the audio event in the current detection period are set as the start time of the current detection period.
  • the embodiment of the invention further provides a video monitoring method, including:
  • the video file of the event tag portion is displayed as an alarm.
  • the embodiment of the invention further provides a video marking device, comprising:
  • a feature extraction module configured to: extract a sound feature of the audio signal in the video file
  • a processing module configured to: match the sound feature extracted by the feature extraction module with each audio event in the audio event library; each audio event is established based on a sound feature of an audio signal generated when the event occurs;
  • a marking module configured to: when the processing result of the processing module is that the sound feature is successfully matched with at least one of the audio events, perform event marking on the audio event that occurs at the corresponding position in the video file.
  • the device further includes:
  • the video recording module is set to: perform video recording
  • the feature extraction module is configured to: extract a sound feature of an audio signal in the video file during video recording by the video recording module.
  • the marking module performs event marking on the occurrence of the audio event in a corresponding position in the video file, including performing one or more of the following markings:
  • the embodiment of the invention further provides a video monitoring system, comprising: a monitoring processing device and the video marking device according to any one of the preceding claims;
  • the video tagging device is configured to: perform an event tag on the video file recorded during the video monitoring process, and notify the monitoring processing device after completing an event tag on the video file;
  • the monitoring processing device is configured to: after receiving the alarm of the video marking device, perform an alarm display on the video file of the event marking portion.
  • the embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores computer executable instructions, and when the processor executes the computer executable instructions, the following operations are performed:
  • the extracted sound features are matched to each audio event in the audio event library; each of the audio events is established based on a sound characteristic of the audio signal generated at the time of the event;
  • an event flag is generated for the audio event that occurs at a corresponding location in the video file.
  • the embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores computer executable instructions, and when the processor executes the computer executable instructions, the following operations are performed:
  • the video file of the event tag portion is displayed as an alarm.
  • with the video marking method and device and the video monitoring method and system provided by the embodiments of the present invention, the sound features of the audio signal in the video file are extracted and matched with each audio event in the audio event library. When the extracted sound features are successfully matched with at least one audio event, this indicates that the audio event occurs in the video file, and the audio event is marked at the corresponding position in the video file; each audio event is established in advance based on the sound characteristics of the audio signal generated when the event occurs.
  • by setting audio events in advance, and then matching the sound features of the audio signal in the video file with each audio event to determine whether the corresponding mark needs to be made, it is not necessary to manually view the video content to determine whether to mark.
  • the efficiency and accuracy of marking video files can be greatly improved.
  • FIG. 1 is a flowchart of a video marking method according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a video monitoring method according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a video marking apparatus according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of another video marking apparatus according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of still another video marking apparatus according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a video monitoring system according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a component of a video monitoring system according to an embodiment of the present disclosure.
  • FIG. 8 is a flowchart of still another video monitoring method according to an embodiment of the present invention.
  • the audio events are set in advance, and then the sound features of the audio signal in the video file to be processed are extracted and matched with each audio event to automatically determine whether the corresponding mark needs to be made. The video content does not need to be viewed manually to decide whether to mark, which can greatly improve the efficiency and accuracy of marking video files.
  • FIG. 1 is a flowchart of a video marking method according to an embodiment of the present invention.
  • the video marking method provided by the embodiment of the present invention may include the following steps, namely, S101 to S105:
  • the video file in the embodiment of the present invention may include an audio signal and a synchronously recorded video signal.
  • S101 in the embodiment of the present invention may be performed after the video file is recorded, or during the video file recording process. Performing it during recording improves the timeliness of marking the video file, so that real-time marking can be realized. In some fields, especially video surveillance, seeing the marked alarm content one minute or even one second earlier may make a great difference to how the alarm event subsequently develops. Therefore, performing S101 during video file recording is of great significance for the field of video surveillance.
  • Each audio event in an embodiment of the invention is established based on the acoustic characteristics of the audio signal produced at the time the event occurred. For example, for a smoke alarm event, a smoke alarm sound is generated, and the sound characteristics of the sound are extracted to obtain an audio event of the smoke alarm. For another example, for a robbery or a violent or aggressive incident, there may be a call for help, such as screaming for help, and extracting the sound characteristics of these sounds yields an audio event of a robbery or of a violent or aggressive incident. In addition to the above examples, different events generally have one or more corresponding sound features; for example, a gunshot event may correspond to a gunshot sound, while other events may produce the sound of breaking glass, crying, a car horn, and so on. These are not described again here. Those skilled in the art should understand that the audio events in the embodiments of the present invention can be flexibly configured according to the requirements in practical applications.
  • S103 Determine whether the extracted sound feature is successfully matched with at least one audio event; when the matching succeeds, S104 is performed; when the matching does not succeed, S105 is performed;
  • S104 Perform event marking on the audio event that occurs, at the corresponding position in the video file.
  • S105 No audio event currently occurs, and the method can continue to wait for the next detection.
  • through the flow shown in FIG. 1, the embodiment of the present invention presets the audio events, automatically extracts the sound features of the sound signal in the video file, and matches them with each audio event. If the extracted sound feature matches one of the audio events successfully, that audio event occurs in the video file, and it is automatically marked at the corresponding position of the video file.
  • the above judgment and marking process do not require manual participation at all, which can improve efficiency and improve the accuracy of marking.
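  • As an illustration only, the decision flow of FIG. 1 can be sketched in Python as follows. The names `EventMark`, `mark_segment` and the `is_match` predicate are hypothetical and do not come from the application; the sketch assumes the sound features have already been extracted and leaves the matching criterion to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class EventMark:
    """Hypothetical record of one event mark placed in the video file."""
    event_name: str
    position_s: float  # position in the video file, in seconds


def mark_segment(
    features: Sequence[float],
    event_library: Dict[str, Sequence[float]],
    is_match: Callable[[Sequence[float], Sequence[float]], bool],
    position_s: float,
) -> List[EventMark]:
    """Match extracted features against every audio event and mark the hits."""
    marks = []
    for name, template in event_library.items():
        # S103: decide whether the features match this audio event.
        if is_match(features, template):
            # S104: mark the event at the corresponding position.
            marks.append(EventMark(name, position_s))
    # S105: an empty list means no audio event occurred; the caller
    # simply waits for the next detection.
    return marks
```

  • Passing the matching criterion in as a function keeps the sketch independent of any particular feature representation or similarity measure.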
  • the foregoing process may also be performed during video recording, so that the video file being recorded is marked in real time rather than only after recording is complete.
  • the timeliness of the marking processing can thus be greatly improved. For the video surveillance field, some malicious events can be stopped in time, or even avoided, to protect the user's property and life.
  • the implementation of extracting the sound feature of the audio signal and matching the extracted sound feature with each audio event may be performed by using the following process:
  • the background signal and the foreground signal of the audio signal are extracted.
  • the background signal and the foreground signal can be separated by behavioral modeling based on the neural mechanism of human hearing; this can eliminate the influence of the background signal.
  • performing event marking on the audio event that occurs in the corresponding position in the video file includes performing one or more of the following:
  • the start time of the occurrence of the audio event is marked at the key video frame position of the video file; the key video frame in the embodiment of the present invention refers to the video frame at the moment when the audio event occurs.
  • the event mark can be performed after a clear sound source direction (which can also be characterized by an angle) and/or distance is obtained, so that unclear sound source information does not mislead subsequent processing.
  • a corresponding severity level may be set for each audio event.
  • for example, for a general audio event, the severity level may be set to generally severe, characterized by a coefficient of 0; for a vicious audio event, the severity level may be set to more severe, characterized by a coefficient of 1; for a dangerous audio event, the severity level may be set to very severe, characterized by a coefficient of 2.
  • when a severity level mark is performed, the coefficient of the severity level can be marked directly.
  • its severity level may not only be related to the event itself, but may also be closely related to where the event occurred (eg, hotels, shops, schools, residential areas, within the home) and the duration of the event.
  • the implementation manner of obtaining the severity level corresponding to the audio event may include: determining the severity level corresponding to the audio event according to the location information of the recorded video file (that is, the location where the audio event occurs) and/or the duration after the audio event occurs. Because the severity level is determined in combination with the acquired location information and/or duration, it takes more factors into account.
  • the results obtained are also more accurate.
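  • A minimal sketch of how location and duration might adjust a severity coefficient is given below. The application does not specify the adjustment rule, so the set of sensitive locations, the 60-second duration threshold, and the one-step increments are all assumptions made for illustration; only the coefficients 0, 1 and 2 come from the text.

```python
from typing import FrozenSet, Optional

# Severity coefficients as described in the text: 0, 1 and 2.
MAX_SEVERITY = 2


def severity_level(
    base: int,
    location: Optional[str] = None,
    duration_s: Optional[float] = None,
    sensitive_locations: FrozenSet[str] = frozenset({"school"}),  # assumed
    long_duration_s: float = 60.0,  # assumed threshold
) -> int:
    """Return a severity coefficient adjusted by location and duration."""
    level = base
    # Assumption: an event at a sensitive location is one level more severe.
    if location is not None and location in sensitive_locations:
        level += 1
    # Assumption: an event that lasts long enough is one level more severe.
    if duration_s is not None and duration_s >= long_duration_s:
        level += 1
    # Clamp to the highest coefficient defined in the text.
    return min(level, MAX_SEVERITY)
```

  • Either input may be omitted, which mirrors the "and/or" wording: the level can be determined from the location alone, the duration alone, or both.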
  • the implementation of the event marking of the audio event in the corresponding position in the video file may include: marking the severity level corresponding to the audio event
  • the tag format corresponding to the severity level of the audio event may be selected according to a preset correspondence table between the severity level and the tag format.
  • the mark format in the embodiment of the present invention includes, but is not limited to, a different color/format adopted by the text box and the text. The following is an example. See Table 1.
  • audio events of different severity levels are marked with different mark formats, so that when the video file of the marked portion is displayed, the marks appear in different formats and give the user different prompts, which helps users respond correctly and quickly to events of different severity levels.
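  • The correspondence between severity level and mark format can be pictured as a simple lookup. In the sketch below, the colours are placeholders chosen for illustration and are not the contents of Table 1; the names `MARK_FORMATS` and `mark_format` are likewise hypothetical.

```python
from typing import Dict

# Placeholder formats keyed by severity coefficient (NOT taken from Table 1).
MARK_FORMATS: Dict[int, Dict[str, str]] = {
    0: {"text_color": "black", "box_color": "yellow"},
    1: {"text_color": "black", "box_color": "orange"},
    2: {"text_color": "white", "box_color": "red"},
}


def mark_format(severity: int) -> Dict[str, str]:
    """Look up the mark format for a severity coefficient."""
    if severity not in MARK_FORMATS:
        raise ValueError(f"unknown severity level: {severity}")
    return MARK_FORMATS[severity]
```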
  • the process of marking the video file may be performed periodically.
  • the embodiment of the present invention may preset a detection period, for example, 10 seconds, that is, detection is performed once every 10 seconds; the period can also be set to 30 seconds, 1 minute, etc.
  • the value of the detection period can be flexibly set according to actual needs.
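  • To make the periodic detection concrete, the sketch below splits a recording into consecutive detection periods. The function name `detection_periods` is hypothetical, and the handling of a shorter final period is an assumption, since the application does not say how a partial period is treated.

```python
from typing import List, Tuple


def detection_periods(total_s: float, period_s: float) -> List[Tuple[float, float]]:
    """Split a recording of total_s seconds into (start, end) detection periods."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    periods = []
    i = 0
    # Compute each start from the index to avoid accumulating rounding error.
    while i * period_s < total_s:
        start = i * period_s
        end = min(start + period_s, total_s)  # assumed: last period may be shorter
        periods.append((start, end))
        i += 1
    return periods
```

  • With a 10-second period, a 25-second recording yields the periods 0 to 10, 10 to 20 and 20 to 25 seconds.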
  • in each detection period, the audio signal and its sound features are extracted from the recorded video file, and the extracted sound features are matched with each audio event. The following situation may then arise: suppose a robbery occurs, in which threats, intimidation and crying each last several seconds, with no obvious long interruption in between, and recur over a period of time.
  • the embodiment of the present invention can set an event merging rule, under which detections in situations like the above are merged into a single audio event, improving the intelligence and accuracy of detection and marking.
  • the merge rule can be set to any of the following rules:
  • M is greater than or equal to 2; for example, if the detection period is 1 minute and M is 10, the same audio event is allowed to be merged within 10 minutes;
  • the combination is detected as an associated audio event, and N is greater than or equal to 2.
  • the start time of the event is marked in the video file, and the end time is not marked yet, pending the detection result of the next detection period.
  • if no audio event is detected in the next detection period, or no identical or associated audio event is detected, the end time of the audio event is marked as the start time of the next detection period.
  • when the sound features of the audio signals extracted in adjacent detection periods are each successfully matched with at least one audio event, that is, audio events occur in two adjacent detection periods, whether to merge them is determined according to the preset event merging rule.
  • when it is judged that the events are merged, the start time of the audio event in the previous detection period is used as the start time of the audio event in the current detection period, and the end time is not marked yet, pending subsequent detection results; when it is judged that they are not merged, the end time of the audio event in the previous detection period is set to the start time of the current detection period, and the start time of the audio event in the current detection period is set to the start time of the current detection period.
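  • The start-time and end-time bookkeeping across detection periods can be sketched as follows. This is a simplification under stated assumptions: only identical event names in adjacent periods are merged (associated events and the M and N limits are not modelled), each event is taken to start at the beginning of the period in which it is first detected, and the function name `merge_detections` is hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Interval = Tuple[str, float, Optional[float]]  # (event name, start, end or None)


def merge_detections(detections: List[Set[str]], period_s: float) -> List[Interval]:
    """Merge per-period detections of the same event into marked intervals.

    detections[i] is the set of event names matched in detection period i.
    """
    open_events: Dict[str, float] = {}  # event name -> marked start time
    intervals: List[Interval] = []
    for i, names in enumerate(detections):
        period_start = i * period_s
        # An open event that is not detected again ends at the start of
        # the current detection period.
        for name in sorted(set(open_events) - names):
            intervals.append((name, open_events.pop(name), period_start))
        # A repeated event keeps its earlier start time (merge); a newly
        # detected event starts at the start of the current period.
        for name in names:
            open_events.setdefault(name, period_start)
    # Events still open have no end time yet, pending later detections.
    for name in sorted(open_events):
        intervals.append((name, open_events[name], None))
    return intervals
```

  • For example, a scream matched in two adjacent 10-second periods and then absent is marked once, from 0 to 20 seconds, instead of as two separate events.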
  • after the generated audio events are merged, the method provided by the embodiment of the present invention may re-acquire the severity level corresponding to the audio event to check whether it has changed, and update the mark accordingly when it has.
  • the marking of the video file may also include the end time and/or duration of the audio event.
  • the portion of the video file between the start time and the end time can be referred to as a tag video. In the subsequent alarm display, this tag video can be displayed in a targeted manner.
  • FIG. 2 is a flowchart of a video monitoring method according to an embodiment of the present invention.
  • the video monitoring method provided by the embodiment of the present invention may include the following steps, that is, S201 to S203:
  • S201 Perform monitoring video recording; in practical applications, video acquisition and synchronized audio collection can be performed by an image collector (such as a camera) and a pickup.
  • the monitoring personnel can view the video content of the audio event portion at the earliest opportunity and handle it in time.
  • the event may still occur, has not ended yet; or the event may have ended, depending on the event duration and the event detection period.
  • the implementation of the alarm display of the video file of the marked part may send an alarm to the background server.
  • if the real-time video of the image collector was previously being displayed on the display device, the display device now displays the video content of the marked portion, together with the corresponding event mark.
  • the alarm message and a video link to the event-marked portion may be sent to the display device, and the user may play the video by clicking the video link; the function of switching to real-time video at any time can also be provided in the embodiment of the present invention.
  • corresponding alarm processing may be required for audio events that occur (e.g., robbery, shooting). Therefore, when displaying on the display device, the embodiment of the present invention can also provide an alarm option bar, which can be integrated with the time point mark on the video progress bar so that the alarm options pop up when the user clicks the mark. In addition, the user may need to view a key event several times before making a determination. Therefore, the embodiment of the present invention can also provide a lookback function, which can likewise be integrated at a position on the video progress bar with a corresponding logo (for example, on the time point mark); a user who needs to look back clicks the logo at the corresponding position.
  • the audio event occurring can be timely and accurately observed in the monitoring process, and a timely and accurate response can be made to ensure the security of the user's property and life.
  • FIG. 3 is a schematic structural diagram of a video marking apparatus according to an embodiment of the present invention.
  • the video marking device 30 provided by the embodiment of the present invention may include: a feature extraction module 31, a processing module 32, and a marking module 33.
  • the feature extraction module 31 is configured to: extract sound features of the audio signal in the video file.
  • the video file in the embodiment of the present invention may include an audio signal and a synchronously recorded video signal.
  • the processing module 32 is configured to: match the sound feature extracted by the feature extraction module 31 with each audio event in the audio event library.
  • Each audio event in an embodiment of the invention is established based on the acoustic characteristics of the audio signal produced at the time the event occurred. For different events, there are generally one or more sound features corresponding to the event. For example, a gunshot event may have a gunshot sound, while other events may produce the sound of breaking glass, crying, a car horn, and so on. These are not described again here. Those skilled in the art should understand that the audio events in the embodiments of the present invention can be flexibly configured according to the requirements in practical applications.
  • the marking module 33 is configured to: when the processing result of the processing module 32 is that the sound feature is successfully matched with at least one audio event, perform event marking on the audio event that occurs at the corresponding position in the video file.
  • the foregoing functions of the feature extraction module 31, the processing module 32, and the marking module 33 in the embodiment of the present invention may be implemented by a processor, or may be implemented independently.
  • the feature extraction module 31 automatically extracts the sound features of the sound signal in the video file to complete the matching with each audio event via the processing module 32.
  • Event markers are automatically made at the appropriate location in the video file. The entire process does not require manual participation, and the marking efficiency and accuracy can be greatly guaranteed.
  • FIG. 4 is a schematic structural diagram of another video marking apparatus according to an embodiment of the present invention. Based on the structure of the device shown in FIG. 3, the video tagging device 30 in the embodiment of the present invention may further include:
  • the video recording module 34 is configured to: perform video recording.
  • the video recording module 34 can include a video capture device and a pickup. That is, the video tagging device 30 itself can be used as a monitoring device, which can cooperate with the monitoring platform to complete video monitoring in various scenarios.
  • the feature extraction module 31 is configured to: during the video recording process performed by the video recording module 34, extract the sound features of the audio signal in the video file, after which the subsequent marking process is completed via the processing module 32 and the marking module 33. This can improve the timeliness of marking the video file and essentially realize real-time marking. In the video monitoring field, seeing the marked alarm content one minute or even one second earlier may make a great difference to how the alarm event subsequently develops.
  • the feature extraction module 31 extracts a sound feature of the audio signal.
  • the processing module 32 matches the extracted sound features with each audio event, which can be performed by the following process:
  • After transforming the audio signal into the time-frequency domain, the feature extraction module 31 extracts the background signal and the foreground signal of the audio signal, and extracts the sound feature set from the foreground signal.
  • the feature extraction module 31 can separate the background signal and the foreground signal by behavioral modeling based on the neural mechanism of human hearing; this can eliminate the influence of the background signal.
  • the processing module 32 reads the audio events from the audio event library, and calculates the similarity between the sound feature set extracted by the feature extraction module 31 and each audio event. When the similarity between the sound feature set and an audio event is greater than the set similarity threshold, it is determined that the audio event is successfully matched, that is, the audio event is determined to have occurred in the video file. Event marking can then be performed for the audio event that occurs in the video file.
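  • The threshold test described above can be sketched as follows. The application does not name a similarity measure, so cosine similarity over fixed-length feature vectors is used here purely as an assumption; the threshold value of 0.9 and the names `cosine_similarity` and `match_events` are also illustrative.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_events(
    feature_set: Sequence[float],
    event_library: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # assumed value; the text only says "set threshold"
) -> List[str]:
    """Return the names of audio events whose similarity exceeds the threshold."""
    return [
        name
        for name, template in event_library.items()
        if cosine_similarity(feature_set, template) > threshold
    ]
```

  • An empty result corresponds to the case in which no audio event is matched in the current detection.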
  • the marking module 33 performs event marking on the generated audio event in the corresponding position in the video file, including performing one or more of the following markings:
  • the start time of the occurrence of the audio event is marked at the key video frame position of the video file; the key video frame in the embodiment of the present invention refers to the video frame at the moment when the audio event occurs.
  • a corresponding severity level may be set for each audio event.
  • for example, for a general audio event, the severity level may be set to generally severe, characterized by a coefficient of 0; for a vicious audio event, the severity level may be set to more severe, characterized by a coefficient of 1; for a dangerous audio event, the severity level may be set to very severe, characterized by a coefficient of 2.
  • when a severity level mark is performed, the coefficient of the severity level can be marked directly.
  • its severity level may not only be related to the event itself, but may also be closely related to the location where the event occurs (such as hotels, shops, schools, residential areas, homes) and the duration of the event.
  • the implementation manner of obtaining the severity level corresponding to the audio event may include: determining the severity level corresponding to the audio event according to the location information of the recorded video file (that is, the location where the audio event occurs) and/or the duration after the audio event occurs. Because the severity level is determined in combination with the acquired location information and/or duration, it takes more factors into account.
  • the results obtained are also more accurate.
  • the implementation of the event marking of the audio event in the corresponding position in the video file may include: marking the severity level corresponding to the audio event
  • the tag format corresponding to the severity level of the audio event may be selected according to a preset correspondence table between the severity level and the tag format.
  • the mark format in the embodiment of the present invention includes, but is not limited to, a different color/format adopted by the text box and the text.
  • FIG. 5 is a schematic structural diagram of still another video marking apparatus according to an embodiment of the present invention.
  • the video tagging apparatus 30 provided by the embodiment of the present invention may further include:
  • the cache module 35 is configured to: store an audio event library, which may be obtained from another server (for example, a monitoring server), or directly receive the user's setting acquisition.
  • the cache module 35 is further configured to: cache audio data and video data collected by the video recording module 34, and cache the various data marked by the marking module 33.
  • the process of marking the video file may be performed periodically.
  • the embodiment of the present invention may preset a detection period, for example, 10 seconds, that is, detection is performed once every 10 seconds; the period can also be set to 30 seconds, 1 minute, etc. In practical applications, the value of the detection period can be flexibly set according to actual needs.
  • the feature extraction module 31 extracts the audio signal and its sound features from the recorded video file, and the processing module 32 matches the extracted sound features with each audio event. In this way, there will be matching to the same or associated audio events over multiple detection cycles.
  • the embodiment of the present invention can set an event merging rule, and in such similar situations the marking module 33 can merge the detections into a single audio event to improve the intelligence and accuracy of detection and marking.
  • the marking module 33 can process according to any of the following merge rules:
  • M is greater than or equal to 2; for example, if the detection period is 2 minutes and M is 5, the same audio event is allowed to be merged within 10 minutes;
  • the combination is detected as an associated audio event, and N is greater than or equal to 2.
  • When the processing module 32 detects an audio event for the first time in a certain detection period, the marking module 33 first marks the start time of the event in the video file. The end time may be left unmarked for the moment, pending the detection result of the next detection period. If no audio event is detected in the next detection period, or no identical or associated audio event is detected, the end time of the audio event is marked as the start time of that next detection period.
  • The processing module 32 may find that the sound features of the audio signals extracted in adjacent detection periods each match at least one audio event, that is, audio events are detected in two adjacent detection periods.
  • In that case, the marking module 33 may determine, according to the preset event merging rule described above, whether to merge the audio events occurring in the two adjacent detection periods. When it is determined that they are to be merged, the start time of the audio event in the previous detection period is used as the start time of the audio event in the current detection period, and the end time is left unmarked, pending subsequent detection results. When it is determined that they are not to be merged, the end time of the audio event in the previous detection period and the start time of the audio event in the current detection period are both set to the start time of the current detection period.
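The per-period handling of start and end times described above can be sketched as follows. This is an illustrative sketch, not the patented implementation: `EventMark`, `process_period`, and the placeholder rule in `should_merge` (merge when the event names are equal) are assumptions, and times are plain seconds for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EventMark:
    name: str
    start: int                  # start time in seconds
    end: Optional[int] = None   # None means the event is still open


def should_merge(previous: EventMark, detected_name: str) -> bool:
    # Placeholder merge rule (an assumption): merge identical events only.
    return previous.name == detected_name


def process_period(marks: List[EventMark], period_start: int,
                   detected_name: Optional[str]) -> None:
    """Update the list of marks with the result of one detection period."""
    open_mark = marks[-1] if marks and marks[-1].end is None else None

    if detected_name is None:
        # Nothing detected: close any open event at this period's start.
        if open_mark is not None:
            open_mark.end = period_start
        return

    if open_mark is not None:
        if should_merge(open_mark, detected_name):
            return  # keep the earlier start time, leave the end unmarked
        # Not merged: the previous event ends where the current one starts.
        open_mark.end = period_start

    marks.append(EventMark(name=detected_name, start=period_start))


# Usage: four detection periods of 10 seconds each.
marks: List[EventMark] = []
for t, name in [(0, "robbery"), (10, "robbery"), (20, "shooting"), (30, None)]:
    process_period(marks, t, name)
```

After the loop, the two "robbery" detections have been merged into one mark running from 0 to 20, followed by a separate "shooting" mark from 20 to 30.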
  • After the marking module 33 merges audio events, it can re-determine whether the severity level corresponding to the audio event has changed, and when it has, the mark can be updated accordingly. Therefore, in the embodiment of the present invention, the marking performed by the marking module 33 on the video file may further include the end time and/or duration of the audio event.
  • The portion of the video file between the start time and the end time can be called the tag video.
  • FIG. 6 is a schematic structural diagram of a video monitoring system according to an embodiment of the present invention.
  • the video monitoring system 60 provided by the embodiment of the present invention may include: a monitoring processing device 61 and a video marking device 62 in any of the embodiments shown in FIGS. 3 to 5.
  • The video marking device 62 is configured to: perform event marking on the video file recorded during the video monitoring process and, after completing an event mark on the video file, send an alarm to the monitoring processing device 61. The alarm process is also the activation process of the event mark, which can be performed by a mark activation module provided in the video marking device 62.
  • The monitoring processing device 61 is configured to: after receiving the alarm from the video marking device 62, display the video file of the event-marked portion. It should be noted that, according to the above description, the event may still be occurring and not yet have ended, or it may already have ended, depending on factors such as the duration of the event and the detection period.
  • The monitoring processing device 61 may be implemented by a background server combined with a corresponding display device. The background server includes a storage medium for storing the audio event library and for storing the video data, alarm information, and other information from the video marking device 62.
  • The video marking device 62 may further include an interaction unit, which may be a display unit, configured to: receive from the marking module 33 the tag video and real-time video for viewing, or receive or deliver various interactive messages.
  • The video marking device 62 sends the video file of the event-marked portion to the monitoring processing device 61 as an alarm. If the monitoring processing device 61 was already displaying the real-time video of the image collector in the embodiment of the present invention, it now displays the video content of the marked portion together with the corresponding event mark. If the monitoring processing device 61 was not displaying the real-time video of the image collector, an alarm message and a video link to the event-marked portion may be sent to the monitoring processing device 61, and the user may click the link to play the video. A function for switching to real-time video at any time can also be provided in the embodiment of the present invention.
  • Audio events that occur (for example, a robbery or a shooting) may require corresponding alarm processing. Therefore, in the embodiment of the present invention, an alarm option bar can also be provided on the monitoring processing device 61.
  • The alarm option bar can be integrated with the time point mark on the video progress bar, so that the alarm option pops up when the user clicks the mark.
  • The embodiment of the invention can also provide a look-back function, which can likewise be integrated at a certain position on the video progress bar; to look back, the user only needs to click the identifier at the corresponding position.
  • The monitoring processing device 61 and the video marking device 62 can be combined to form a monitoring system. Because the video marking device 62 marks the video in real time, audio events that occur can be viewed promptly and accurately during monitoring and responded to promptly and accurately, helping to protect the user's property and life.
  • FIG. 7 is a schematic structural diagram of a video surveillance system according to an embodiment of the present invention.
  • the component of the video surveillance system may include:
  • camera and pickup module 71, also referred to as the monitoring module or monitoring device
  • audio event object 72 may be a display or separate display terminals, such as mobile phones, pads, etc.
  • background server 73 may be a display or separate display terminals, such as mobile phones, pads, etc.
  • the camera and the pickup module 71 may be a camera with a built-in pickup or a camera with an external pickup. If it is external, audio and video synchronization is required.
  • the camera and the pickup module 71 further includes a feature extraction module, a processing module, a marking module, and a cache module.
  • The feature extraction module and the processing module are configured to: detect an audio event object 72 according to the collected audio signal. The marking module is configured to: according to the detected audio event object 72, acquire the mark attributes of the time point mark of the video event mark, edit the real-time video, add the time point mark, and add an eye-catching text box annotation on the tag video frames. The cache module is configured to: cache the audio event library, the acquired audio and video signals, the event marks, and so on.
  • The feature extraction module is configured to: first separate the foreground signal and the background signal of the audio signal, and perform feature extraction on the foreground signal. The processing module is further configured to: compare the features of the foreground signal with the audio events of the event detection model library in the cache module; if the similarity exceeds the set threshold, one or more types of audio events are detected. The marking module can locate the sound source, obtain the sound source distance and sound source direction, and then determine the severity level. The processing module is further configured to: first determine whether to integrate the audio event and, if so, integrate it, obtain the start time, end time, and duration, integrate the audio detection conclusion, sound source angle, and sound source distance, and re-determine the severity level.
  • Audio events within an integrated time period generate only one time point mark, including the mark start time and the mark end time, which is saved to the cache module and synchronized to the database of the background server 73.
  • The camera and pickup module 71 is connected to the background server 73 via a network and sends an alarm to the background server. If the real-time video of the camera was already being displayed, display continues, and at this point the tag video with the mark attributes should be shown; if the real-time video of the camera was not being displayed, an alarm message and a video link are sent to the background server, and clicking the link displays the tag video with the mark attributes, with the option of switching to live video at any time.
  • The time point mark appears on the video progress bar. Clicking the mark offers a choice of alarm or look-back. If alarm is selected, the specified alarm number is dialed and the tag video marked with time and location is shared.
  • management station 74, which may include an audio event management module, an audio event severity level determination management module, and a merge rule management module
  • Through the management station, the audio features of specific events, the severity level determination rules, the merge rules, and so on can be entered and managed.
  • The monitoring device (including the camera and the pickup) is installed in an elevator. In the current detection period, the monitoring device performs audio event detection on the data collected by the pickup, and the detected audio event is a robbery inside the elevator.
  • The audio event E1 is pre-registered, and the mark attributes of the corresponding time point mark S1 are recorded, including the mark start time (that is, the current time), the severity level, the sound source distance, and the sound source direction. For example, mark start time: 19:50:00; mark attribute: robbery
  • The audio event E1 is formally registered. The video in the time period from the mark start time to the mark end time is called the tag video, and the tag video can be padded in time as appropriate, for example by moving its start time n seconds earlier and its end time n seconds later, to capture a more complete picture of the event.
  • The start time of the tag video is n seconds before 19:50:00; if n is 5, the start time is 19:49:55. The end time is empty, indicating that the audio event is still occurring and has not ended.
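The padding arithmetic in this example can be checked with a short sketch. The function name `pad_tag_video` and the calendar date are illustrative assumptions; only the times of day and n = 5 come from the text.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def pad_tag_video(mark_start: datetime, mark_end: Optional[datetime],
                  n_seconds: int) -> Tuple[datetime, Optional[datetime]]:
    """Move the start n seconds earlier and the end n seconds later.

    An empty (None) end time means the event is still occurring,
    so it is left empty.
    """
    pad = timedelta(seconds=n_seconds)
    video_start = mark_start - pad
    video_end = mark_end + pad if mark_end is not None else None
    return video_start, video_end


# The date is arbitrary; the source only gives the time of day.
start, end = pad_tag_video(datetime(2017, 1, 1, 19, 50, 0), None, 5)
print(start.strftime("%H:%M:%S"), end)
```

With a mark start time of 19:50:00 and n = 5, the tag video starts at 19:49:55 and the end time stays empty.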
  • The next detection period starts at 19:50:11, and audio and video acquisition and audio event detection are performed as in the previous steps.
  • The detection result is "robbery".
  • The audio event is pre-registered as E2, and the current time, severity level, sound source distance, and sound source direction are recorded. For example, mark start time: 19:50:11; mark attribute: robbery
  • Event integration is performed on E2 and E1 according to the event integration decision rule.
  • The mark attributes of the event mark S1 of the audio event E1, for example the duration, are updated, and the severity level is re-determined according to the severity level determination rule.
  • The start time of the next detection period is 19:51:10.
  • Audio and video acquisition and audio event detection are again performed as in the previous steps. Audio event detection on the data collected by the camera's pickup finds no event. Since the mark end time of the time point mark S1 of the last audio event E1 is empty, the mark end time of S1 is set to 19:51:09.
  • Normal video is played on the display screen corresponding to the camera, with no text annotation in the center of the video, and the progress bar is displayed. At 19:50:00 on the progress bar there is an annotation in orange font reading "robbery: 1 minute 09 seconds", and the time point mark is no longer highlighted.
  • FIG. 8 is a flowchart of still another video monitoring method according to an embodiment of the present invention.
  • the method provided by the embodiment of the present invention may include the following steps, that is, S801 to S820:
  • S801, an event detection period starts at time T1, and audio event detection (that is, matching against the audio events) is performed according to the video marking method described above;
  • S802, determining whether an audio event is detected; when no audio event is detected, executing S803; when an audio event is detected, executing S805;
  • S803, determining whether the mark end time of the last audio event E0 is empty; when it is empty, executing S804; when it is not empty, executing S801 (waiting for the arrival of the next event detection period);
  • S804, setting the mark end time of the last audio event E0 to the second before T1, re-determining the severity level of E0, updating the event mark S0 of the audio event E0, and activating the mark S0; then executing S811;
  • S806, determining whether the audio event E1 is to be integrated with the last audio event E0; if so, executing S807; if not, executing S808;
  • S808, determining whether the mark end time of the audio event E0 is empty; when it is empty, executing S809; when it is not empty, executing S810;
  • S810, formally registering the audio event E1, with the start time of the event mark S1 set to T1 and the end time left empty, and activating the mark S1; then executing S811;
  • S811, determining whether the video of the camera is being played; when the video is being played, executing S814; when the video is not being played, executing S812;
  • The display manner may vary, for example, displaying in the right area of the screen, sorted by event start time from most recent to earliest. If the monitoring personnel have not viewed the tag video link of the audio event, multiple event alarm messages may accumulate after a period of time; for the same audio event whose mark attributes are updated multiple times, the alarm messages need to be merged;
  • S814, playing the tag video, simultaneously displaying the event mark attributes and the start time of the corresponding audio event on the screen, displaying a progress bar, and displaying an audio event mark on the progress bar at the start time of the time point mark corresponding to the audio event;
  • S816, determining whether "alarm" is clicked; if it is clicked, executing S817; if not, executing S818;
  • S817, alerting the designated terminal by phone, SMS, or another specified means, and sharing the tag video link of the audio event; then ending the process.
  • The video marking method and the video monitoring method provided by the embodiments of the present invention can quickly locate the moment at which a specific behavior or event occurs in the monitored video, so that video monitoring personnel can find problems quickly and work more efficiently.
  • Embodiments of the present invention also provide a computer readable storage medium storing computer executable instructions that, when executing computer executable instructions, perform the following operations, namely, S11 to S13:
  • Embodiments of the present invention also provide a computer readable storage medium storing computer executable instructions that, when executing computer executable instructions, perform the following operations, namely S21 to S23:
  • All or part of the steps of the above embodiments may also be implemented by using integrated circuits. These steps may be fabricated separately as individual integrated circuit modules, or multiple modules or steps may be fabricated as a single integrated circuit module.
  • The devices, function modules, and functional units in the above embodiments can be implemented by general-purpose computing devices; they can be concentrated on a single computing device or distributed over a network of multiple computing devices.
  • When the devices, function modules, and functional units in the above embodiments are implemented in the form of software function modules and sold or used as stand-alone products, they can be stored in a computer readable storage medium.
  • The above-mentioned computer readable storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
  • In the embodiments of the present invention, the sound features of the audio signal in the video file are extracted and matched against each audio event in the audio event library. When the extracted sound features are successfully matched with at least one audio event, this indicates that the audio event occurs in the video file, and the audio event is event-marked at the corresponding position in the video file. Each audio event is established in advance based on the sound features of the audio signal generated when the event occurs.
  • Because audio events are set in advance and the sound features of the audio signal in the video file are matched against each audio event to determine whether a mark is needed, there is no need to view the video content manually to decide whether to mark, and the efficiency and accuracy of marking video files can be greatly improved.


Abstract

Provided are a video marking method and device, and a video monitoring method and system. The video marking method comprises: extracting, from a video file, a sound feature of an audio signal; performing matching on the basis of the extracted sound feature and each audio event in an audio event library; and if the extracted sound feature matches at least one audio event, adding an event marker at a corresponding position in the video file to signify occurrence of an audio event.

Description

Video marking method and device, and video monitoring method and system
Technical field
This application relates to, but is not limited to, the field of communication technology.
Background
In the related art, for video recording in fields such as surveillance, recorded video files are marked manually after the fact. The general process is: first record video to obtain a video file; then open the recorded video in a video editor, watch it to find the moments that need a time point mark, add a marker indicator at the corresponding moment on the timeline of the marker bar, and then add a text label. This way of marking video has the following problem:
The process of marking a video file requires a person to watch the video, judge whether a mark is needed, and decide where in the video file to place it. This is inefficient, and both the judgment of whether to mark and the choice of mark position are subject to human factors, which may lead to poor marking accuracy.
Summary of the invention
The following is an overview of the subject matter described in detail herein. This summary is not intended to limit the scope of protection of the claims.
Provided herein are a video marking method and device, and a video monitoring method and system, to solve the problem in the related art that video marking is performed manually, resulting in low efficiency and poor accuracy.
A video marking method, comprising:
extracting sound features of an audio signal in a video file;
matching the extracted sound features against each audio event in an audio event library, each audio event being established based on the sound features of the audio signal generated when the event occurs;
when the sound features are successfully matched with at least one of the audio events, event-marking the audio event that occurred at the corresponding position in the video file.
Optionally, extracting the sound features of the audio signal in the video file comprises:
extracting the sound features of the audio signal in the video file during video recording.
Optionally, extracting the sound features of the audio signal comprises:
transforming the audio signal into the time-frequency domain and extracting a foreground signal of the audio signal;
and matching the extracted sound features against each audio event comprises:
extracting a sound feature set from the foreground signal and calculating the similarity between the sound feature set and each audio event; when the obtained similarity is greater than a set similarity threshold, the match is successful.
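The threshold test in the matching step can be sketched as below. The source does not say which similarity measure is used; cosine similarity over fixed-length feature vectors is an assumption made purely for illustration, as are the function names, the library contents, and the threshold value.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_audio_events(features: Sequence[float],
                       event_library: Dict[str, Sequence[float]],
                       threshold: float) -> List[str]:
    """Return the names of all library events whose similarity exceeds the threshold."""
    return [name for name, reference in event_library.items()
            if cosine_similarity(features, reference) > threshold]


# Hypothetical two-dimensional reference features for two audio events.
library = {"robbery": [1.0, 0.0], "shooting": [0.0, 1.0]}
matched = match_audio_events([0.9, 0.1], library, threshold=0.8)
```

Here `matched` contains only "robbery". Returning a list rather than a single best match reflects the wording "at least one audio event": several events may exceed the threshold at once.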
Optionally, event-marking the audio event that occurred at the corresponding position in the video file comprises performing one or more of the following markings:
marking the start time of the audio event at a key video frame position of the video file;
acquiring and marking one or more of the direction information and distance information of the sound source relative to the pickup for the audio event that occurred;
acquiring and marking the severity level corresponding to the audio event that occurred;
acquiring and marking the name of the audio event that occurred.
Optionally, acquiring the severity level corresponding to the audio event that occurred comprises:
determining the severity level corresponding to the audio event according to one or more of the location information of where the video file is recorded and the duration of the audio event after it occurs.
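A severity rule of this kind might combine the two inputs as in the sketch below. The source names the inputs (recording location and event duration) but gives no concrete rule, so the set of high-risk locations, the 60-second cutoff, and the three-level scale are all hypothetical.

```python
# Hypothetical rule parameters; the source does not specify any of these.
HIGH_RISK_LOCATIONS = {"elevator"}
LONG_DURATION_SECONDS = 60


def severity_level(location: str, duration_seconds: int) -> int:
    """Return a severity level from 1 (lowest) to 3 (highest)."""
    level = 1
    if location in HIGH_RISK_LOCATIONS:
        level += 1  # confined or sensitive recording location
    if duration_seconds >= LONG_DURATION_SECONDS:
        level += 1  # the event has persisted for a long time
    return level
```

Because the duration grows as events are merged across detection periods, calling the function again after each merge gives the re-determined level.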
Optionally, event-marking the audio event that occurred at the corresponding position in the video file comprises:
when marking the severity level corresponding to the audio event, selecting, according to a correspondence table between severity levels and mark formats, the mark format corresponding to the severity level of the audio event, and marking in that format.
Optionally, extracting the sound features of the audio signal in the video file comprises: extracting the sound features of the audio signal in the video file according to a preset detection period;
and the method further comprises:
when the sound features of the audio signals extracted in adjacent detection periods are each successfully matched with at least one of the audio events, determining, according to a preset event merging rule, whether to merge the audio events occurring in the two adjacent detection periods;
when it is determined that they are to be merged, using the start time of the audio event in the previous detection period as the start time of the audio event in the current detection period;
when it is determined that they are not to be merged, setting both the end time of the audio event in the previous detection period and the start time of the audio event in the current detection period to the start time of the current detection period.
An embodiment of the present invention further provides a video monitoring method, comprising:
recording surveillance video;
during the video recording, event-marking the recorded video file by the video marking method according to any one of the above;
after an event mark is completed on the video file, displaying the video file of the event-marked portion as an alarm.
An embodiment of the present invention further provides a video marking device, comprising:
a feature extraction module, configured to: extract sound features of an audio signal in a video file;
a processing module, configured to: match the sound features extracted by the feature extraction module against each audio event in an audio event library, each audio event being established based on the sound features of the audio signal generated when the event occurs;
a marking module, configured to: when the processing result of the processing module is that the sound features are successfully matched with at least one of the audio events, event-mark the audio event that occurred at the corresponding position in the video file.
Optionally, the device further comprises:
a video recording module, configured to: record video;
the feature extraction module being configured to: extract the sound features of the audio signal in the video file while the video recording module is recording video.
Optionally, the marking module event-marking the audio event that occurred at the corresponding position in the video file comprises performing one or more of the following markings:
marking the start time of the audio event at a key video frame position of the video file;
acquiring and marking one or more of the direction information and distance information of the sound source relative to the pickup for the audio event that occurred;
acquiring and marking the severity level corresponding to the audio event that occurred;
acquiring and marking the name of the audio event that occurred.
An embodiment of the present invention further provides a video monitoring system, comprising: a monitoring processing device and the video marking device according to any one of the above;
the video marking device is configured to: event-mark the video file recorded during video monitoring and, after completing an event mark on the video file, send an alarm to the monitoring processing device;
the monitoring processing device is configured to: after receiving the alarm from the video marking device, display the video file of the event-marked portion as an alarm.
An embodiment of the present invention further provides a computer readable storage medium storing computer executable instructions; when a processor executes the computer executable instructions, the following operations are performed:
extracting sound features of an audio signal in a video file;
matching the extracted sound features against each audio event in an audio event library, each audio event being established based on the sound features of the audio signal generated when the event occurs;
when the sound features are successfully matched with at least one of the audio events, event-marking the audio event that occurred at the corresponding position in the video file.
An embodiment of the present invention further provides a computer readable storage medium storing computer executable instructions; when a processor executes the computer executable instructions, the following operations are performed:
recording surveillance video;
during the video recording, event-marking the recorded video file by the video marking method according to any one of the above;
after an event mark is completed on the video file, displaying the video file of the event-marked portion as an alarm.
In the video marking method and device and the video monitoring method and system provided by the embodiments of the present invention, the sound features of the audio signal in a video file are extracted and matched against each audio event in an audio event library. When the extracted sound features are successfully matched with at least one audio event, this indicates that the audio event occurs in the video file, and the audio event is event-marked at the corresponding position in the video file. Each audio event is established in advance based on the sound features of the audio signal generated when the event occurs. Because audio events are set in advance and the sound features of the audio signal in the video file are matched against each audio event to determine whether a mark is needed, there is no need to view the video content manually to decide whether to mark, and the efficiency and accuracy of marking video files can be considerably improved.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief description of the drawings
FIG. 1 is a flowchart of a video marking method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a video monitoring method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a video marking device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another video marking device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of still another video marking device according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a video monitoring system according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a component architecture of a video monitoring system according to an embodiment of the present invention;
FIG. 8 is a flowchart of still another video monitoring method according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments herein and the features in the embodiments may be combined with one another arbitrarily.
The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
The embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
In the embodiments of the present invention, audio events are set in advance, the sound features of the audio signal in the video file to be processed are then extracted, and these sound features are matched against each audio event to decide automatically whether a corresponding mark is needed. No manual viewing of the video content is required to decide whether to mark, which can greatly improve the efficiency and accuracy of marking video files. FIG. 1 is a flowchart of a video marking method according to an embodiment of the present invention. The video marking method may include the following steps S101 to S105:
S101: Extract the sound features of the audio signal in the video file.
The video file in the embodiments of the present invention may include an audio signal and a synchronously recorded video signal. Optionally, S101 may be performed after recording of the video file has finished, or during recording. Performing it during recording improves the timeliness of marking, that is, it enables real-time marking. In some fields, especially video surveillance, seeing the marked alarm content a minute or even a second earlier can make a considerable difference to how the alarm event subsequently develops. Performing S101 during recording is therefore of great significance for video surveillance.
S102: Match the extracted sound features against each audio event in the audio event library.
Each audio event in the embodiments of the present invention is established based on the sound features of the audio signal produced when the event occurs. For example, a smoke alarm event produces the sound of a smoke alarm, and extracting the sound features of that sound yields a smoke-alarm audio event. As another example, a robbery or a violent or aggressive incident may be accompanied by cries for help such as shouts of "Help!", and extracting the sound features of these sounds yields an audio event for a robbery or a violent or aggressive incident. Beyond these examples, the occurrence of an event is generally accompanied by one or more sound features corresponding to that event: a shooting is accompanied by gunshots, while other events may produce, for example, the sound of breaking glass, crying, or car horns. These are not enumerated further here. Those skilled in the art will understand that the audio events in the embodiments of the present invention can be configured flexibly according to the needs of the actual application.
S103: Determine whether the extracted sound features successfully match at least one audio event; if they match, perform S104; if they do not match, perform S105.
S104: Event-mark the audio event that has occurred at the corresponding position in the video file.
S105: No audio event is currently occurring; continue to wait for the next detection.
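The control flow of S102 to S105 can be sketched in Python as follows. This is only an illustrative sketch: the names `detect_events` and `process_segment`, the dictionary-based event library, the pluggable `is_match` predicate, and the dictionary used as a mark are all assumptions introduced here, not structures defined by the embodiment. Feature extraction (S101) is assumed to have happened already.

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]


def detect_events(
    features: Features,
    event_library: Dict[str, Features],
    is_match: Callable[[Features, Features], bool],
) -> List[str]:
    """S102/S103: compare the extracted features with every audio event."""
    return [name for name, template in event_library.items()
            if is_match(features, template)]


def process_segment(
    features: Features,
    event_library: Dict[str, Features],
    is_match: Callable[[Features, Features], bool],
    position_s: float,
    marks: List[dict],
) -> bool:
    """S103 to S105: mark every matched event at position_s, otherwise do nothing."""
    matched = detect_events(features, event_library, is_match)
    for name in matched:  # S104: one mark per matched audio event
        marks.append({"event": name, "position_s": position_s})
    return bool(matched)  # False corresponds to S105 (wait for the next detection)
```

Keeping the matching predicate as a parameter leaves the choice of similarity measure open, which the text describes separately.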
Through the flow shown in FIG. 1, the embodiments of the present invention set audio events in advance and automatically extract the sound features of the audio signal in the video file to match them against each audio event. If the extracted sound features successfully match one of the audio events, that audio event has occurred in the video file, and it is automatically event-marked at the corresponding position in the video file. The determination and marking require no manual involvement, which improves both efficiency and marking accuracy.
Optionally, in the embodiments of the present invention, the above process may also be carried out during video recording, so that the recorded video file is marked in real time. Compared with the related art, in which marking is performed only after recording of the video file has finished, this greatly improves the timeliness of marking. In video surveillance, this makes it possible to stop some malicious events in time, or even to prevent them, protecting users' property and lives.
Optionally, in the embodiments of the present invention, extracting the sound features of the audio signal and matching the extracted sound features against each audio event may be implemented as follows:
After the audio signal is transformed into the time-frequency domain, the background signal and the foreground signal of the audio signal are extracted. For example, the background and foreground signals may be separated by behavioral modeling based on the neural mechanisms of human hearing, which removes the influence of the background signal.
A set of sound features is extracted from the foreground signal. Each audio event is then read from the audio event library, and the similarity between the extracted sound feature set and each audio event is computed. When the similarity between the sound feature set and an audio event is greater than a set similarity threshold, the match with that audio event is judged successful, that is, the audio event is judged to have occurred in the video file. The audio event can then be event-marked at the corresponding position in the video file.
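The threshold test described above can be illustrated with a short Python sketch. The text does not say which similarity measure is used, so cosine similarity between fixed-length feature vectors is an assumption made here, as are the function names, the dictionary-based event library, and the default threshold of 0.9. Foreground/background separation is not shown; the input is assumed to be the feature set already extracted from the foreground signal.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_events(
    features: Sequence[float],
    event_library: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # hypothetical similarity threshold
) -> List[str]:
    """Return every audio event whose similarity exceeds the threshold."""
    return [name for name, template in event_library.items()
            if cosine_similarity(features, template) > threshold]
```

Because the comparison is strict (greater than the threshold), a feature set exactly at the threshold does not count as a match, which follows the wording of the text.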
Optionally, in the embodiments of the present invention, event-marking the audio event that has occurred at the corresponding position in the video file includes making one or more of the following marks:
Marking the start time of the audio event at the key video frame position of the video file; the key video frame in the embodiments of the present invention is the video frame at the moment the occurrence of the audio event is detected.
Obtaining and marking one or more of the direction information and the distance information of the sound source relative to the sound pickup in the audio event; when locating the sound source, the event mark may be made only after a definite sound source direction (which may also be expressed as an angle) and/or distance has been obtained, so that unclear sound source information does not mislead subsequent processing.
Obtaining and marking the severity level corresponding to the audio event.
Obtaining and marking the name of the audio event.
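One way to picture the marks listed above is as a small record attached to the key video frame. The Python sketch below is hypothetical: the `EventMark` class, its field names and units, and the `localization_reliable` flag are illustrative assumptions, since the text does not define a storage format for marks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventMark:
    """Hypothetical record for one event mark; every field except the frame is optional."""
    key_frame_index: int                    # frame at which the event was detected
    start_time_s: Optional[float] = None    # start time of the audio event
    name: Optional[str] = None              # name of the audio event
    severity: Optional[int] = None          # severity coefficient
    direction_deg: Optional[float] = None   # sound source direction as an angle
    distance_m: Optional[float] = None      # sound source distance from the pickup


def make_mark(key_frame_index: int, *, localization_reliable: bool = False,
              direction_deg: Optional[float] = None,
              distance_m: Optional[float] = None, **fields) -> EventMark:
    """Build a mark, dropping direction and distance unless localization is definite."""
    if not localization_reliable:
        direction_deg = None
        distance_m = None
    return EventMark(key_frame_index, direction_deg=direction_deg,
                     distance_m=distance_m, **fields)
```

Dropping the localization fields when they are not reliable mirrors the point that unclear sound source information should not be marked at all.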
Optionally, in the embodiments of the present invention, a corresponding severity level may be set for each audio event. For example, an audio event that is neither malicious nor dangerous may be given the severity level "ordinary", represented by coefficient 0; a malicious audio event may be given the severity level "fairly serious", represented by coefficient 1; and a dangerous audio event may be given the severity level "very serious", represented by coefficient 2. When the severity level is marked, the coefficient corresponding to the severity level may be marked directly.
The severity level of an event may depend not only on the event itself but also, closely, on the location where the event occurs (for example a hotel, a shop, a school, a residential area, or inside a home) and on how long the event lasts.
Optionally, in the embodiments of the present invention, obtaining the severity level corresponding to an audio event may include determining the severity level according to the location information of the recorded video file (that is, the location where the audio event occurs) and/or the duration of the audio event after it begins. A severity level determined from the location information and/or the duration takes more factors into account, and the result is more accurate.
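The text gives the three coefficients (0, 1, 2) but not the rule that combines event type, location, and duration. The Python sketch below shows one possible rule purely as an assumption: the set of sensitive locations, the 60-second duration threshold, and the "raise by one level per factor, capped at 2" logic are all hypothetical.

```python
from typing import Optional

# Hypothetical tuning values; the embodiment does not specify any of them.
SENSITIVE_LOCATIONS = {"school", "home"}
LONG_DURATION_S = 60.0
MAX_SEVERITY = 2  # coefficient for "very serious"


def severity_level(base: int, location: Optional[str] = None,
                   duration_s: Optional[float] = None) -> int:
    """Adjust an event's base coefficient (0, 1 or 2) using location and/or duration."""
    level = base
    if location in SENSITIVE_LOCATIONS:
        level += 1  # assumed: sensitive locations raise the level
    if duration_s is not None and duration_s >= LONG_DURATION_S:
        level += 1  # assumed: long-lasting events raise the level
    return min(level, MAX_SEVERITY)
```

Both inputs are optional, matching the "and/or" in the text: the level can be determined from location alone, duration alone, or both.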
Optionally, in the embodiments of the present invention, to make viewing and handling easier for monitoring personnel, event-marking the audio event at the corresponding position in the video file may include: when marking the severity level corresponding to the audio event, selecting the marking format that corresponds to that severity level according to a preset table of correspondences between severity levels and marking formats. Optionally, the marking formats in the embodiments of the present invention differ in, without limitation, the color and/or format of the text box and of the text. An example is given in Table 1.
Table 1
[Table 1, the correspondence between severity levels and marking formats, is provided as images in the original publication: Figure PCTCN2017086325-appb-000001 and Figure PCTCN2017086325-appb-000002.]
Because audio events of different severity levels are marked with different marking formats according to the above correspondence table, the marks are displayed in different formats when the marked portion of the video file is later shown. This gives the user different prompts and helps the user respond correctly and quickly to events of different severity levels.
Optionally, in the embodiments of the present invention, the determination of whether to mark the video file may be performed periodically. A detection period may be set in advance, for example 10 seconds, so that detection is performed every 10 seconds; it may also be set to 30 seconds, 1 minute, and so on. In practice the detection period can be set flexibly according to actual needs. In each detection period, the audio signal and its sound features are extracted from the recorded video file, and the extracted sound features are matched against each audio event. The following situation can then arise. Suppose a robbery occurs in which threats, pleading, wailing, and cries for help continue for several seconds without any obvious long interruption; what persists over that interval may be a single audio event or a group of associated audio events. The embodiments of the present invention may therefore set an event merging rule so that such cases are consolidated into one audio event, making detection and marking more intelligent and accurate. For example, the merging rule may be set to any one of the following:
Merging when the same audio event is detected (with no limit on duration);
Merging when the same audio event is detected within M audio detection periods, where M is greater than or equal to 2; for example, if the detection period is 1 minute and M is 10, identical audio events within 10 minutes may be merged;
Merging when associated audio events are detected (with no limit on duration);
Merging when associated audio events are detected within N audio detection periods, where N is greater than or equal to 2.
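The four merging rules above can be expressed as a single predicate. The Python sketch below is illustrative only: the `MergeRule` enum, the `should_merge` function, the `periods_spanned` count (the number of detection periods the merged event would cover, including the current one), and the representation of associations as a set of event-name pairs are all assumptions. Treating identical events as also satisfying the "associated" rules is likewise an assumption.

```python
from enum import Enum
from typing import AbstractSet, FrozenSet, Optional


class MergeRule(Enum):
    SAME = 1                 # same event, no limit on duration
    SAME_WITHIN_M = 2        # same event within M detection periods
    ASSOCIATED = 3           # associated events, no limit on duration
    ASSOCIATED_WITHIN_N = 4  # associated events within N detection periods


def should_merge(
    rule: MergeRule,
    prev_event: str,
    curr_event: str,
    periods_spanned: int,
    limit: Optional[int] = None,  # M or N, required by the two bounded rules
    associated: AbstractSet[FrozenSet[str]] = frozenset(),
) -> bool:
    """Decide whether the current detection is merged into the previous event."""
    same = prev_event == curr_event
    related = same or frozenset((prev_event, curr_event)) in associated
    if rule in (MergeRule.SAME_WITHIN_M, MergeRule.ASSOCIATED_WITHIN_N):
        if limit is None or limit < 2:
            raise ValueError("M or N must be at least 2")
        if periods_spanned > limit:
            return False  # the merged event would exceed the allowed window
    if rule in (MergeRule.SAME, MergeRule.SAME_WITHIN_M):
        return same
    return related
```

With a 1-minute detection period and a limit of 10, this allows merging only while the merged event spans at most 10 periods, which corresponds to the 10-minute example in the text.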
In one application scenario of the embodiments of the present invention, when an audio event is detected for the first time in a detection period, the start time of the event is first marked in the video file, and the end time is left unmarked pending the detection result of the next detection period. If no audio event is detected in the next detection period, or no identical or associated audio event is detected, the end time of the audio event is marked as the start time of that next detection period.
In another application scenario of the embodiments of the present invention, when the sound features of the audio signals extracted in adjacent detection periods both successfully match at least one audio event, that is, when audio events occur in two adjacent detection periods, whether to merge the audio events occurring in the two adjacent detection periods may be decided according to the preset event merging rule above. If they are to be merged, the start time of the audio event in the previous detection period is used as the start time of the audio event in the current detection period, and the end time is left unmarked pending subsequent detection results. If they are not to be merged, the end time of the audio event in the previous detection period is set to the start time of the current detection period, and the start time of the audio event in the current detection period is set to the start time of the current detection period.
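The start-time and end-time bookkeeping in the two scenarios above can be sketched in Python as follows. The sketch makes several simplifying assumptions: each detection period yields at most one event name (or `None`), the start time of a newly detected event is taken to be the start of its detection period, and the merge decision is passed in as a predicate that defaults to "same event". The function name and tuple layout are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Segment = Tuple[str, float, Optional[float]]  # (event, start_s, end_s or None)


def build_segments(
    detections: Sequence[Optional[str]],
    period_s: float,
    can_merge: Callable[[str, str], bool] = lambda prev, curr: prev == curr,
) -> List[Segment]:
    """Turn per-period detections into marked segments with start and end times."""
    segments: List[Segment] = []
    open_event: Optional[str] = None
    open_start = 0.0
    for index, name in enumerate(detections):
        period_start = index * period_s
        # Close the open event if nothing was detected or merging is refused;
        # its end time is the start time of the current detection period.
        if open_event is not None and (name is None or not can_merge(open_event, name)):
            segments.append((open_event, open_start, period_start))
            open_event = None
        # Open a new event; a merged event keeps the earlier start time.
        if name is not None and open_event is None:
            open_event, open_start = name, period_start
    if open_event is not None:
        segments.append((open_event, open_start, None))  # end time not yet marked
    return segments
```

A segment whose end time is `None` corresponds to an event whose end has not yet been marked because later detection results are still pending.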
After merging, the duration of the audio event has changed, so the severity level corresponding to the audio event may also change. Optionally, in the method provided by the embodiments of the present invention, after the audio events are merged, the severity level corresponding to the merged audio event may be obtained again to check whether it has changed, and updated accordingly if it has. The marks made on the video file in the embodiments of the present invention may therefore also include the end time and/or the duration of the audio event. The portion of the video file between the start time and the end time may be called the tagged video, and a subsequent alarm display can be directed specifically at this tagged video.
FIG. 2 is a flowchart of a video monitoring method according to an embodiment of the present invention. Building on the video marking method of the embodiment shown in FIG. 1, the video monitoring method may include the following steps S201 to S203:
S201: Record surveillance video. In practice, video may be captured by an image collector (for example a camera) and audio captured synchronously by a sound pickup.
S202: During video recording, event-mark the recorded video file using the video marking method of any of the embodiments described with reference to FIG. 1.
S203: After an event mark has been completed on the video file, display the event-marked portion of the video file as an alarm.
With the video monitoring method provided by the embodiments of the present invention, monitoring personnel can view the video content of the portion in which an audio event occurred as promptly as possible and respond in time. Note that, as described in the above embodiments, the event may still be in progress or may already have ended, depending on the duration of the event and the event detection period.
Optionally, in the embodiments of the present invention, the alarm display of the marked portion of the video file may be implemented, for example, by sending an alarm to a background server. If the corresponding display device was already showing the real-time video from the image collector of the embodiments of the present invention, what it is currently showing is the video content of the marked portion, and the corresponding event mark is displayed with it. If the display device is not currently showing the real-time video from the image collector, an alarm message and a video link to the event-marked portion may be sent to the display device; the user can play that portion of the video by clicking the link, and the embodiments of the present invention may also provide a function for switching to the real-time video at any time.
Optionally, in the embodiments of the present invention, an audio event that has occurred (for example a robbery or a shooting) may require corresponding alarm handling. The display on the display device may therefore also provide an alarm option bar, which may be integrated with the time-point mark on the video progress bar so that the alarm options pop up when the user clicks it. In addition, since the user may need to review a key event several times to confirm it, the embodiments of the present invention may also provide a replay function, which may likewise be integrated at a position on the video progress bar (for example represented by a corresponding mark, such as one integrated into the time-point mark); to replay, the user only needs to click the marker at that position.
With the technical solution provided by the embodiments of the present invention, audio events can be seen promptly and accurately during monitoring and responded to promptly and accurately, protecting the user's property and life.
FIG. 3 is a schematic structural diagram of a video marking device according to an embodiment of the present invention. The video marking device 30 may include a feature extraction module 31, a processing module 32, and a marking module 33.
The feature extraction module 31 is configured to extract the sound features of the audio signal in the video file.
The video file in the embodiments of the present invention may include an audio signal and a synchronously recorded video signal.
The processing module 32 is configured to match the sound features extracted by the feature extraction module 31 against each audio event in the audio event library.
Each audio event in the embodiments of the present invention is established based on the sound features of the audio signal produced when the event occurs. The occurrence of an event is generally accompanied by one or more sound features corresponding to that event: a shooting is accompanied by gunshots, while other events may produce, for example, the sound of breaking glass, crying, or car horns. These are not enumerated further here. Those skilled in the art will understand that the audio events in the embodiments of the present invention can be configured flexibly according to the needs of the actual application.
The marking module 33 is configured to event-mark the audio event that has occurred at the corresponding position in the video file when the processing result of the processing module 32 is that the sound features successfully match at least one audio event.
Optionally, the above functions of the feature extraction module 31, the processing module 32, and the marking module 33 may be implemented by a processor, or each may be implemented as a separate structure. The feature extraction module 31 automatically extracts the sound features of the audio signal in the video file, the processing module 32 matches them against each audio event, and when an audio event is successfully matched (meaning that the audio event has occurred in the video file) the marking module 33 automatically makes an event mark at the corresponding position in the video file. The whole process requires no manual involvement, so marking efficiency and accuracy are both well assured.
FIG. 4 is a schematic structural diagram of another video marking device according to an embodiment of the present invention. In addition to the structure of the device shown in FIG. 3, the video marking device 30 may further include:
A video recording module 34, configured to record video.
Optionally, the video recording module 34 may include a video collector and a sound pickup. The video marking device 30 can thus itself serve as a monitoring device which, working with a monitoring platform, performs video monitoring in a variety of scenarios.
The feature extraction module 31 is configured to extract the sound features of the audio signal in the video file while the video recording module 34 is recording video; the subsequent marking flow is then completed by the processing module 32 and the marking module 33. This improves the timeliness of marking the video file and essentially enables real-time marking. In video surveillance, seeing the marked alarm content a minute or even a second earlier can make a considerable difference to how the alarm event subsequently develops.
Optionally, in the embodiments of the present invention, the extraction of the sound features of the audio signal by the feature extraction module 31 and the matching of the extracted sound features against each audio event by the processing module 32 may proceed as follows:
After the audio signal is transformed into the time-frequency domain, the feature extraction module 31 extracts the background signal and the foreground signal of the audio signal and extracts a set of sound features from the foreground signal. The feature extraction module 31 may separate the background and foreground signals by behavioral modeling based on the neural mechanisms of human hearing, which removes the influence of the background signal.
The processing module 32 reads the audio events from the audio event library and computes the similarity between the sound feature set extracted by the feature extraction module 31 and each audio event. When the similarity between the sound feature set and an audio event is greater than a set similarity threshold, the match with that audio event is judged successful, that is, the audio event is judged to have occurred in the video file. The audio event can then be event-marked at the corresponding position in the video file.
Optionally, in the embodiments of the present invention, the event-marking performed by the marking module 33 on the audio event at the corresponding position in the video file includes making one or more of the following marks:
Marking the start time of the audio event at the key video frame position of the video file; the key video frame in the embodiments of the present invention is the video frame at the moment the occurrence of the audio event is detected.
Obtaining and marking one or more of the direction information and the distance information of the sound source relative to the sound pickup in the audio event; when locating the sound source, the event mark may be made only after a definite sound source direction (which may also be expressed as an angle) and/or distance has been obtained, so that unclear sound source information does not mislead subsequent processing.
Obtaining and marking the severity level corresponding to the audio event.
Obtaining and marking the name of the audio event.
可选地,在本发明实施例中,针对每一个音频事件都可以设置对应的严重度级别,例如,对于一个音频事件,其为非恶性和非危险事件时,可以设置其严重度级别为一般严重,用系数0表征;对于一个为恶性音频事件,设置其严重度级别为比较严重,用系数1表征;对于一个为危险音频事件,设置其严重度级别为很严重,用系数2表征。在进行严重度级别标记时,可以直接标记每个严重度级别对应的系数。Optionally, in the embodiment of the present invention, a corresponding severity level may be set for each audio event. For example, for an audio event, when it is a non-malignant and non-hazardous event, the severity level may be set to be a general Severe, characterized by a coefficient of 0; for a vicious audio event, set its severity level to be more severe, characterized by a factor of 1; for a dangerous audio event, set its severity level to be severe, characterized by a factor of 2. When you perform a severity level tag, you can directly mark the coefficient for each severity level.
对于一个事件,其严重度级别可能不仅于事件本身相关,可能还与事件 发生的位置(例如酒店、商铺、学校、住宅区、家庭内)以及事件持续的时间紧密关联。For an event, its severity level may not only be related to the event itself, but may also be related to the event. Locations that occur (such as hotels, shops, schools, residential areas, homes) and the duration of events are closely related.
可选地,在本发明实施例中,在获取音频事件对应的严重度级别的实现方式,可以包括:根据录制视频文件的位置信息(也即音频事件发生的位置)和/或该音频事件发生后的持续时间,确定该音频事件对应的严重度级别;即结合获取到的位置信息和/或持续时间确定该音频事件对应的严重度级别,这样确定的严重度级别考虑的因素更为全面,得到的结果也更为准确。Optionally, in the embodiment of the present invention, the implementation manner of obtaining the severity level corresponding to the audio event may include: according to the location information of the recorded video file (that is, the location where the audio event occurs) and/or the audio event occurs. The duration of the subsequent determination of the severity level corresponding to the audio event; that is, determining the severity level corresponding to the audio event in combination with the acquired location information and/or duration, such that the determined severity level considers a more comprehensive factor. The results obtained are also more accurate.
Optionally, in this embodiment, to make it easier for monitoring personnel to review and handle events later, marking an audio event at the corresponding position in the video file may include: when marking the severity level of the audio event, selecting the mark format that corresponds to that severity level from a preset table mapping severity levels to mark formats, and marking in that format. Optionally, mark formats in this embodiment include, but are not limited to, different text boxes and different text colors or styles.
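The preset correspondence table between severity levels and mark formats can be pictured as a simple lookup. In the sketch below, the `MarkFormat` fields and the specific colors are hypothetical; only the idea of a table keyed by severity coefficient is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkFormat:
    text_color: str  # color of the annotation text
    boxed: bool      # whether the text is drawn inside a bordered text box


# Hypothetical table keyed by severity coefficient (0, 1, 2).
MARK_FORMATS = {
    0: MarkFormat(text_color="yellow", boxed=False),
    1: MarkFormat(text_color="orange", boxed=True),
    2: MarkFormat(text_color="red", boxed=True),
}


def format_for(severity: int) -> MarkFormat:
    """Look up the mark format for a severity coefficient."""
    return MARK_FORMATS[severity]
```

Keeping the table as data rather than code means the formats can be changed without touching the marking logic.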
Optionally, FIG. 5 is a schematic structural diagram of yet another video marking apparatus according to an embodiment of the present invention. On the basis of the structure of the apparatus shown in FIG. 4, the video marking apparatus 30 provided by this embodiment may further include:
A cache module 35, configured to store an audio event library, which may be obtained from another server (for example, a monitoring server) or from settings entered directly by a user. The cache module 35 is further configured to cache the audio data and video data collected by the video recording module 34, as well as the various data marked by the marking module 33.
Optionally, in this embodiment, the process of determining whether to mark the video file may be performed periodically. A detection period may be preset, for example 10 seconds (that is, detection runs once every 10 seconds), 30 seconds, or 1 minute; in practice the value can be set flexibly as required. In each detection period, the feature extraction module 31 extracts the audio signal and its sound features from the recorded video file, and the processing module 32 matches the extracted sound features against each audio event. As a result, the same or associated audio events may be matched in several detection periods. This embodiment may therefore set event merging rules, under which the marking module 33 combines such detections into a single audio event, improving the intelligence and accuracy of detection and marking. For example, the marking module 33 may apply any one of the following merging rules:
Merge detections of the same audio event (with no limit on duration);
Merge detections of the same audio event that fall within M audio detection periods, where M is greater than or equal to 2; for example, with a detection period of 2 minutes and M = 5, detections of the same audio event within 10 minutes may be merged;
Merge detections of associated audio events (with no limit on duration);
Merge detections of associated audio events that fall within N audio detection periods, where N is greater than or equal to 2.
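The four merging rules above can be sketched as one parameterized rule. This is only one possible reading: the `MergeRule` fields, the `ASSOCIATED` table, and the choice to let an "associated" rule also merge identical events are assumptions, not details given in the source.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Hypothetical association table: unordered pairs of event names treated as related.
ASSOCIATED = {frozenset({"scream", "robbery"})}


@dataclass(frozen=True)
class MergeRule:
    allow_associated: bool     # False: only identical events are merged
    max_periods: Optional[int]  # M or N; None means no limit on duration


def merge_window(period: timedelta, periods: int) -> timedelta:
    """Time span covered by the given number of detection periods."""
    return period * periods


def should_merge(rule: MergeRule, previous: str, current: str, periods_spanned: int) -> bool:
    """Decide whether a new detection is folded into the previous event.

    periods_spanned is the number of detection periods the previous
    event already covers.
    """
    same = previous == current
    related = frozenset({previous, current}) in ASSOCIATED
    if not (same or (rule.allow_associated and related)):
        return False
    return rule.max_periods is None or periods_spanned < rule.max_periods
```

With a 2-minute period and M = 5, `merge_window(timedelta(minutes=2), 5)` gives the 10-minute window from the example in the second rule.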
In one application scenario of this embodiment, when the processing module 32 first detects an audio event in a given detection period, the marking module 33 first marks the start time of the event in the video file and leaves the end time unmarked pending the result of the next detection period. If no audio event is detected in the next detection period, or no same or associated audio event is detected, the end time of the audio event is marked as the start time of that next detection period.
In another application scenario of this embodiment, when the processing module 32 finds that the sound features of the audio signals extracted in adjacent detection periods each match at least one audio event, that is, when audio events occur in two adjacent detection periods, the marking module 33 may decide, according to the preset event merging rules above, whether to merge the audio events of the two adjacent periods. If they are merged, the start time of the audio event in the previous detection period is used as the start time of the audio event in the current detection period, and the end time is left unmarked pending later detection results. If they are not merged, the end time of the audio event in the previous detection period is set to the start time of the current detection period, and the start time of the audio event in the current detection period is set to the start time of the current detection period.
After merging, the duration of the audio event has changed, so its severity level may change as well. Optionally, after merging audio events, the marking module 33 may re-evaluate whether the severity level of the audio event has changed and, if so, update the mark accordingly. Accordingly, in this embodiment the marks applied to the video file by the marking module 33 may also include the end time and/or duration of the audio event. The portion of the video file between the start time and the end time may be called the tagged video.
FIG. 6 is a schematic structural diagram of a video monitoring system according to an embodiment of the present invention. The video monitoring system 60 provided by this embodiment may include a monitoring processing device 61 and the video marking device 62 of any of the embodiments shown in FIG. 3 to FIG. 5.
The video marking device 62 is configured to apply event marks to the video file recorded during video monitoring and, after completing an event mark on the video file, to raise an alarm to the monitoring processing device 61. Raising the alarm is also the process of activating the event mark, which may be carried out by a mark activation module provided in the video marking device 62.
The monitoring processing device 61 is configured to display, as an alarm, the event-marked portion of the video file after receiving the alarm from the video marking device 62. Note that, as described above, the event may still be in progress or may already have ended, depending on factors such as the duration of the event and the detection period.
Optionally, in this embodiment, the monitoring processing device 61 may be implemented by a background server together with a corresponding display device. The background server has a storage medium that stores the audio event library as well as the video data, alarm information, and other data sent by the video marking device 62.
Optionally, in this embodiment, the video marking device 62 may further include an interaction unit. The interaction unit may be a display unit configured to receive and display the tagged video from the marking module 33 and the real-time video, or to receive or send various interaction messages.
Optionally, in this embodiment, the video marking device 62 sends the event-marked portion of the video file to the monitoring processing device 61 as an alarm. If the monitoring processing device 61 was already displaying the real-time video of the image collector of this embodiment, it now displays the video content of the marked portion, with the corresponding event mark shown as well. If the monitoring processing device 61 is not currently displaying the real-time video of the image collector, an alarm message and a link to the event-marked portion of the video may be sent to the monitoring processing device 61; the user can click the link to play that portion of the video, and a function for switching back to the real-time video at any time may also be provided.
Optionally, in this embodiment, because an audio event that occurs (for example, a robbery or a shooting) may need to be reported, an alarm option bar may also be provided on the display of the monitoring processing device 61. The alarm option bar may be integrated with the time-point marker on the video progress bar so that the alarm options pop up when the user clicks the marker. In addition, because a user may need to review an important event several times to confirm it, this embodiment may also provide a playback function, which may likewise be integrated at a position on the video progress bar; to review the event, the user simply clicks the indicator at that position.
Optionally, in this embodiment, the monitoring processing device 61 and the video marking device 62 may be combined into a monitoring system. Because the video marking device 62 can mark video in real time, audio events can be seen promptly and accurately during monitoring and responded to promptly and accurately, thereby safeguarding the user's property and personal safety.
For a better understanding of the video marking method and device and the video monitoring method and system provided by the embodiments of the present invention, the method is illustrated below with a practical monitoring scenario.
FIG. 7 is a schematic diagram of the architecture of a video monitoring system according to an embodiment of the present invention. The architecture may include:
A camera and sound pickup module 71 (also called the monitoring module or monitoring device), an audio event object 72, a network, a background server 73 (the core of the monitoring processing device 61), a management console 74, and a display component 75 (which may be a monitor or a separate display terminal such as a mobile phone or a tablet).
The camera and sound pickup module 71 may be a camera with a built-in sound pickup or a camera with an external sound pickup; with an external sound pickup, audio and video must be synchronized.
The camera and sound pickup module 71 further includes a feature extraction module, a processing module, a marking module, and a cache module. The feature extraction module and the processing module are configured to detect the audio event object 72 from the collected audio signal. The marking module is configured to obtain, from the detected audio event object 72, the marker attributes of the time-point marker of the video event mark, edit the real-time video, apply the time-point marker, and add a conspicuous text-box annotation to the tagged video frames. The cache module is configured to cache the audio event library, the collected audio and video signals, the event marks, and so on.
The feature extraction module is configured to first separate the audio signal into a foreground signal and a background signal and extract features from the foreground signal. The processing module is further configured to compare the foreground signal with the audio events in the event detection model library held in the cache module; if the similarity exceeds a set threshold, one or more classes of audio event are deemed to have occurred. The marking module may then perform sound source localization to obtain the sound source distance and direction, after which the severity is determined.
The processing module is further configured to first decide whether to merge audio events and, if so, to merge them, obtain the start time, end time, and duration, consolidate the audio detection result, sound source angle, and sound source distance, and re-determine the severity level. One audio event within a merged time span generates only one time-point marker, which contains a marker start time and a marker end time. The marker is saved to the cache module and synchronized to the database of the background server 73.
The camera and sound pickup module 71 is connected to the background server 73 through the network and sends an alarm to the background server. If the real-time video of the camera was already being displayed, display continues, now showing the tagged video with marker attributes. If the real-time video of the camera was not being displayed, an alarm message and a video link are sent to the background server; clicking the link shows the tagged video with marker attributes, and the display can be switched to the real-time video at any time. The time-point marker appears on the video progress bar; clicking it offers a choice between raising an alarm and rewinding to review. If the alarm is chosen, the designated alarm number is dialed and the tagged video, annotated with time and location, is shared.
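The threshold comparison performed by the processing module can be sketched as follows. The source does not say which features or which similarity measure are used; cosine similarity over plain feature vectors, the 0.8 threshold, and the function names are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_events(
    features: Sequence[float],
    event_library: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical value; the text only says "a set threshold"
) -> List[str]:
    """Return every event whose template similarity exceeds the threshold."""
    return [
        name
        for name, template in event_library.items()
        if cosine_similarity(features, template) > threshold
    ]
```

Returning a list rather than a single best match reflects the statement that one or more classes of audio event may be detected at once.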
In addition, the management console 74 (which may include an audio event management module, an audio event severity-level determination management module, and a merging rule management module) can be used to enter and manage the audio features of specific events, the severity-level determination rules, the merging rules, and so on.
The following description uses the video monitoring structure shown in FIG. 7 and takes a robbery audio event as an example:
In a residential community, a man follows a young woman into an elevator and robs her at weapon-point. The frightened woman screams and says tearfully, "Stay away, you can have the bag," which lasts about 1 minute. The man snatches the bag and begins rummaging through it; taking advantage of this moment, the woman quickly presses the button for the nearest floor and runs out as soon as the elevator door opens. Because the elevator has a camera and a sound pickup, a video file is recorded and displayed in real time over the network on the monitoring screen in the community's property-management security room. Monitoring with the video monitoring method provided by this embodiment proceeds as follows:
The monitoring device (including the camera and sound pickup) is installed in the elevator. In the current detection period, after audio event matching on the data collected by the sound pickup, the monitoring device detects that the audio event is "robbery in the elevator". The audio event is pre-registered as E1 and the marker attributes of the corresponding time-point marker S1 are recorded, including the marker start time (the current time), severity level, sound source distance, and sound source direction. For example, marker start time: 19:50:00; marker attributes: robbery | fairly serious | within 1 meter | upper right.
No audio event was detected in the previous detection period, so no audio event merging is required. Audio event E1 is formally registered. The video in the time span from the marker start time to the marker end time is called the tagged video. The tagged video may also be shifted in time as appropriate, for example by moving its start time n seconds earlier and its end time n seconds later, to give a more complete picture of how the event unfolded. Here the start time of the tagged video is n seconds before 19:50:00; assuming n is 5, the start time is 19:49:55. The end time is empty, indicating that the audio event is still in progress.
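The time shift applied to the tagged video is simple date arithmetic. The sketch below assumes the marker times are held as `datetime` values and that an empty end time is represented by `None`; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def tagged_video_window(
    marker_start: datetime,
    marker_end: Optional[datetime],
    n_seconds: int,
) -> Tuple[datetime, Optional[datetime]]:
    """Widen the tagged video by n seconds on each side.

    An end time of None means the event is still in progress, so the
    window stays open.
    """
    pad = timedelta(seconds=n_seconds)
    start = marker_start - pad
    end = marker_end + pad if marker_end is not None else None
    return start, end
```

With a marker start of 19:50:00, no end time, and n = 5, the window starts at 19:49:55 and remains open, matching the example above.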
An alarm is sent to the background server of the community's property management. On the display screen for this camera in the security room, a text annotation appears in the center of the current video frame: "Fairly serious alarm: started 19:50:00, robbery, upper right, within 1 meter", in a bordered orange size-three font. A progress bar is also displayed, with a highlighted time-point marker at 19:50:00 annotated "robbery" in orange. Clicking the time-point marker on the progress bar brings up two buttons, "Alarm" and "Rewind video to marker". Clicking "Alarm" raises an alarm to the designated alarm number in the specified manner and shares a link to the tagged video. Clicking "Rewind video to marker" rewinds the video to the footage starting at 19:49:55; while watching, the user can right-click and choose "Watch real-time video" to return to the real-time video.
If the detection period is set to 10 seconds, the next detection period starts at 19:50:11. Audio and video capture and audio event detection proceed as before, and the data collected by the monitoring device's sound pickup is again detected as "robbery". This audio event is pre-registered as E2, and the current time, severity level, sound source distance, and sound source direction are recorded. For example, marker start time: 19:50:11; marker attributes: robbery | fairly serious | within 1 meter | upper right.
According to the event merging decision rules, E2 is merged with E1. The marker attributes of event marker S1 of audio event E1, such as the duration, are updated, and the severity level is re-determined according to the severity-level determination rules.
The above detection process is repeated until 19:51:09, when 1 minute and 9 seconds have elapsed.
The next detection period starts at 19:51:10. Audio and video capture and audio event detection proceed as before, and no event is detected in the data collected by the camera's sound pickup. Because the marker end time of time-point marker S1 of the previous audio event E1 is empty, the marker end time of S1 is set to 19:51:09. The display screen for this camera in the security room shows normal video with no text annotation in the center of the frame. The progress bar is displayed with a time-point marker at 19:50:00 annotated "robbery: lasted 1 min 09 s" in orange, no longer highlighted. Clicking that time-point marker on the progress bar brings up two buttons, "Alarm" and "Rewind video to marker". Clicking "Alarm" dials the designated alarm number and shares a link to the tagged video of time-point marker S1. Clicking "Rewind video to marker" rewinds the video to the footage starting at 19:49:55; while watching, the user can right-click and choose "Watch real-time video" to return to the real-time video.
The above process is shown in FIG. 8, a flowchart of yet another video monitoring method according to an embodiment of the present invention. The method may include the following steps S801 to S820:
S801: An event detection period begins at time T1; audio events are detected (that is, audio event matching is performed) according to the video marking method described above;
S802: Determine whether an audio event is detected; if not, perform S803; if so, perform S805;
S803: Determine whether the marker end time of the previous audio event E0 is empty; if it is empty, perform S804; if it is not empty, perform S801 (that is, wait for the next event detection period);
S804: Set the marker end time of the previous audio event E0 to one second before T1, re-determine the severity level of E0, update event marker S0 of audio event E0, and activate marker S0; then perform S811;
S805: Pre-register audio event E1 and record the start time T1 and marker attributes of the corresponding event marker S1;
S806: Determine whether audio event E1 is to be merged with the previous audio event E0; if so, perform S807; if not, perform S808;
S807: Merge audio event E1 with audio event E0, re-determine the severity level of E0, update event marker S0, delete E1, and activate marker S0; then perform S811;
S808: Determine whether the marker end time of audio event E0 is empty; if it is empty, perform S809; if it is not empty, perform S810;
S809: Record the end time of the time-point marker of audio event E0 as one second before T1, re-determine the severity level of E0, and update event marker S0; then perform S810;
S810: Formally register audio event E1, with the start time of event marker S1 set to T1 and the end time empty, and activate marker S1; then perform S811;
S811: Determine whether the video of this camera is being played; if it is, perform S814; if it is not, perform S812;
S812: Display the alarm message and the tagged-video link of the current audio event on the display screen (various display modes are possible; for example, the messages may be shown in the right-hand area of the screen, sorted from earliest to latest by event start time. If the monitoring personnel do not open the tagged-video links, several event alarm messages may accumulate over time; for a single audio event whose marker attributes are updated several times, the alarm messages should be merged);
S813: Determine whether the tagged-video link of an audio event is clicked; if so, perform S814; if not, perform S812 and continue displaying while waiting for the monitoring personnel to click to view the tagged video (alternatively, the display screen may be set to switch automatically to the tagged video of an audio event when an alarm message with severity level "very serious" is received);
S814: Play the tagged video while displaying on screen the event marker attributes and start time of the corresponding audio event, and display the progress bar with an audio event marker at the start time of the time-point marker of that audio event;
S815: When the event marker of the current audio event is clicked, the "Alarm" and "Rewind video to this marker" buttons pop up;
S816: Determine whether "Alarm" is clicked; if so, perform S817; if not, perform S818;
S817: Raise an alarm to the designated terminal by telephone, text message, or another specified method, and share the tagged-video link of the audio event; the process then ends.
S818: Determine whether "Rewind video to this marker" is clicked; if so, perform S819; if not, perform S816, that is, continue checking whether "Alarm" is clicked;
S819: Rewind playback to the tagged video of the time-point marker of the current audio event;
S820: When "Watch real-time video" is selected during viewing, switch to the real-time video;
The process ends.
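The marker bookkeeping in steps S801 to S810 can be sketched as a small state tracker that is called once per detection period. This is a simplified illustration, not the flowchart itself: severity re-determination and the display steps S811 to S820 are omitted, the class and method names are hypothetical, the merge decision is delegated to a caller-supplied function, and merging is only attempted while the previous marker is still open (an assumption made to keep the sketch short).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Marker:
    name: str
    start: datetime
    end: Optional[datetime] = None  # None: event still in progress


class CycleTracker:
    def __init__(self, should_merge: Callable[[str, str], bool]) -> None:
        self.should_merge = should_merge
        self.markers: List[Marker] = []

    def on_cycle(self, t1: datetime, detected: Optional[str]) -> Optional[Marker]:
        """Handle one detection period starting at t1.

        Returns the marker to activate (alarm on), or None if nothing changed.
        """
        last = self.markers[-1] if self.markers else None
        last_open = last is not None and last.end is None
        one_second = timedelta(seconds=1)

        if detected is None:                 # S802: nothing detected
            if last_open:                    # S803 -> S804: close the open marker
                last.end = t1 - one_second
                return last
            return None                      # wait for the next period

        # S805/S806: a new detection; decide whether to fold it into E0
        if last_open and self.should_merge(last.name, detected):
            return last                      # S807: start kept, end stays open
        if last_open:                        # S808 -> S809: close E0 first
            last.end = t1 - one_second
        marker = Marker(name=detected, start=t1)  # S810: register E1 / S1
        self.markers.append(marker)
        return marker
```

Replaying the elevator example with periods starting at 19:50:00 ("robbery"), 19:50:11 ("robbery") and 19:51:10 (nothing detected), and a merge function that merges identical names, leaves a single marker running from 19:50:00 to 19:51:09, that is, 1 minute 9 seconds.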
With the video marking method and video monitoring method provided by the embodiments of the present invention, the moment at which a specific behavior or event occurs can be located quickly during video monitoring, making it easier for monitoring personnel to spot problems quickly and improving their efficiency.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, perform the following operations S11 to S13:
S11: Extract the sound features of the audio signal in a video file;
S12: Match the extracted sound features against each audio event in an audio event library, where each audio event is established on the basis of the sound features of the audio signal produced when the event occurs;
S13: When the sound features match at least one audio event, apply an event mark for that audio event at the corresponding position in the video file.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, perform the following operations S21 to S23:
S21: Record surveillance video;
S22: During video recording, apply event marks to the recorded video file using the video marking method described above;
S23: After an event mark is completed on the video file, display the event-marked portion of the video file as an alarm.
The foregoing describes only optional embodiments and implementations of the present invention and is not intended to limit the scope of protection of the embodiments. Those skilled in the art may make various changes and variations to the embodiments. Any modification, equivalent replacement, or improvement made within the spirit and principles of the embodiments of the present invention shall fall within their scope of protection.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented as a computer program, which may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, apparatus, device, or component); when executed, the program performs one of the steps of the method embodiments or a combination thereof.
可选地,上述实施例的全部或部分步骤也可以使用集成电路来实现,这些步骤可以被分别制作成一个个集成电路模块,或者将它们中的多个模块或步骤制作成单个集成电路模块来实现。Alternatively, all or part of the steps of the above embodiments may also be implemented by using an integrated circuit. These steps may be separately fabricated into individual integrated circuit modules, or multiple modules or steps may be fabricated into a single integrated circuit module. achieve.
上述实施例中的装置/功能模块/功能单元可以采用通用的计算装置来实 现,它们可以集中在单个的计算装置上,也可以分布在多个计算装置所组成的网络上。The device/function module/functional unit in the above embodiment can be implemented by using a general-purpose computing device. Now, they can be concentrated on a single computing device or distributed over a network of multiple computing devices.
上述实施例中的装置/功能模块/功能单元以软件功能模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。上述提到的计算机可读取存储介质可以是只读存储器,磁盘或光盘等。When the device/function module/functional unit in the above embodiment is implemented in the form of a software function module and sold or used as a stand-alone product, it can be stored in a computer readable storage medium. The above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
Industrial Applicability
In the embodiments of the present invention, the sound features of the audio signal in a video file are extracted and matched against each audio event in an audio event library. When the extracted sound features successfully match at least one audio event, this indicates that the audio event occurred in the video file, and the audio event is event-marked at the corresponding position in the video file. Each audio event is established in advance from the sound features of the audio signal generated when the event occurs. Because audio events are preset and the sound features extracted from the video file are matched against each of them to decide whether a mark is needed, no one has to view the video content manually to decide whether to mark it, and both the efficiency and the accuracy of marking video files can be considerably improved.
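Claim 3 specifies the feature-extraction step as a transform to the time-frequency domain followed by extraction of a foreground signal. The sketch below is one possible reading under stated assumptions: non-overlapping frames, a naive discrete Fourier transform, and a per-frequency-bin median as the background estimate. The application does not prescribe any of these choices, and the function names are hypothetical.

```python
import cmath
import math
from statistics import median


def stft_magnitude(signal, frame_len):
    """Magnitude spectrogram from non-overlapping frames using a naive DFT."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    spec = []
    for frame in frames:
        row = []
        for k in range(frame_len // 2 + 1):
            s = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame))
            row.append(abs(s))
        spec.append(row)
    return spec


def foreground(spec):
    """Subtract a per-bin median background (assumed estimator), clipping at zero."""
    n_bins = len(spec[0])
    background = [median(row[k] for row in spec) for k in range(n_bins)]
    return [[max(row[k] - background[k], 0.0) for k in range(n_bins)]
            for row in spec]
```

With this estimator a steady hum that is present in every frame is removed as background, while a short burst at another frequency survives as foreground, from which a feature set could then be extracted.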

Claims (14)

  1. A video marking method, comprising:
    extracting sound features of an audio signal in a video file;
    matching the extracted sound features against each audio event in an audio event library, wherein each audio event is established based on sound features of an audio signal generated when the event occurs; and
    when the sound features successfully match at least one of the audio events, performing event marking, at a corresponding position in the video file, for the audio event that occurred.
  2. The video marking method according to claim 1, wherein extracting the sound features of the audio signal in the video file comprises:
    extracting the sound features of the audio signal in the video file during video recording.
  3. The video marking method according to claim 1, wherein extracting the sound features of the audio signal comprises:
    transforming the audio signal into the time-frequency domain and extracting a foreground signal of the audio signal;
    and wherein matching the extracted sound features against each audio event comprises:
    extracting a sound feature set from the foreground signal and calculating a similarity between the sound feature set and each audio event, the matching being successful when the obtained similarity is greater than a set similarity threshold.
  4. The video marking method according to any one of claims 1 to 3, wherein performing event marking, at the corresponding position in the video file, for the audio event that occurred comprises applying one or more of the following marks:
    marking the start time at which the audio event occurred at a key video frame position of the video file;
    acquiring and marking one or more of direction information and distance information of the sound source relative to the sound pickup for the audio event that occurred;
    acquiring and marking the severity level corresponding to the audio event that occurred;
    acquiring and marking the name of the audio event that occurred.
  5. The video marking method according to claim 4, wherein acquiring the severity level corresponding to the audio event that occurred comprises:
    determining the severity level corresponding to the audio event according to one or more of location information of where the video file is recorded and the duration of the audio event after it occurs.
  6. The video marking method according to claim 5, wherein performing event marking, at the corresponding position in the video file, for the audio event that occurred comprises:
    when marking the severity level corresponding to the audio event, further selecting, according to a table of correspondence between severity levels and marking formats, the marking format corresponding to the severity level of the audio event, and marking in that format.
  7. The video marking method according to any one of claims 1 to 3, wherein extracting the sound features of the audio signal in the video file comprises: extracting the sound features of the audio signal in the video file according to a preset detection period;
    the method further comprising:
    when the sound features of the audio signals extracted in adjacent detection periods all successfully match at least one of the audio events, determining, according to a preset event merging rule, whether to merge the audio events occurring in the two adjacent detection periods;
    when it is determined to merge, taking the start time of the audio event in the previous detection period as the start time of the audio event in the current detection period;
    when it is determined not to merge, setting both the end time of the audio event in the previous detection period and the start time of the audio event in the current detection period to the start time of the current detection period.
  8. A video monitoring method, comprising:
    performing surveillance video recording;
    during the video recording, performing event marking on the recorded video file using the video marking method according to any one of claims 1 to 7;
    after an event mark is completed on the video file, displaying the event-marked portion of the video file as an alarm.
  9. A video marking device, comprising:
    a feature extraction module, configured to extract sound features of an audio signal in a video file;
    a processing module, configured to match the sound features extracted by the feature extraction module against each audio event in an audio event library, wherein each audio event is established based on sound features of an audio signal generated when the event occurs; and
    a marking module, configured to, when the processing result of the processing module is that the sound features successfully match at least one of the audio events, perform event marking, at a corresponding position in the video file, for the audio event that occurred.
  10. The video marking device according to claim 9, further comprising:
    a video recording module, configured to perform video recording;
    wherein the feature extraction module is configured to extract the sound features of the audio signal in the video file while the video recording module is recording video.
  11. The video marking device according to claim 9 or 10, wherein the marking module performing event marking, at the corresponding position in the video file, for the audio event that occurred comprises applying one or more of the following marks:
    marking the start time at which the audio event occurred at a key video frame position of the video file;
    acquiring and marking one or more of direction information and distance information of the sound source relative to the sound pickup for the audio event that occurred;
    acquiring and marking the severity level corresponding to the audio event that occurred;
    acquiring and marking the name of the audio event that occurred.
  12. A video monitoring system, comprising a monitoring processing device and the video marking device according to any one of claims 9 to 11, wherein:
    the video marking device is configured to perform event marking on a video file recorded during video monitoring and, after an event mark is completed on the video file, to send an alarm to the monitoring processing device;
    the monitoring processing device is configured to, after receiving the alarm from the video marking device, display the event-marked portion of the video file as an alarm.
  13. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, perform the following operations:
    extracting sound features of an audio signal in a video file;
    matching the extracted sound features against each audio event in an audio event library, wherein each audio event is established based on sound features of an audio signal generated when the event occurs; and
    when the sound features successfully match at least one of the audio events, performing event marking, at a corresponding position in the video file, for the audio event that occurred.
  14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, perform the following operations:
    performing surveillance video recording;
    during the video recording, performing event marking on the recorded video file using the video marking method according to any one of claims 1 to 7;
    after an event mark is completed on the video file, displaying the event-marked portion of the video file as an alarm.
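The handling of adjacent detection periods in claim 7 can be illustrated with a short sketch. The claims leave the "preset event merging rule" open, so it is passed in here as a callable; the function name `merge_periods`, the dictionary layout, and the same-name rule used in the usage note are assumptions for illustration.

```python
def merge_periods(detections, period_s, should_merge):
    """Combine per-period detections into events with start and end times.

    detections:   list of (period_index, event_name) for periods in which a
                  match succeeded, in ascending period order.
    period_s:     length of one detection period in seconds.
    should_merge: callable(prev_name, cur_name) -> bool, standing in for the
                  preset event merging rule.
    """
    events = []
    for idx, name in detections:
        period_start = idx * period_s
        prev = events[-1] if events else None
        adjacent = prev is not None and prev["last_period"] == idx - 1
        if adjacent and should_merge(prev["name"], name):
            # Merge: the event keeps the start time of the previous period.
            prev["last_period"] = idx
            prev["end_s"] = period_start + period_s
            continue
        if adjacent:
            # No merge: the previous event ends where the current period starts.
            prev["end_s"] = period_start
        events.append({"name": name, "start_s": period_start,
                       "end_s": period_start + period_s, "last_period": idx})
    return events
```

For example, `merge_periods(detections, 2.0, lambda a, b: a == b)` merges consecutive periods that report the same event name and starts a new event whenever the name changes or a period without a match intervenes.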
PCT/CN2017/086325 2016-06-08 2017-05-27 Video marking method and device, and video monitoring method and system WO2017211206A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610405207.6A CN107483879B (en) 2016-06-08 2016-06-08 Video marking method and device and video monitoring method and system
CN201610405207.6 2016-06-08

Publications (1)

Publication Number Publication Date
WO2017211206A1 true WO2017211206A1 (en) 2017-12-14

Family

ID=60577623

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/086325 WO2017211206A1 (en) 2016-06-08 2017-05-27 Video marking method and device, and video monitoring method and system

Country Status (2)

Country Link
CN (1) CN107483879B (en)
WO (1) WO2017211206A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110992984A (en) * 2019-12-02 2020-04-10 新华智云科技有限公司 Audio processing method and device and storage medium
CN111950332A (en) * 2019-05-17 2020-11-17 杭州海康威视数字技术股份有限公司 Video time sequence positioning method and device, computing equipment and storage medium
CN113038265A (en) * 2021-03-01 2021-06-25 创新奇智(北京)科技有限公司 Video annotation processing method and device, electronic equipment and storage medium
CN113435433A (en) * 2021-08-30 2021-09-24 广东电网有限责任公司中山供电局 Audio and video data extraction processing system based on operation site
CN114363660A (en) * 2021-12-24 2022-04-15 腾讯科技(武汉)有限公司 Video collection determining method and device, electronic equipment and storage medium
US11722763B2 (en) 2021-08-06 2023-08-08 Motorola Solutions, Inc. System and method for audio tagging of an object of interest

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11074292B2 (en) * 2017-12-29 2021-07-27 Realwear, Inc. Voice tagging of video while recording
CN109246467A (en) * 2018-08-15 2019-01-18 上海蔚来汽车有限公司 Label is to the method, apparatus of sharing video frequency, video camera and smart phone
CN112513800B (en) * 2018-08-22 2024-03-12 深圳市欢太科技有限公司 Shorthand method and device, terminal and storage medium
CN112513799A (en) * 2018-08-22 2021-03-16 深圳市欢太科技有限公司 Shorthand method and device, terminal and storage medium
CN109121022B (en) * 2018-09-28 2020-05-05 百度在线网络技术(北京)有限公司 Method and apparatus for marking video segments
CN109640112B (en) * 2019-01-15 2021-11-23 广州虎牙信息科技有限公司 Video processing method, device, equipment and storage medium
CN110083085A (en) * 2019-03-15 2019-08-02 杭州钱袋金融信息服务有限公司 A kind of double recording systems of finance with phonetic symbol function
CN110223715B (en) * 2019-05-07 2021-05-25 华南理工大学 Home activity estimation method for solitary old people based on sound event detection
CN110211319B (en) * 2019-06-05 2021-05-14 深圳市梦网视讯有限公司 Security monitoring early warning event tracking method and system
CN110942766A (en) * 2019-11-29 2020-03-31 厦门快商通科技股份有限公司 Audio event detection method, system, mobile terminal and storage medium
CN111327855B (en) * 2020-03-10 2022-08-05 网易(杭州)网络有限公司 Video recording method and device and video positioning method and device
CN112116328B (en) * 2020-09-25 2023-12-19 维沃移动通信有限公司 Reminding method and device and electronic equipment
CN112420077B (en) * 2020-11-19 2022-08-16 展讯通信(上海)有限公司 Sound positioning method and device, testing method and system, equipment and storage medium
WO2022265629A1 (en) * 2021-06-16 2022-12-22 Hewlett-Packard Development Company, L.P. Audio signal quality scores
CN113593619B (en) * 2021-07-30 2022-08-09 北京百度网讯科技有限公司 Method, apparatus, device and medium for recording audio
CN113573136B (en) * 2021-09-23 2021-12-07 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer equipment and storage medium
CN114979745A (en) * 2022-05-06 2022-08-30 维沃移动通信有限公司 Video processing method and device, electronic equipment and readable storage medium
CN115129927B (en) * 2022-08-17 2023-05-02 广东龙眼数字科技有限公司 Monitoring video stream backtracking method, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101819770A (en) * 2010-01-27 2010-09-01 武汉大学 System and method for detecting audio event
WO2011025085A1 (en) * 2009-08-25 2011-03-03 Axium Technologies, Inc. Method and system for combined audio-visual surveillance cross-reference to related applications
CN102044242A (en) * 2009-10-15 2011-05-04 华为技术有限公司 Method, device and electronic equipment for voice activity detection
CN102176746A (en) * 2009-09-17 2011-09-07 广东中大讯通信息有限公司 Intelligent monitoring system used for safe access of local cell region and realization method thereof

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011025085A1 (en) * 2009-08-25 2011-03-03 Axium Technologies, Inc. Method and system for combined audio-visual surveillance cross-reference to related applications
CN102176746A (en) * 2009-09-17 2011-09-07 广东中大讯通信息有限公司 Intelligent monitoring system used for safe access of local cell region and realization method thereof
CN102044242A (en) * 2009-10-15 2011-05-04 华为技术有限公司 Method, device and electronic equipment for voice activity detection
CN101819770A (en) * 2010-01-27 2010-09-01 武汉大学 System and method for detecting audio event

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950332A (en) * 2019-05-17 2020-11-17 杭州海康威视数字技术股份有限公司 Video time sequence positioning method and device, computing equipment and storage medium
CN111950332B (en) * 2019-05-17 2023-09-05 杭州海康威视数字技术股份有限公司 Video time sequence positioning method, device, computing equipment and storage medium
CN110992984A (en) * 2019-12-02 2020-04-10 新华智云科技有限公司 Audio processing method and device and storage medium
CN110992984B (en) * 2019-12-02 2022-12-06 新华智云科技有限公司 Audio processing method and device and storage medium
CN113038265A (en) * 2021-03-01 2021-06-25 创新奇智(北京)科技有限公司 Video annotation processing method and device, electronic equipment and storage medium
CN113038265B (en) * 2021-03-01 2022-09-20 创新奇智(北京)科技有限公司 Video annotation processing method and device, electronic equipment and storage medium
US11722763B2 (en) 2021-08-06 2023-08-08 Motorola Solutions, Inc. System and method for audio tagging of an object of interest
CN113435433A (en) * 2021-08-30 2021-09-24 广东电网有限责任公司中山供电局 Audio and video data extraction processing system based on operation site
CN114363660A (en) * 2021-12-24 2022-04-15 腾讯科技(武汉)有限公司 Video collection determining method and device, electronic equipment and storage medium
CN114363660B (en) * 2021-12-24 2023-09-08 腾讯科技(武汉)有限公司 Video collection determining method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN107483879A (en) 2017-12-15
CN107483879B (en) 2020-06-09

Similar Documents

Publication Publication Date Title
WO2017211206A1 (en) Video marking method and device, and video monitoring method and system
CN109753920B (en) Pedestrian identification method and device
CN109040824B (en) Video processing method and device, electronic equipment and readable storage medium
WO2021164644A1 (en) Violation event detection method and apparatus, electronic device, and storage medium
US10141025B2 (en) Method, device and computer-readable medium for adjusting video playing progress
TW202105199A (en) Data update method, electronic device and storage medium thereof
CN105845124B (en) Audio processing method and device
KR20200116158A (en) Image processing method and device, electronic device and storage medium
WO2021093427A1 (en) Visitor information management method and apparatus, electronic device, and storage medium
CN111814629A (en) Person detection method and device, electronic device and storage medium
US10805576B2 (en) Systems and methods for generating an audit trail for auditable devices
CN110688527A (en) Video recommendation method and device, storage medium and electronic equipment
TW202109360A (en) Image processing method and device, electronic equipment and storage medium
TWI724546B (en) Intelligent alarm method, device and system including the same for mobile phone
CN108322350B (en) Service monitoring method and device and electronic equipment
CN106453528A (en) Method and device for pushing message
WO2023155484A1 (en) Anti-removal target device
CN110392304A (en) A kind of video display method, apparatus, electronic equipment and storage medium
CN111814627B (en) Person detection method and device, electronic device and storage medium
JP6214762B2 (en) Image search system, search screen display method
CN104050785A (en) Safety alert method based on virtualized boundary and face recognition technology
CN113722541A (en) Video fingerprint generation method and device, electronic equipment and storage medium
US20210132855A1 (en) Method and device for detecting slow node and computer-readable storage medium
WO2012146273A1 (en) Method and system for video marker insertion
CN114762320A (en) Video highlight screen recording method and device and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17809645

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17809645

Country of ref document: EP

Kind code of ref document: A1