CN113536838A

CN113536838A - Method and device for outputting audio signal of camera, storage medium and equipment

Info

Publication number: CN113536838A
Application number: CN202010293785.1A
Authority: CN
Inventors: 徐晓军
Original assignee: Zhejiang Uniview Technologies Co Ltd
Current assignee: Zhejiang Uniview Technologies Co Ltd
Priority date: 2020-04-15
Filing date: 2020-04-15
Publication date: 2021-10-22

Abstract

The embodiments of the application disclose a method and a device for outputting an audio signal of a camera, a storage medium, and equipment. The camera comprises at least two audio acquisition devices, and the method comprises: acquiring a video image captured by the camera, and determining whether monitoring personnel are present in the video image; if so, determining the area in which the monitoring personnel are located within the monitoring range; determining the audio acquisition device corresponding to that area according to a preset mapping relation between each area and the audio acquisition devices; and, if a selection operation on a target monitoring person is detected, determining the target audio acquisition device associated with the area in which the target monitoring person is located, and taking the audio signal of the target audio acquisition device as the output audio signal. With this technical scheme, when the camera has multiple microphones, the output quality of the camera's audio signal can be improved by selecting which microphone serves as the audio signal source.

Description

Method and device for outputting audio signal of camera, storage medium and equipment

Technical Field

The embodiment of the application relates to the technical field of image recognition, in particular to a method, a device, a storage medium and equipment for outputting an audio signal of a camera.

Background

In daily life, video is an intuitive carrier of information: by watching a video and listening to its sound, people can obtain a great deal of information. In the prior art, however, attention is usually paid only to the video side of a camera, such as image definition, video bit rate, and transmission speed. Audio quality receives little attention, so the audio of the video captured by cameras is generally of low quality.

Disclosure of Invention

The embodiments of the application provide a method, a device, a storage medium, and equipment for outputting an audio signal of a camera. When the camera has multiple microphones, the output quality of the camera's audio signal can be improved by selecting which microphone serves as the audio signal source.

In a first aspect, an embodiment of the present application provides an output method of an audio signal of a camera, where the camera includes at least two audio acquisition devices, the method includes:

acquiring a video image shot by a camera, and determining whether monitoring personnel exist in the video image;

if yes, determining the area of the monitoring personnel in the monitoring range;

determining the audio acquisition devices corresponding to the areas where the monitoring personnel are located according to the preset mapping relation between each area and the audio acquisition device;

and if the selection operation of the target monitoring personnel is detected, determining a target audio acquisition device associated with the area where the target monitoring personnel is located, and taking the audio signal of the target audio acquisition device as an output audio signal.

Further, after determining the audio acquiring device corresponding to the area where the monitoring person is located, the method further includes:

and performing associated storage on the monitoring personnel, the area of the monitoring personnel and the corresponding audio acquisition device according to a preset format.

Further, before determining the audio acquiring device corresponding to the region where the monitoring person is located according to the preset mapping relationship between each region and the audio acquiring device, the method further includes:

determining the number of monitoring personnel in the video image;

and if the number of the monitoring personnel is at least two, determining the basic characteristics of each monitoring personnel to number each monitoring personnel respectively.

Further, determining the basic features of each monitoring person to number each monitoring person respectively includes:

acquiring a face image of each monitoring person;

constructing basic characteristics of each monitoring person according to the face image;

and numbering the monitoring personnel respectively according to the basic characteristics.

Further, before acquiring a video image captured by a camera and determining whether a monitoring person is present in the video image, the method further includes:

and determining the mapping relation between each area in the monitoring range of the video image and the audio acquisition devices according to the number and the relative positions of the audio acquisition devices of the camera.

Further, after determining a target audio acquiring device associated with an area where a target monitoring person is located and taking an audio signal of the target audio acquiring device as an output audio signal, the method further includes:

and if the change operation of the target monitoring personnel is detected, determining a change audio acquisition device associated with the area where the target monitoring personnel is located after the change, and taking the audio signal of the change audio acquisition device as an output audio signal.

In a second aspect, an embodiment of the present application provides an apparatus for outputting an audio signal of a camera, where the camera includes at least two audio acquisition apparatuses, and the apparatus includes:

the monitoring personnel identification module is used for acquiring a video image shot by a camera and determining whether monitoring personnel exist in the video image;

the location area determining module is used for determining the location area of the monitoring personnel in the monitoring range if the monitoring personnel exist;

the audio acquisition device corresponding module is used for determining the audio acquisition devices corresponding to the areas where the monitoring personnel are located according to the preset mapping relation between each area and the audio acquisition device;

and the output audio signal determining module is used for determining a target audio acquiring device associated with the area where the target monitoring person is located if the selection operation of the target monitoring person is detected, and taking the audio signal of the target audio acquiring device as the output audio signal.

Further, the apparatus further comprises:

and the associated storage module is used for performing associated storage on the monitoring personnel, the area where the monitoring personnel are located and the corresponding audio acquisition device according to a preset format.

In a third aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method for outputting an audio signal of a camera according to the embodiments of the present application.

In a fourth aspect, the present application provides an apparatus, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for outputting the camera audio signal according to the embodiment of the present application when executing the computer program.

According to the technical scheme provided by the embodiments of the application, a video image captured by a camera is acquired, and whether monitoring personnel are present in the video image is determined; if so, the area in which the monitoring personnel are located within the monitoring range is determined; the audio acquisition device corresponding to that area is determined according to the preset mapping relation between each area and the audio acquisition devices; and, if a selection operation on a target monitoring person is detected, the target audio acquisition device associated with the area in which the target monitoring person is located is determined, and the audio signal of the target audio acquisition device is taken as the output audio signal. With this technical scheme, when the camera has multiple microphones, the output quality of the camera's audio signal can be improved by selecting which microphone serves as the audio signal source.

Drawings

Fig. 1 is a flowchart of an output method of a camera audio signal according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of a camera provided in an embodiment of the present application;

Fig. 3 is a schematic diagram of a video image provided by an embodiment of the present application;

Fig. 4 is a schematic structural diagram of an output device for an audio signal of a camera according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of an apparatus provided in an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.

Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

Fig. 1 is a flowchart of a method for outputting an audio signal of a camera according to an embodiment of the present application. This embodiment is applicable to playing back a video after it has been recorded by a camera. The method may be executed by the device for outputting an audio signal of a camera provided in the embodiments of the present application; the device may be implemented in software and/or hardware and may be integrated in equipment such as an intelligent terminal used for playing video.

As shown in fig. 1, the method for outputting the camera audio signal includes:

S110, acquiring a video image shot by a camera, and determining whether monitoring personnel exist in the video image.

The camera may be installed at a fixed position, for example a security camera, or may be movable, for example held by a user or mounted on a vehicle while capturing video. The video images may be a number of consecutive frames, or frames extracted at a fixed interval, that is, one frame out of every certain number of frames.
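The frame-extraction option mentioned above can be sketched as follows. This is only an illustration: the function name `sample_frames` and the `interval` parameter are hypothetical and do not come from the application, which does not specify how frames are extracted.

```python
from typing import Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")


def sample_frames(frames: Iterable[Frame], interval: int = 1) -> Iterator[Frame]:
    """Yield every `interval`-th frame, starting with the first one.

    interval=1 keeps all consecutive frames; interval=3 keeps frames 0, 3, 6, ...
    (hypothetical helper, not part of the application).
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    for index, frame in enumerate(frames):
        if index % interval == 0:
            yield frame
```

A larger interval reduces the number of images passed to person detection, at the cost of reacting more slowly when a person enters or leaves the monitoring range.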

In this embodiment, whether monitoring personnel are present in the video image may be determined through techniques such as image recognition and feature comparison. For example, for a camera installed at the entrance of a residential community, there may be times when nobody passes and times when one, two, or more people pass at once. In each case, the camera front end or the back end that plays the video may use corresponding techniques to identify whether monitoring personnel are present in the video image and how many there are.

In this embodiment, the camera includes at least two audio acquisition devices, and the distance between the camera and each audio acquisition device is determined by where that device is installed. For example, one microphone may be placed 0.5 meter to the left of the camera and another 0.5 meter to the right. Alternatively, four microphones may be placed 0.3 meter above, below, to the left of, and to the right of the camera. The positions of the audio acquisition devices, that is, the microphones, may be preset according to the user's requirements.

In this embodiment, optionally, before acquiring a video image captured by a camera and determining whether a monitoring person exists in the video image, the method further includes: and determining the mapping relation between each area in the monitoring range of the video image and the audio acquisition devices according to the number and the relative positions of the audio acquisition devices of the camera.

Continuing the above example, if there are two audio acquisition devices, placed 0.5 meter to the left and 0.5 meter to the right of the camera lens, the monitoring range of the video image may be divided into two regions by splitting the image vertically down the middle, where the left region corresponds to the left audio acquisition device and the right region corresponds to the right audio acquisition device. In this way, each region in the monitoring range is bound to the audio acquisition device that picks up audio from that region most clearly. For example, when a monitoring person in the left half of the monitoring range makes a sound, the audio acquisition device on the left side acquires the audio more clearly. By establishing this mapping relation, the scheme provides a data basis for the subsequent choice of output audio signal, makes the output audio signal more targeted, and improves the audio quality of the captured video.
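A minimal sketch of how such a mapping relation might be built from the number and relative positions of the microphones. It assumes, beyond what the application states, that the microphones are arranged horizontally and that the image is split into equal-width vertical strips; the names `Region`, `build_region_map`, and the microphone labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Region:
    region_id: int   # 1-based, counted from the left edge of the image
    x_min: float     # left boundary as a fraction of the image width
    x_max: float     # right boundary as a fraction of the image width


def build_region_map(mic_offsets_m: Dict[str, float]) -> List[Tuple[Region, str]]:
    """Pair each vertical strip of the image with the nearest microphone.

    mic_offsets_m maps a microphone label to its horizontal offset from the
    lens in meters (negative = left of the lens). Equal-width strips are an
    assumption made for this sketch.
    """
    if len(mic_offsets_m) < 2:
        raise ValueError("the camera needs at least two audio acquisition devices")
    # Order the microphones from left to right by their offset.
    ordered = sorted(mic_offsets_m, key=mic_offsets_m.get)
    width = 1.0 / len(ordered)
    return [
        (Region(i + 1, i * width, (i + 1) * width), mic)
        for i, mic in enumerate(ordered)
    ]


# Two microphones, 0.5 m to the left and right of the lens, as in the example.
mapping = build_region_map({"mic_right": 0.5, "mic_left": -0.5})
```

With these inputs the left half of the image is paired with `mic_left` and the right half with `mic_right`, matching the two-region split described above.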

S120, if the monitoring personnel exist, determining the area of the monitoring personnel in the monitoring range.

The areas may be divided in advance according to the number and positions of the audio acquisition devices, or may be set directly by a worker according to a predetermined association relationship. Because of perspective, a person who is close to the camera occupies a larger part of the image, so the position of the monitoring person can be determined more precisely from the position of the face.
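As an illustration of locating a person by the position of the face, the sketch below maps the horizontal center of a face bounding box to a region index. The bounding-box layout `(left, top, width, height)`, the equal-width strips, and the function name `locate_region` are assumptions made for this example; the application does not prescribe them.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels, assumed layout


def locate_region(face_box: Box, image_width: int, num_regions: int = 2) -> int:
    """Return the 1-based index of the vertical strip containing the face center."""
    if image_width <= 0 or num_regions < 1:
        raise ValueError("image_width and num_regions must be positive")
    left, _top, width, _height = face_box
    center_x = left + width / 2
    index = int(center_x * num_regions / image_width)
    # Clamp so that boxes touching or crossing the image border still map to a region.
    return min(max(index, 0), num_regions - 1) + 1
```

Using the face center rather than the whole body box keeps the result stable when a nearby person fills a large part of the frame.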

S130, determining the audio acquisition devices corresponding to the areas where the monitoring personnel are located according to the preset mapping relation between each area and the audio acquisition device.

After the area where the monitoring person is located is determined, the audio acquisition device corresponding to the area where the monitoring person is located can be determined according to the preset mapping relationship between each area and the audio acquisition device. For example, if the monitoring person is currently on the left side of the video image, it is determined that the audio acquiring device corresponding to the area is the audio acquiring device arranged on the left side.

In this embodiment, optionally, before determining the audio acquiring device corresponding to the area where the monitoring person is located according to the preset mapping relationship between each area and the audio acquiring device, the method further includes:

determining the number of monitoring personnel in the video image;

and if the number of the monitoring personnel is at least two, determining the basic characteristics of each monitoring personnel to number each monitoring personnel respectively.

When there are at least two monitoring personnel, each may be assigned a number according to basic features, for example differences in facial features, and the area and the corresponding audio acquisition device are then determined separately for each numbered person. In this way, an audio acquisition device is determined for each of several monitoring personnel, which fixes the source of the audio signal actually used, so that clear audio can be output for every monitoring person within the monitoring range.

S140, if the selection operation of the target monitoring personnel is detected, determining a target audio acquisition device associated with the area where the target monitoring personnel is located, and taking the audio signal of the target audio acquisition device as an output audio signal.

The user may select the face information of a monitoring person in a Web interface, an application program, or an applet; when this happens, the selection operation on the target monitoring person is deemed to be detected. According to the face information selected by the user, the area of that monitoring person in the video image can be determined, and from it the target audio acquisition device associated with that area.

After determining the target audio capturing device, for example, the microphone on the left side of the camera, the audio signal of the target audio capturing device may be used as the output audio signal. That is, after the user clicks the monitoring target, the microphone of the audio output may be determined according to the position of the monitoring target. Through the arrangement, the audio acquisition device resource of the camera can be fully utilized, and the effect of improving the audio quality of the shot video is realized.
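Step S140 amounts to two table lookups followed by picking one audio stream. The sketch below shows that chain under assumed data structures: the dictionaries, the microphone labels, and the function name `select_output_audio` are hypothetical, and the sample lists stand in for real audio buffers.

```python
from typing import Dict, Sequence


def select_output_audio(
    target_person: int,
    person_regions: Dict[int, int],
    region_to_mic: Dict[int, str],
    mic_signals: Dict[str, Sequence[float]],
) -> Sequence[float]:
    """Return the audio signal of the microphone bound to the target person's region."""
    region = person_regions[target_person]   # area in which the selected person is located
    mic = region_to_mic[region]              # preset mapping from area to microphone
    return mic_signals[mic]                  # this stream becomes the output audio signal


# Placeholder data: person 1 is in region 1, person 2 in region 2.
person_regions = {1: 1, 2: 2}
region_to_mic = {1: "mic1", 2: "mic2"}
mic_signals = {"mic1": [0.1, 0.2, 0.3], "mic2": [0.0, -0.1, 0.05]}

output = select_output_audio(2, person_regions, region_to_mic, mic_signals)
```

An unknown person number or an unmapped region raises `KeyError` here; a real player would more likely fall back to a default microphone.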

Fig. 2 is a schematic structural diagram of a camera provided in an embodiment of the present application. As shown in fig. 2, the camera may use a fisheye lens and may provide 180° panoramic monitoring. Sound may be picked up with multiple microphones, including but not limited to a two-microphone arrangement. A two-microphone example is described here: microphone 1 and microphone 2 are placed on the left and right sides of the camera lens, aligned horizontally. The division, shown by a dashed line, splits the image into two regions at the optical center of the sensor's photosensitive surface, namely region 1 and region 2, where region 1 is closer to microphone 1 and region 2 is closer to microphone 2.

Fig. 3 is a schematic diagram of a video image provided by an embodiment of the present application. As shown in fig. 3, if monitoring person number one is in area 1 and monitoring person number two is in area 2, the number of each monitoring person can be determined when the video data is stored, and, when the video data is played, the microphone from which the played audio comes can be determined according to the area in which each monitoring person is located.

In each of the above technical solutions, optionally, determining the basic features of each monitoring person to number each monitoring person respectively includes: acquiring a face image of each monitoring person; constructing basic characteristics of each monitoring person according to the face image; and numbering the monitoring personnel respectively according to the basic characteristics.

Image recognition technology may be used to construct basic features from the face image of each monitoring person, distinguish the different monitoring personnel, and number them. The numbers may be assigned incrementally and may be reset at regular intervals, for example restarting from 1 at 00:00 each day. This arrangement simplifies information storage and also simplifies operation for the user watching the video: the user does not need to examine each person's facial features and can choose the target monitoring person directly by number.
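One way to picture incremental numbering is the sketch below. It assumes, purely for illustration, that each face has already been turned into a numeric feature vector and that two vectors belong to the same person when their cosine similarity reaches a threshold; the application does not name a feature extractor or a matching rule, and `PersonNumberer` and its `threshold` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


class PersonNumberer:
    """Assign incremental numbers (1, 2, ...) to distinct face feature vectors."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold                 # assumed similarity cut-off
        self._known: List[Tuple[float, ...]] = []  # index i holds person number i + 1

    @staticmethod
    def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def number_for(self, feature: Sequence[float]) -> int:
        """Return the existing number for a known face, or assign the next one."""
        for number, known in enumerate(self._known, start=1):
            if self._cosine(feature, known) >= self.threshold:
                return number
        self._known.append(tuple(feature))
        return len(self._known)

    def reset(self) -> None:
        """Restart numbering from 1, e.g. when called at 00:00 each day."""
        self._known.clear()
```

Calling `reset()` from a daily scheduler would reproduce the periodic renumbering described above.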

In each of the above technical solutions, optionally, after determining the audio acquiring apparatus corresponding to the area where the monitoring person is located, the method further includes: and performing associated storage on the monitoring personnel, the area of the monitoring personnel and the corresponding audio acquisition device according to a preset format.

The preset format may be an association table. With area 1 and area 2 divided as described above, the first monitoring person and the second monitoring person who are marked as having entered the monitoring range may be stored in association in a table that records, for each monitoring person, the area in which the person is located and the corresponding microphone number.

Through this arrangement, the area in which each monitoring person is located and the corresponding microphone number can be clearly determined. A single table therefore shows which microphone records the sound of each monitored target best, which improves the audio quality of the recorded video.
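The association table can be pictured as a list of small records. The sketch below stores them as JSON, which is only one possible "preset format" chosen for this example; the field names and the `Association` type are hypothetical. The two sample rows follow the Fig. 3 description (person one in area 1, person two in area 2) together with the Fig. 2 pairing of area 1 with microphone 1 and area 2 with microphone 2.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Association:
    person_number: int  # number assigned to the monitoring person
    region: int         # area in which that person is located
    microphone: int     # number of the microphone bound to that area


def to_json(records: List[Association]) -> str:
    """Serialize the association table (JSON is an assumed storage format)."""
    return json.dumps([asdict(record) for record in records])


def from_json(text: str) -> List[Association]:
    """Rebuild the association table from its stored form."""
    return [Association(**item) for item in json.loads(text)]


table = [Association(1, 1, 1), Association(2, 2, 2)]
stored = to_json(table)
```

Storing the table alongside the video lets the player resolve a person number to a microphone without re-running detection during playback.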

In each of the above technical solutions, optionally, after determining a target audio acquiring apparatus associated with an area where a target monitoring person is located, and taking an audio signal of the target audio acquiring apparatus as an output audio signal, the method further includes: and if the change operation of the target monitoring personnel is detected, determining a change audio acquisition device associated with the area where the target monitoring personnel is located after the change, and taking the audio signal of the change audio acquisition device as an output audio signal.

When several monitoring personnel appear in the monitoring range of the video, their numbers may be displayed at the side of the video display interface, and the user may change the target monitoring person during playback. For example, to change from monitoring person 1 to monitoring person 2, the user clicks the number of monitoring person 2, and the changed audio acquisition device is then determined according to the area in which monitoring person 2 is located. It is understood that if monitoring person 2 and monitoring person 1 are in the same area, the changed audio acquisition device may be the same as the original one. With this scheme, the best audio data can be output for different targets according to the user's needs while the video is displayed.
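The change operation can be sketched as a small stateful selector that remembers the current microphone and reports whether a new selection actually switches it. The class name `OutputSelector` and its data layout are hypothetical and serve only to illustrate the behavior described above, including the case where both persons are in the same area.

```python
from typing import Dict, Optional


class OutputSelector:
    """Track which microphone feeds the output as the user changes the target person."""

    def __init__(self, person_regions: Dict[int, int], region_to_mic: Dict[int, str]) -> None:
        self.person_regions = person_regions
        self.region_to_mic = region_to_mic
        self.current_mic: Optional[str] = None

    def select(self, person_number: int) -> bool:
        """Switch to the microphone for `person_number`.

        Returns True if the output microphone changed, False if the newly
        selected person is in an area served by the same microphone.
        """
        new_mic = self.region_to_mic[self.person_regions[person_number]]
        changed = new_mic != self.current_mic
        self.current_mic = new_mic
        return changed


selector = OutputSelector({1: 1, 2: 2, 3: 1}, {1: "mic1", 2: "mic2"})
selector.select(1)   # first selection: output comes from mic1
selector.select(2)   # user clicks person 2: output switches to mic2
```

Returning a flag lets the player skip an audible stream switch when the target changes but the microphone does not.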

Fig. 4 is a schematic structural diagram of an output device for an audio signal of a camera according to an embodiment of the present application. As shown in fig. 4, the device for outputting an audio signal of a camera, the camera including at least two audio acquisition devices, includes:

a monitoring person identification module 410, configured to acquire a video image captured by a camera, and determine whether a monitoring person exists in the video image;

a location area determining module 420, configured to determine, if there is a monitoring person, a location area of the monitoring person within the monitoring range;

the audio acquiring device corresponding module 430 is configured to determine, according to a preset mapping relationship between each region and an audio acquiring device, an audio acquiring device corresponding to a region where a monitoring person is located;

the output audio signal determining module 440 is configured to determine, if a selection operation of a target monitoring person is detected, a target audio acquiring device associated with an area where the target monitoring person is located, and use an audio signal of the target audio acquiring device as an output audio signal.
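The four modules can be pictured as four methods of one object, as in the sketch below. This is an illustrative arrangement only: the face detector is passed in as a plain callable because the application does not name a detection library, and the class name, the bounding-box layout `(left, top, width, height)`, and the equal-width regions are all assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height), assumed layout


class CameraAudioOutput:
    def __init__(
        self,
        detect_faces: Callable[[object], List[Box]],  # stand-in for any face detector
        image_width: int,
        region_to_mic: Dict[int, str],
    ) -> None:
        self.detect_faces = detect_faces
        self.image_width = image_width
        self.region_to_mic = region_to_mic

    def identify(self, frame: object) -> List[Box]:
        """Role of module 410: find monitoring personnel in the video image."""
        return self.detect_faces(frame)

    def locate(self, box: Box) -> int:
        """Role of module 420: map a face box to a 1-based region index."""
        regions = len(self.region_to_mic)
        center_x = box[0] + box[2] / 2
        index = int(center_x * regions / self.image_width)
        return min(max(index, 0), regions - 1) + 1

    def microphone_for(self, region: int) -> str:
        """Role of module 430: look up the microphone bound to a region."""
        return self.region_to_mic[region]

    def output_microphone(self, frame: object, selected: int) -> Optional[str]:
        """Role of module 440: microphone for the person the user selected.

        `selected` is the index of the chosen person in the detection list;
        None is returned when nobody is detected or the index is invalid.
        """
        boxes = self.identify(frame)
        if not 0 <= selected < len(boxes):
            return None
        return self.microphone_for(self.locate(boxes[selected]))
```

Injecting the detector keeps the sketch independent of any particular recognition technique and makes each module easy to test on its own.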

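Optionally, the apparatus further comprises: an associated storage module, configured to perform associated storage on the monitoring personnel, the area where the monitoring personnel are located, and the corresponding audio acquisition device according to a preset format.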

According to the technical scheme provided by the embodiments of the application, when the camera has multiple microphones, the output quality of the camera's audio signal can be improved by selecting which microphone serves as the audio signal source.

The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method.

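Embodiments of the present application also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a method of outputting camera audio signals, the method including: acquiring a video image shot by a camera, and determining whether monitoring personnel exist in the video image; if yes, determining the area of the monitoring personnel in the monitoring range; determining the audio acquisition devices corresponding to the areas where the monitoring personnel are located according to the preset mapping relation between each area and the audio acquisition device; and if the selection operation of the target monitoring personnel is detected, determining a target audio acquisition device associated with the area where the target monitoring personnel is located, and taking the audio signal of the target audio acquisition device as an output audio signal.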

Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., hard disk), or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the computer system in which the program is executed, or may be located in a different second computer system connected to the computer system through a network (such as the internet). The second computer system may provide the program instructions to the computer for execution. The term "storage medium" may include two or more storage media that may reside in different locations, such as in different computer systems that are connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.

Of course, the storage medium provided in the embodiments of the present application contains computer-executable instructions, and the computer-executable instructions are not limited to the output operation of the camera audio signal as described above, and may also perform related operations in the output method of the camera audio signal provided in any embodiments of the present application.

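The embodiments of the application provide equipment in which the output device of the camera audio signal provided by the embodiments of the application can be integrated. Fig. 5 is a schematic structural diagram of an apparatus provided in an embodiment of the present application. As shown in fig. 5, the present embodiment provides an apparatus 500 comprising: one or more processors 520; and a storage device 510 configured to store one or more programs which, when executed by the one or more processors 520, cause the one or more processors 520 to implement the method for outputting the camera audio signal provided in the embodiments of the present application, the method including: acquiring a video image shot by a camera, and determining whether monitoring personnel exist in the video image; if yes, determining the area of the monitoring personnel in the monitoring range; determining the audio acquisition devices corresponding to the areas where the monitoring personnel are located according to the preset mapping relation between each area and the audio acquisition device; and if the selection operation of the target monitoring personnel is detected, determining a target audio acquisition device associated with the area where the target monitoring personnel is located, and taking the audio signal of the target audio acquisition device as an output audio signal.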

Of course, those skilled in the art will understand that the processor 520 also implements the technical solution of the output method of the camera audio signal provided in any embodiment of the present application.

The apparatus 500 shown in fig. 5 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present application.

As shown in fig. 5, the apparatus 500 includes a processor 520, a storage device 510, an input device 530, and an output device 540; the number of the processors 520 in the device may be one or more, and one processor 520 is taken as an example in fig. 5; the processor 520, the memory device 510, the input device 530 and the output device 540 of the apparatus may be connected by a bus or other means, such as by a bus 550 in fig. 5.

The storage device 510 is a computer-readable storage medium, and can be used to store software programs, computer-executable programs, and module units, such as program instructions corresponding to the output method of the camera audio signal in the embodiment of the present application.

The storage device 510 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the storage 510 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, storage 510 may further include memory located remotely from processor 520, which may be connected via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input means 530 may be used to receive input numbers, character information, or voice information, and to generate key signal inputs related to user settings and function control of the apparatus. The output device 540 may include a display screen, speakers, etc.

With the equipment provided by the embodiments of the application, when the camera has multiple microphones, the output quality of the camera's audio signal can be improved by selecting which microphone serves as the audio signal source.

The output device, the storage medium, and the equipment for the camera audio signal provided in the above embodiments can execute the output method for the camera audio signal provided in any embodiment of the present application, and have the corresponding functional modules and beneficial effects for executing the method. For technical details not described in detail in the above embodiments, refer to the output method of a camera audio signal provided in any embodiment of the present application.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims

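1. A method of outputting an audio signal from a camera, wherein the camera comprises at least two audio acquisition devices, the method comprising:

acquiring a video image shot by a camera, and determining whether monitoring personnel exist in the video image;

if yes, determining the area of the monitoring personnel in the monitoring range;

determining the audio acquisition devices corresponding to the areas where the monitoring personnel are located according to the preset mapping relation between each area and the audio acquisition device;

and if the selection operation of the target monitoring personnel is detected, determining a target audio acquisition device associated with the area where the target monitoring personnel is located, and taking the audio signal of the target audio acquisition device as an output audio signal.

2. The method of claim 1, wherein after determining the audio capture device corresponding to the region where the monitoring personnel is located, the method further comprises:

performing associated storage on the monitoring personnel, the area of the monitoring personnel and the corresponding audio acquisition device according to a preset format.

3. The method according to claim 1, wherein before determining the audio acquiring device corresponding to the region where the monitoring person is located according to the preset mapping relationship between each region and the audio acquiring device, the method further comprises:

determining the number of monitoring personnel in the video image;

and if the number of the monitoring personnel is at least two, determining the basic characteristics of each monitoring personnel to number each monitoring personnel respectively.

4. The method of claim 1, wherein determining the base characteristics of each monitoring person to number each monitoring person separately comprises:

acquiring a face image of each monitoring person;

constructing basic characteristics of each monitoring person according to the face image;

and numbering the monitoring personnel respectively according to the basic characteristics.

5. The method of claim 1, wherein prior to acquiring a video image captured by a camera and determining whether a monitoring person is present in the video image, the method further comprises:

determining the mapping relation between each area in the monitoring range of the video image and the audio acquisition devices according to the number and the relative positions of the audio acquisition devices of the camera.

6. The method of claim 1, wherein after determining a target audio capturing device associated with an area where a target monitoring person is located and taking an audio signal of the target audio capturing device as an output audio signal, the method further comprises:

if the change operation of the target monitoring personnel is detected, determining a change audio acquisition device associated with the area where the target monitoring personnel is located after the change, and taking the audio signal of the change audio acquisition device as an output audio signal.

7. An output device for an audio signal of a camera, wherein the camera comprises at least two audio acquisition devices, the device comprising:

the monitoring personnel identification module is used for acquiring a video image shot by a camera and determining whether monitoring personnel exist in the video image;

the location area determining module is used for determining the location area of the monitoring personnel in the monitoring range if the monitoring personnel exist;

the audio acquisition device corresponding module is used for determining the audio acquisition devices corresponding to the areas where the monitoring personnel are located according to the preset mapping relation between each area and the audio acquisition device;

and the output audio signal determining module is used for determining a target audio acquiring device associated with the area where the target monitoring person is located if the selection operation of the target monitoring person is detected, and taking the audio signal of the target audio acquiring device as the output audio signal.

8. The apparatus of claim 7, further comprising:

the associated storage module is used for performing associated storage on the monitoring personnel, the area where the monitoring personnel are located and the corresponding audio acquisition device according to a preset format.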

9. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method of outputting the camera audio signal according to any one of claims 1 to 6.

10. An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of outputting a camera audio signal according to any one of claims 1 to 6 when executing the computer program.