CN117674880A - Directional sound channel selection method, electronic device, medium and vehicle - Google Patents

Directional sound channel selection method, electronic device, medium and vehicle

Info

Publication number: CN117674880A
Authority: CN (China)
Prior art keywords: image, person, target, directional channel, selection method
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CN202211039768.0A
Other languages: Chinese (zh)
Inventor: 陈瑞
Current Assignee: ZTE Corp (the listed assignee may be inaccurate)
Original Assignee: ZTE Corp

Events:
Application filed by ZTE Corp
Priority to CN202211039768.0A (CN117674880A)
Priority to PCT/CN2023/086442 (WO2024045616A1)
Publication of CN117674880A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00: Details of transducers, loudspeakers or microphones
    • H04R1/20: Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32: Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04B: TRANSMISSION
    • H04B1/00: Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/38: Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
    • H04B1/3822: Transceivers specially adapted for use in vehicles
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30: Services specially adapted for particular environments, situations or purposes
    • H04W4/40: Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H04W4/48: Services specially adapted for vehicles, for in-vehicle communication

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Acoustics & Sound (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a method for selecting a directional channel, including: acquiring an image to be recognized reflecting an object in a preset angle range around a target object; determining the position information of a target person in the image to be identified; and taking the directional channel with the sound propagation direction matched with the position information of the target person as a target directional channel. The present disclosure also provides an electronic device, a storage medium, and a vehicle.

Description

Directional sound channel selection method, electronic device, medium and vehicle
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to a method of selecting a directional channel, an electronic device, a computer-readable storage medium, and a vehicle.
Background
Vehicles have become a major means of everyday transportation, and with the growing emphasis on public health safety, windows should be opened as little as possible. When a person inside the vehicle needs to talk with a person outside, however, the window inevitably has to be opened, which introduces various potential safety hazards.
Therefore, how to eliminate these potential safety hazards when a person inside the vehicle talks with a person outside the vehicle is a technical problem to be solved in the field.
Disclosure of Invention
Embodiments of the present disclosure provide a method of selecting a directional channel, an electronic device, a computer-readable storage medium, and a vehicle.
As a first aspect of the present disclosure, there is provided a method of selecting a directional channel, including:
acquiring an image to be recognized reflecting an object in a preset angle range around a target object;
determining the position information of a target person in the image to be identified;
and taking the directional channel with the sound propagation direction matched with the position information of the target person as a target directional channel.
Optionally, the position information of the target person includes coordinates of feature pixels composing a portrait of the target person in the image to be recognized;
taking, as the target directional channel, the directional channel whose sound propagation direction matches the position information of the target person includes:
determining the identification information of the directional channel corresponding to the coordinates of the feature pixels according to a mapping table, wherein the mapping table comprises a mapping relation between a pixel coordinate range and the identification information of the directional channel;
and taking the directional channel corresponding to the identification information of the directional channel corresponding to the characteristic pixel as the target directional channel.
Optionally, the feature pixel of the person image of the target person is a pixel of a center point of a face in the person image of the target person.
Optionally, the image to be identified includes partial images captured by a plurality of cameras, and the mapping relationships in the mapping table further include a mapping relationship among the identification information of the cameras, the pixel coordinate range, and the identification information of the directional channel.
Optionally, the determining the location information of the target person in the image to be identified includes:
identifying a portrait in the image to be identified;
determining the distance between the person corresponding to the portrait and the target object;
taking a person whose distance from the target object does not exceed a predetermined distance threshold as the target person;
and determining the position information of the target person.
Optionally, in the step of determining the distance between the person corresponding to the portrait and the target object, the distance between the person corresponding to the portrait and the target object is calculated using the following formula:
D=(Wf*F)/Wp;
wherein D is the distance between the person corresponding to the portrait and the target object;
wf is the face width;
f is the focal length of a camera shooting the portrait;
wp is the face pixel width.
Optionally, before the capturing the image to be identified around the target object, the selecting method further includes:
receiving a video stream;
and taking each video frame in the video stream as the image to be identified.
Optionally, the predetermined angle is 360 °.
As a second aspect of the present disclosure, there is provided an electronic apparatus including:
a memory having an executable program stored thereon;
one or more processors which, when calling the executable program, are capable of implementing the selection method provided in the first aspect of the present disclosure.
As a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon an executable program that, when called, is capable of implementing the selection method provided in the first aspect of the present disclosure.
As a fourth aspect of the present disclosure, there is provided a vehicle including a vehicle body, an electronic device, an image pickup apparatus, and a plurality of intercom devices, wherein the electronic device is the electronic device provided in the second aspect of the present disclosure, the vehicle body is the target object, the image pickup apparatus is configured to acquire the image to be recognized, and the plurality of intercom devices respectively correspond to a plurality of directional channels having different sound propagation directions.
Optionally, the intercom device includes an intercom main unit and an intercom auxiliary unit matched with the main unit, wherein the intercom main unit is disposed inside the vehicle body and the intercom auxiliary unit is disposed on the outer surface of the vehicle body.
In the selection method provided by the present disclosure, after the image to be identified within a predetermined angle range around the target object is obtained, it is recognized; once the target person is identified, the directional channel in the corresponding azimuth is taken as the target directional channel. The intercom device corresponding to the target directional channel is then turned on, so that a person inside the target object can talk with a person outside while the target object remains closed (for example, without opening a window).
Drawings
FIG. 1 is a flow chart of one embodiment of a method of selecting a directional channel provided by the present disclosure;
FIG. 2 is a flow chart of one embodiment of step S130;
FIG. 3 is a flow chart of one embodiment of step S120;
FIG. 4 is a flow chart of another embodiment of a method of selecting a directional channel provided by the present disclosure;
FIG. 5 is a schematic diagram showing correspondence between directional channels and regions of an image to be identified;
fig. 6 is a schematic diagram showing the turning on of one intercom device and the turning off of the other intercom device of the vehicle.
Detailed Description
In order to better understand the technical solutions of the present disclosure, the following describes in detail the method of selecting a directional channel, the electronic device, the computer-readable storage medium, and the vehicle provided by the present disclosure with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Embodiments of the disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As a first aspect of the present disclosure, there is provided a method of selecting a directional channel, as shown in fig. 1, the method comprising:
in step S110, an image to be recognized reflecting an object within a predetermined angle range around the target object is acquired;
in step S120, determining position information of a target person in the image to be identified;
in step S130, a target directional channel is set as a directional channel whose sound propagation direction matches the position information of the target person.
A plurality of intercom devices are arranged on the target object, respectively corresponding to a plurality of directional sound channels with different sound propagation directions. The target object may be a vehicle or another device with security requirements (e.g., a counter shop window).
In the present disclosure, the predetermined angle range is not particularly limited, and may be determined according to a specific type of the target object and a specific scene in which the target object is located. As an alternative embodiment, the predetermined angle may be 360 °, so that the image to be recognized may be a panoramic image around the target object.
Of course, the present disclosure is not limited thereto, and the predetermined angle may not exceed 180 ° when the target object is a bank counter showcase, for example.
After the image to be identified within the predetermined angle range around the target object is obtained, it is recognized; once the target person is identified, the directional channel in the corresponding direction is taken as the target directional channel. The intercom device corresponding to the target directional channel is then turned on, so that a person inside the target object can talk with a person outside while the target object remains closed (for example, without opening a window).
In the present disclosure, whether a target portrait exists in the panoramic image may be identified through artificial intelligence (AI) techniques. For example, the image to be recognized may be input into a deep learning neural network to determine whether a target portrait is present.
It should be noted that the "image to be identified" herein may be a panoramic image, captured by a panoramic camera, that reflects the surroundings of the target object, or may be a combination of a plurality of sub-images captured by cameras facing different directions.
In the present disclosure, the form of the position information of the target person is not particularly limited. As an alternative embodiment, the position information of the target person may be the relative position of the target person with respect to the target object.
For example, a reference line may be set on the target object. The relative position of the target person with respect to the target object may be an angle through which the target person rotates with respect to the reference line. The sound propagation direction of each directional channel covers an angle. After determining the angle rotated by the target person relative to the reference line, the target directional channel may be determined based on the angle covered by each directional channel.
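The angle-based matching described above can be sketched as follows; the equal-sector layout and the function name are assumptions for illustration, not details fixed by the disclosure.

```python
def select_channel_by_angle(person_angle_deg, num_channels=8):
    # Angle of the target person, measured from the reference line
    # on the target object, in degrees. Each directional channel is
    # assumed to cover an equal angular sector of 360/num_channels.
    sector = 360.0 / num_channels
    return int(person_angle_deg % 360.0 // sector)

# With 8 channels of 45 degrees each, a person rotated 95 degrees
# from the reference line falls into the sector of channel index 2.
print(select_channel_by_angle(95))
```

In practice the sectors need not be equal; the same lookup works with any list of per-channel angle ranges.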
Of course, the present disclosure is not limited thereto. As another alternative embodiment, the position information of the target person includes coordinates of feature pixels constituting a portrait of the target person in the image to be recognized.
Accordingly, as shown in fig. 2, taking the directional channel whose sound propagation direction matches the position information of the target person as the target directional channel includes:
in step S131, determining the identification information of the directional channel corresponding to the coordinates of the feature pixels according to a mapping table, where the mapping table includes a mapping relationship between a pixel coordinate range and the identification information of the directional channel;
in step S132, a directional channel corresponding to the identification information of the directional channel corresponding to the feature pixel is used as the target directional channel.
In the above embodiment, each of the directional channels corresponds to a pixel coordinate range. If a feature pixel of a portrait falls within the pixel coordinate range of a certain directional channel, then that directional channel is the directional channel matching the target person.
As an alternative embodiment, the "identification information" may be a "number".
Shown in Table 1 below is one embodiment of the mapping table:

TABLE 1

Directional channel number    Pixel coordinate range
1                             (x1, y1) to (x1', y1')
2                             (x2, y2) to (x2', y2')
3                             (x3, y3) to (x3', y3')
4                             (x4, y4) to (x4', y4')
...                           ...
In the present disclosure, which pixel in the portrait serves as the "feature pixel of the portrait" is not particularly limited, as long as it can reflect the position of the corresponding person. As an alternative embodiment, the pixel at the center position of the portrait may be taken as the "feature pixel of the portrait".
As shown in fig. 5, each of the directional channels corresponds to a rectangular area on the image to be recognized, and accordingly, in Table 1:
(x1, y1) is the coordinates of the pixel point at the upper left corner of the region corresponding to the directional channel numbered 1, and (x1', y1') is the coordinates of the pixel point at the lower right corner of that region;
(x2, y2) is the coordinates of the pixel point at the upper left corner of the region corresponding to the directional channel numbered 2, and (x2', y2') is the coordinates of the pixel point at the lower right corner of that region;
(x3, y3) is the coordinates of the pixel point at the upper left corner of the region corresponding to the directional channel numbered 3, and (x3', y3') is the coordinates of the pixel point at the lower right corner of that region;
(x4, y4) is the coordinates of the pixel point at the upper left corner of the region corresponding to the directional channel numbered 4, and (x4', y4') is the coordinates of the pixel point at the lower right corner of that region.
And so on.
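The lookup of steps S131 and S132 can be sketched as follows; the mapping table here is hypothetical, with illustrative coordinate ranges standing in for the (x1, y1) to (x1', y1') entries of Table 1.

```python
# Hypothetical mapping table: directional channel number ->
# (upper-left, lower-right) pixel coordinates of its rectangular
# region on the image to be recognized. Values are illustrative.
MAPPING_TABLE = {
    1: ((0, 0), (479, 539)),
    2: ((480, 0), (959, 539)),
    3: ((0, 540), (479, 1079)),
    4: ((480, 540), (959, 1079)),
}

def find_target_channel(feature_pixel):
    # Return the number of the directional channel whose region
    # contains the feature pixel of the target person's portrait.
    x, y = feature_pixel
    for channel, ((x0, y0), (x1, y1)) in MAPPING_TABLE.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return channel
    return None  # feature pixel outside every mapped region

# A face-center pixel at (500, 100) falls in channel 2's region.
print(find_target_channel((500, 100)))
```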
Because the sound needs to be played to the target person, the target directional sound channel is determined according to the position of the face, so that the sound can be better heard by the person outside the vehicle, and the sound of the person outside the vehicle can be better transmitted into the vehicle. Thus, the feature pixel of the person image of the target person may be a pixel of the center point of the face in the person image of the target person.
As described above, the image to be identified may include partial images captured by a plurality of cameras. In that case, the mapping relationships in the mapping table further include the mapping relationship among the identification information of the cameras, the pixel coordinate range, and the identification information of the directional channel.
The identification information of the camera may be the number of the camera.
Shown in Table 2 below is one embodiment of the mapping table described above:

TABLE 2

Camera number    Pixel coordinate range        Directional channel number
N1               (x1, y1) to (x1', y1')        1
N2               (x2, y2) to (x2', y2')        2
...              ...                           ...
As shown in fig. 5, each of the directional channels corresponds to a rectangular area on the image to be recognized, and accordingly, in Table 2:
(x1, y1) is the coordinates of the pixel point at the upper left corner of the region corresponding to the directional channel numbered 1, and (x1', y1') is the coordinates of the pixel point at the lower right corner of that region;
(x2, y2) is the coordinates of the pixel point at the upper left corner of the region corresponding to the directional channel numbered 2, and (x2', y2') is the coordinates of the pixel point at the lower right corner of that region;
(x3, y3) is the coordinates of the pixel point at the upper left corner of the region corresponding to the directional channel numbered 3, and (x3', y3') is the coordinates of the pixel point at the lower right corner of that region;
(x4, y4) is the coordinates of the pixel point at the upper left corner of the region corresponding to the directional channel numbered 4, and (x4', y4') is the coordinates of the pixel point at the lower right corner of that region.
And so on.
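With multiple cameras, the lookup first matches the camera's identification information and then the pixel block, as in this sketch; the table keys and regions are hypothetical values standing in for the entries of Table 2.

```python
# Hypothetical multi-camera mapping table:
# (camera number, directional channel number) -> pixel region.
CAMERA_MAPPING_TABLE = {
    (1, 1): ((0, 0), (959, 539)),
    (1, 2): ((0, 540), (959, 1079)),
    (2, 3): ((0, 0), (959, 539)),
    (2, 4): ((0, 540), (959, 1079)),
}

def find_channel(camera_id, feature_pixel):
    # Match the camera number first, then the pixel block that
    # contains the feature pixel; return the channel number.
    x, y = feature_pixel
    for (cam, channel), ((x0, y0), (x1, y1)) in CAMERA_MAPPING_TABLE.items():
        if cam == camera_id and x0 <= x <= x1 and y0 <= y <= y1:
            return channel
    return None

# A face centered at (100, 600) seen by camera 2 maps to channel 4.
print(find_channel(2, (100, 600)))
```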
In the present disclosure, as shown in fig. 3, the determining the location information of the target person in the image to be identified includes:
in step S121, a portrait in the image to be identified is identified;
in step S122, determining a distance between the person corresponding to the portrait and the target object;
in step S123, a person whose distance from the target object does not exceed a predetermined distance threshold is taken as the target person;
in step S124, the position information of the target person is determined.
The number of people around the target object may be very large, and only a person close enough to the target object is likely to be a target person talking with a person inside. Accordingly, in the embodiments of the present disclosure, a person whose distance from the target object does not exceed a predetermined distance threshold is taken as the target person.
As described above, step S121 may be performed by means of AI recognition.
In the present disclosure, in the step of determining the distance between the person corresponding to the portrait and the target object, the distance between the person corresponding to the portrait and the target object is calculated using the following formula:
D=(Wf*F)/Wp;
wherein D is the distance between the person corresponding to the portrait and the target object;
wf is the face width;
f is the focal length of a camera shooting the portrait;
wp is the face pixel width.
It should be noted that the "face width" may be an empirical value. For example, the face width is typically between 12cm and 14 cm.
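The formula D = (Wf * F) / Wp can be sketched directly; the 0.13 m face width below is an assumed empirical value from the 12 cm to 14 cm range, and the focal length is taken in pixel units so the pixel terms cancel.

```python
def estimate_distance_m(face_pixel_width, focal_length_px, face_width_m=0.13):
    # D = (Wf * F) / Wp: empirical face width times the focal
    # length (in pixels), divided by the face width in pixels.
    return face_width_m * focal_length_px / face_pixel_width

# A 52-pixel-wide face seen through an 800 px focal length is
# estimated to be about 2 m from the camera.
print(estimate_distance_m(52, 800))
```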
In the present disclosure, an image to be recognized is photographed using an image pickup device (e.g., a camera). In order to accurately determine the target person, as shown in fig. 4, before the capturing of the image to be recognized around the target object, the selection method further includes:
in step S102, a video stream is received;
in step S104, each video frame in the video stream is taken as the image to be identified.
That is, in the present disclosure, each frame of the video stream may be used as an image to be identified; whenever the target person is identified in any video frame, the corresponding directional channel is determined.
As a second aspect of the present disclosure, there is provided an electronic apparatus including:
a memory having an executable program stored thereon;
one or more processors which, when calling the executable program, are capable of implementing the above directional channel selection method provided in the first aspect of the present disclosure.
In the present disclosure, when the target object is a vehicle, the electronic apparatus may be an in-vehicle apparatus provided inside the vehicle.
Optionally, the electronic device may further include one or more I/O interfaces coupled between the processor and the memory and configured to enable information interaction of the processor with the memory.
Wherein the processor is a device having data processing capabilities, including, but not limited to, a central processing unit (CPU) or the like; the memory is a device with data storage capability, including, but not limited to, random access memory (RAM; more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I/O (read/write) interface is connected between the processor and the memory and implements information interaction between them, including, but not limited to, a data bus (Bus) or the like.
In some embodiments, the processor, memory, and I/O interfaces are interconnected by a bus, which in turn is connected to other components of the computing device.
As a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon an executable program, which when called, is capable of implementing the above-described directional channel selection method provided in the first aspect of the present disclosure.
As a fourth aspect of the present disclosure, there is provided a vehicle including a vehicle body, an electronic device, an image pickup apparatus, and a plurality of intercom devices, wherein the electronic device is the above-described electronic device provided by the present disclosure, the vehicle body is the target object, the image pickup apparatus is configured to acquire the image to be recognized, and the plurality of intercom devices respectively correspond to a plurality of directional channels having different sound propagation directions.
As described above, after the image to be recognized within the predetermined angle range around the vehicle body is acquired, it is recognized; once the target person is identified, the directional channel in the corresponding direction is taken as the target directional channel. The intercom device corresponding to the target directional channel is then turned on, so that a person inside the vehicle can talk with a person outside while the vehicle remains closed (for example, without opening a window).
In the present disclosure, the specific type and specific structure of the intercom device are not particularly limited. As an optional implementation, the intercom device includes an intercom main unit and an intercom auxiliary unit matched with the main unit; the intercom main unit is arranged inside the vehicle body, and the intercom auxiliary unit is arranged on the outer surface of the vehicle body. A plurality of intercom devices are disposed around the vehicle body.
In the present disclosure, the intercom main unit may include a microphone and a speaker, and the intercom auxiliary unit may include a sound-directing microphone and a sound-directing external loudspeaker. The plurality of sound-directing microphones and sound-directing external loudspeakers may cover all areas around the vehicle body.
Optionally, the intercom main unit and the intercom auxiliary unit are both turned off by default.
As an alternative implementation manner, the camera device and the electronic device are in communication connection in a wireless (e.g., WIFI, bluetooth) or wired manner, and the intercom device is in communication connection with the electronic device in a wireless (e.g., WIFI, bluetooth) or wired manner.
As an alternative embodiment, the image capturing apparatus may include a plurality of cameras oriented differently; as another alternative, the camera device may be a panoramic camera.
As an alternative embodiment, the camera device pushes the video stream to the electronic device using the Real Time Streaming Protocol (RTSP).
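Consuming such a stream frame by frame can be sketched as below. The function is duck-typed on the read() interface of OpenCV's cv2.VideoCapture, so a live RTSP feed could be passed in; both the OpenCV usage and the placeholder address in the comment are assumptions for illustration, not details given by the disclosure.

```python
def frames_as_images(capture):
    # Yield each decoded frame of the video stream as an "image to
    # be identified". `capture` is any object exposing OpenCV's
    # VideoCapture interface: read() -> (ok, frame). For a live
    # feed one could pass cv2.VideoCapture("rtsp://<camera>/stream")
    # (placeholder address, assumed for illustration).
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        yield frame
```

Each yielded frame would then be handed to the face-recognition step to locate the target person.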
The following describes a specific flow for realizing the conversation between the inside and outside of the vehicle:
Using RTSP video-stream pushing, each camera sends its captured images to the electronic device in real time; the focal length of each camera is F.
The electronic device trains an image object classification model based on deep learning to support face recognition. It receives the video stream, takes each frame as an image to be recognized, performs face recognition, and obtains the pixel coordinates of the center of the face in the image. When the electronic device determines that the image captured by camera N contains a person, it calculates the distance between the person and the camera, generally as: (face width × F) / face pixel width.
If the distance between the person and the vehicle is smaller than a predetermined threshold, the electronic device matches against the mapping table of camera number, image pixel block (X, Y, X', Y'), and directional channel: it first matches the camera number to find camera N, then matches the pixel block by checking whether the face center pixel coordinates (Xp, Yp) fall within a pixel block region (Xp >= X and Yp >= Y and Xp <= X' and Yp <= Y'); if the match succeeds, the corresponding directional channel number is found.
Based on that directional channel number, the electronic device turns on the intercom device of the channel and turns off the intercom devices of all other channels.
As shown in fig. 6, the surroundings of the vehicle are divided 360° around the vehicle's center point into 8 areas (area 1 through area 8), which correspond to 8 directional channels. If the target person is detected in area 1, the intercom devices in areas 2 through 8 are turned off and the intercom device in area 1 is turned on. Thus, the person inside the vehicle can talk with the outside person in area 1 without opening the windows.
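The region switching above reduces to turning exactly one intercom device on; the dictionary-of-states representation in this sketch is an assumption for illustration.

```python
def switch_intercoms(target_region, num_regions=8):
    # True = intercom device on, False = off. Only the region in
    # which the target person was detected is switched on; the
    # intercom devices of all other regions are switched off.
    return {region: region == target_region
            for region in range(1, num_regions + 1)}

# Target person detected in region 1: region 1 on, regions 2-8 off.
print(switch_intercoms(1))
```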
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. 
Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone, or in combination with features, characteristics, and/or elements of other embodiments, unless explicitly stated otherwise. It will therefore be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the disclosure as set forth in the appended claims.

Claims (12)

1. A method of selecting a directional channel, comprising:
acquiring an image to be identified that reflects objects within a predetermined angle range around a target object;
determining position information of a target person in the image to be identified; and
taking a directional channel whose sound propagation direction matches the position information of the target person as a target directional channel.
2. The selection method according to claim 1, wherein the position information of the target person comprises coordinates of feature pixels constituting a portrait of the target person in the image to be identified; and
the taking of a directional channel whose sound propagation direction matches the position information of the target person as the target directional channel comprises:
determining identification information of a directional channel corresponding to the coordinates of the feature pixels according to a mapping table, wherein the mapping table comprises a mapping relationship between pixel coordinate ranges and identification information of directional channels; and
taking the directional channel corresponding to the determined identification information as the target directional channel.
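The mapping-table lookup of claim 2 can be sketched as follows. The table contents, coordinate ranges, and channel identifiers below are invented for illustration; a real table would be calibrated to the camera installation described in the specification.

```python
from typing import Optional

# Hypothetical mapping table: each entry maps a pixel-coordinate range in the
# image to be identified to the identification information of a directional
# channel. Ranges and ids here are illustrative only.
MAPPING_TABLE = [
    # ((x_min, x_max), (y_min, y_max), directional-channel id)
    ((0, 480), (0, 1080), "channel_1"),
    ((480, 960), (0, 1080), "channel_2"),
    ((960, 1440), (0, 1080), "channel_3"),
    ((1440, 1920), (0, 1080), "channel_4"),
]

def lookup_channel(x: int, y: int) -> Optional[str]:
    """Return the channel id whose pixel-coordinate range contains (x, y)."""
    for (x_min, x_max), (y_min, y_max), channel_id in MAPPING_TABLE:
        if x_min <= x < x_max and y_min <= y < y_max:
            return channel_id
    return None  # feature pixel falls outside every mapped range
```

Half-open ranges (`x_min <= x < x_max`) are used here so that adjacent entries cannot both match the same coordinate; whether the patent intends inclusive or exclusive bounds is not specified.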
3. The selection method according to claim 2, wherein the feature pixel of the portrait of the target person is a pixel at the center point of a face in the portrait of the target person.
4. The selection method according to claim 2, wherein the image to be identified comprises partial images captured by a plurality of cameras, and the mapping relationship in the mapping table further comprises a mapping relationship among identification information of the cameras, the pixel coordinate ranges, and the identification information of the directional channels.
5. The selection method according to claim 1, wherein the determining of the position information of the target person in the image to be identified comprises:
identifying a portrait in the image to be identified;
determining the distance between the person corresponding to the portrait and the target object;
taking a person whose distance from the target object does not exceed a predetermined distance threshold as the target person; and
determining the position information of the target person.
6. The selection method according to claim 5, wherein, in the step of determining the distance between the person corresponding to the portrait and the target object, the distance is calculated using the following formula:
D = (Wf * F) / Wp;
wherein D is the distance between the person corresponding to the portrait and the target object;
Wf is the actual face width;
F is the focal length of the camera capturing the portrait; and
Wp is the face width in pixels.
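The formula in claim 6 is the standard pinhole-camera similar-triangles estimate: distance equals real face width times focal length (expressed in pixels) divided by the face width measured in image pixels. A worked sketch, with illustrative numbers not taken from the patent:

```python
# Worked example of D = (Wf * F) / Wp from claim 6. Units: Wf in meters,
# F (focal length) in pixels, Wp in pixels, so D comes out in meters.

def estimate_distance(face_width_m: float, focal_length_px: float,
                      face_pixel_width: float) -> float:
    """Pinhole-camera distance estimate: D = (Wf * F) / Wp."""
    if face_pixel_width <= 0:
        raise ValueError("face pixel width must be positive")
    return (face_width_m * focal_length_px) / face_pixel_width

# A face 0.16 m wide, imaged 100 px wide by a camera whose focal length is
# 1000 px, is estimated to be (0.16 * 1000) / 100 = 1.6 m away.
d = estimate_distance(0.16, 1000.0, 100.0)
```

Note that the focal length must be in pixel units (focal length in mm divided by the pixel pitch) for the units to cancel correctly; the claim does not state this explicitly.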
7. The selection method according to any one of claims 1 to 6, wherein, before the acquiring of the image to be identified, the selection method further comprises:
receiving a video stream; and
taking each video frame in the video stream as the image to be identified.
8. The selection method according to any one of claims 1 to 6, wherein the predetermined angle range is 360°.
9. An electronic device, comprising:
a memory having an executable program stored thereon; and
one or more processors which, when invoking the executable program, implement the selection method of any one of claims 1 to 8.
10. A computer-readable storage medium having stored thereon an executable program which, when executed, implements the selection method of any one of claims 1 to 8.
11. A vehicle, comprising a vehicle body, an electronic device, an image capturing device, and a plurality of intercom devices, wherein the electronic device is the electronic device of claim 9, the vehicle body is the target object, the image capturing device is configured to acquire the image to be identified, and the intercom devices respectively correspond to a plurality of directional sound channels having different sound propagation directions.
12. The vehicle of claim 11, wherein the intercom device comprises an intercom main unit and an intercom sub-unit matched with the intercom main unit, the intercom main unit being disposed inside the vehicle body, the intercom sub-unit being disposed on an outer surface of the vehicle body.
Publications (1)

Publication Number Publication Date
CN117674880A 2024-03-08

Also Published As

Publication number Publication date
WO2024045616A1 (en) 2024-03-07
