CN113228620B - Image acquisition method and related equipment


Info

Publication number: CN113228620B
Authority: CN (China)
Prior art keywords: vehicle, target, time, image, image data
Legal status: Active
Application number: CN202180000814.3A
Other languages: Chinese (zh)
Other versions: CN113228620A
Inventors: 黄怡, 汤秋缘, 李皓
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Publication of CN113228620A
Application granted
Publication of CN113228620B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60: Control of cameras or camera modules
    • G: PHYSICS
    • G07: CHECKING-DEVICES
    • G07C: TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C5/00: Registering or indicating the working of vehicles
    • G07C5/08: Registering or indicating performance data other than driving, working, idle, or waiting time, with or without registering driving, working, idle or waiting time
    • G07C5/0841: Registering performance data

Abstract

The embodiments of this application relate to the vehicle field within artificial intelligence and disclose an image acquisition method and related device. The method is applied to a vehicle equipped with a camera device and comprises the following steps: controlling the camera device to photograph the environment around the vehicle to obtain first image data, where the first image data corresponds to the environment around the vehicle during a first time period; in response to an acquired photographing instruction, obtaining a first moment corresponding to the photographing instruction; and acquiring a target image from the first image data according to the first moment, and outputting the target image. Because the user is not required to capture the image, both the problem that the driver cannot take a photograph and the safety hazard of photographing from a moving vehicle are resolved; and because the first moment falls within the first time period and the interval between the shooting moment of the target image and the first moment is less than or equal to a target threshold, the scenery the user wanted to capture is not missed.

Description

Image acquisition method and related equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image acquisition method and related device.
Background
While a vehicle is moving, the driver or a passenger may find the surrounding scenery beautiful and want to photograph and record it. The following problems often arise: the driver cannot take a photograph because he or she must drive the vehicle; a passenger must take out a mobile phone to shoot, by which time the desired scene may already have been missed; and photographing the scenery outside the window may require rolling the window down and reaching a hand out of it, which is inconvenient and poses a safety hazard.
The above reasons make it difficult for the driver or the passenger to photograph a desired scene during driving.
Disclosure of Invention
The embodiments of this application provide an image acquisition method and related device that do not require the user to capture the image, thereby resolving both the problem that the driver cannot take a photograph and the safety hazard of photographing from a moving vehicle; and because the first moment falls within the first time period and the interval between the shooting moment of the target image and the first moment is less than or equal to a target threshold, the scenery the user wanted to capture is not missed.
In order to solve the above technical problem, the embodiments of the present application provide the following technical solutions:
In a first aspect, an embodiment of this application provides an image acquisition method that may be used in the vehicle field within artificial intelligence. The method is applied to a vehicle with one or more first camera devices configured on its exterior, and includes: the vehicle photographs the environment around the vehicle through the first camera device to obtain first image data, where the first image data corresponds to the environment around the vehicle during a first time period. The first image data may be a video obtained by recording the environment around the vehicle during the first time period; or it may include a plurality of first video frames (that is, images) extracted from such a video, the plurality of first video frames corresponding to respective moments in the first time period; or it may include a plurality of first images obtained by photographing the environment around the vehicle during the first time period. In response to an acquired photographing instruction, the vehicle obtains a first moment corresponding to the photographing instruction; the vehicle then acquires a target image from the first image data according to the first moment and outputs the target image, where the first moment falls within the first time period and the interval between the shooting moment of the target image and the first moment is less than or equal to a target threshold, whose value may be 5 seconds, 8 seconds, 10 seconds, 15 seconds, or another value.
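The core selection logic of the first aspect can be sketched in a few lines of Python. This is only an illustrative reading, not the claimed implementation: the Frame record, the timestamp representation, and the 10-second threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Frame:
    timestamp: float  # shooting moment, seconds since an arbitrary epoch
    image: bytes      # encoded image data

# Example target threshold; the text allows 5 s, 8 s, 10 s, 15 s, or other values.
TARGET_THRESHOLD_S = 10.0

def select_target_image(first_image_data: List[Frame],
                        first_moment: float) -> Optional[Frame]:
    """Return the frame whose shooting moment is closest to the first moment,
    provided the gap is no larger than the target threshold."""
    candidates = [f for f in first_image_data
                  if abs(f.timestamp - first_moment) <= TARGET_THRESHOLD_S]
    if not candidates:
        return None
    return min(candidates, key=lambda f: abs(f.timestamp - first_moment))
```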
In this implementation, the environment around the vehicle is photographed by the first camera device configured on the vehicle to obtain the first image data. When a photographing instruction input by a user is received, the first moment corresponding to the instruction is obtained, and a target image whose shooting moment matches the first moment is selected from the first image data. Because the user is not required to capture the image, both the problem that the driver cannot take a photograph and the safety hazard of photographing from a moving vehicle are resolved. In addition, the first image data corresponds to the environment around the vehicle throughout the first time period: the first camera device keeps photographing the surroundings, and the image collected at the first moment is then selected, so the scenery the user wanted to capture is not missed.
In one possible implementation of the first aspect, the method further includes: the vehicle generates a photographing instruction in response to a received target voice, where the intention corresponding to the target voice is to take a photograph. Specifically, a model for performing voice recognition may be preconfigured in the vehicle. After the vehicle acquires any voice information input by the user (for ease of description, hereinafter referred to as "first voice information"), the first voice information may be converted into text content by the voice-recognition model, and whether the user has the intention to take a photograph, that is, whether the first voice information is the target voice, is then determined from that text content; if it is, the vehicle determines that a photographing instruction input by the user has been acquired. Alternatively, the vehicle acquires gesture information input by a second user and, upon determining from that gesture information that the second user has input a preset gesture, generates a photographing instruction in response to the acquired preset gesture. The preset gesture may be static or dynamic, and the second user may be any user in the vehicle or may be limited to a passenger at a fixed position.
In this implementation, the user can trigger the vehicle to generate the photographing instruction by voice or gesture input, which is simple to operate and easy to implement.
One possible implementation of the first aspect concerns how the vehicle acquires the first moment corresponding to the photographing instruction. In one implementation, the vehicle, in response to the acquired photographing instruction, determines a third moment corresponding to the instruction and obtains the first moment from the third moment, where the third moment is the generation moment of the photographing instruction. In another implementation, if the vehicle generated the photographing instruction in response to a received target voice, the vehicle acquires a fourth moment corresponding to the target voice and obtains the first moment from the fourth moment, where the fourth moment is the acquisition moment of the target voice. Because the target voice is received over a period of time rather than at an instant, its acquisition moment may be any one of the following: the moment acquisition of the target voice started, the moment it ended, or the midpoint of its acquisition. In another implementation, if the vehicle generated the photographing instruction in response to an acquired preset gesture, the vehicle acquires a second moment corresponding to the gesture, where the second moment is the shooting moment of the gesture image corresponding to the gesture, and the preset gesture may be dynamic or static.
In one possible implementation of the first aspect, a model for executing a natural language processing (NLP) task and a semantic library may further be configured in the vehicle. The vehicle inputs the text content corresponding to the first voice information, together with the semantic library, into the model for executing the NLP task, which determines whether the intention corresponding to the first voice information is to take a photograph. Here, the intention (intent) refers to the user's purpose and indicates the user's demand; the vehicle may recognize the user's intention from the voice information the user inputs. For example, the model for executing the NLP task may be Bidirectional Encoder Representations from Transformers (BERT), a recurrent neural network (RNN), a question-answering network (QANet), or another model for machine reading comprehension (MRC), or any other model capable of semantic understanding.
In one possible implementation of the first aspect, the method further includes: the vehicle obtains at least one target keyword from the target voice, where a target keyword is description information of a shooting subject and/or a shooting direction. A keyword is the specific content of an intention and is also the key information that triggers a specific service; it is, for example, a keyword within the user's input. As an example, "right" and "black car" in the user input "take a picture of the black car on the right" are keywords of that input. Specifically, the semantic library may further include slot information, which is description information of keywords. For any voice information input by the user (the "first voice information"), the vehicle may input its text content and the semantic library into the model for performing the NLP task, which determines the intention corresponding to the first voice information, extracts the keywords in it, and outputs both the intention and the extracted keywords. The vehicle acquiring a target image from the first image data then includes: the vehicle acquires the target image from the first image data according to the target keyword. If the at least one target keyword includes a keyword describing the shooting subject, an object indicated by that keyword is present in the target image; such a keyword may describe the subject's name, type, color, shape, or other attributes. If the at least one target keyword includes a keyword describing the shooting direction, the shooting direction of the target image is the direction that keyword points to.
In this implementation, the photographing instruction is input as voice. After acquiring the target voice that triggers photographing, the vehicle obtains the target keyword from it; the keyword points to the object the user wants to photograph or the direction the user wants to photograph in. That is, the vehicle learns more precisely what image the user wants, which improves the accuracy of the output target image, outputs an image that matches the user's expectation, and thereby improves the user stickiness of this solution.
In one possible implementation of the first aspect, the preset gesture is a preset static gesture, and the vehicle obtaining the first moment in response to the acquired photographing instruction includes: the vehicle acquires a second moment corresponding to the preset gesture and determines the second moment as the first moment, where the second moment is the shooting moment of the preset static gesture. In this implementation, when the preset gesture is a static gesture, its shooting moment can be used directly as the first moment, which provides another way of obtaining the first moment and improves the flexibility of the solution. Moreover, the interval between the user seeing the object to be photographed and the user making the preset static gesture is generally short, so directly determining the shooting moment of the gesture as the first moment also matches the moment the user actually wanted to photograph fairly well.
In one possible implementation of the first aspect, the vehicle obtaining the first moment in response to the acquired photographing instruction includes: the vehicle determines a third moment corresponding to the photographing instruction, where the third moment is the generation moment of the instruction; the vehicle then determines the first moment from the third moment, with the first moment preceding the third moment by a first duration. The first duration may be 0.5 seconds, 1 second, 2 seconds, 3 seconds, 5 seconds, or another value, and may be set with reference to factors such as the speed at which the vehicle processes voice information.
In this implementation, while the vehicle is receiving the voice information input by the user and processing it to determine whether it is the target voice, the vehicle keeps moving forward, so the moment at which the photographing instruction is generated is slightly later than the moment the user actually wanted to photograph. Placing the first moment before the third moment brings it closer to the moment the user wanted, and using the first moment as the reference point for retrieving the image yields an image closer to what the user actually intended to capture.
In one possible implementation of the first aspect, the vehicle is configured with at least two first camera devices whose shooting ranges do not overlap or only partially overlap, and the method further includes: the vehicle acquires a first direction, which is determined according to any one or more of the following: the gaze direction of a first user, the first user's facial orientation, the first user's body orientation, or other directions, without limitation. Further, the first user may be any one or a combination of the following: the driver, a user at a preset position in the vehicle, the user who issued the photographing instruction, or another type of user; which type of user is selected may be determined according to the actual situation. The vehicle selects, from the at least two first camera devices (for ease of description, hereinafter S devices), at least one target first camera device (hereinafter M devices) whose shooting range covers the first direction. The vehicle acquiring a target image from the first image data then includes: the vehicle selects second image data from the first image data, where the second image data is a subset of the first image data captured by the M target camera devices among the S first camera devices, and the vehicle acquires the target image from the second image data.
In this implementation, after the first image data corresponding to the plurality of camera devices is acquired, and because the shooting ranges of those devices do not overlap or only partially overlap, the first direction can also be acquired; the target camera device whose shooting range covers the first direction is selected from the plurality of devices, the second image data shot by the target camera device is selected from the first image data, and the target image is then acquired from the second image data. Further, the first direction is determined according to any one or more of: the user's gaze direction, facial orientation, body orientation, or gesture direction. A user generally faces, looks at, or points toward the region of interest, so filtering the captured image data by the first direction helps select the image the user expects, which improves the user stickiness of this solution.
In one possible implementation of the first aspect, the vehicle is configured with at least two camera devices whose shooting ranges do not overlap or only partially overlap, and the method further includes: obtaining a target keyword from the target voice, where the target keyword is description information of a shooting subject and/or a shooting direction; obtaining a first direction determined according to any one or more of gaze direction, facial orientation, and body orientation; and selecting, from the at least two camera devices, a target camera device whose shooting range covers the first direction. The vehicle acquiring a target image from the first image data then includes: selecting second image data from the first image data, where the second image data was captured by the target camera device; and acquiring the target image from the second image data according to the target keyword, where an object indicated by the target keyword is present in the target image and/or the shooting direction of the target image is the direction indicated by the target keyword.
In a second aspect, an embodiment of this application provides an image acquisition apparatus that may be used in the vehicle field within artificial intelligence. The apparatus is applied to a vehicle equipped with a camera device and includes: a shooting module configured to control the camera device to photograph the environment around the vehicle to obtain first image data, where the first image data corresponds to the environment around the vehicle during a first time period; and an acquisition module configured to obtain, in response to an acquired photographing instruction, a first moment corresponding to the photographing instruction, and to acquire a target image from the first image data according to the first moment and output the target image, where the first moment falls within the first time period and the interval between the shooting moment of the target image and the first moment is less than or equal to a target threshold.
The image acquisition apparatus provided in the second aspect may further perform the steps performed by the vehicle in each possible implementation of the first aspect. For the specific implementation steps of the second aspect and each of its possible implementations, and the beneficial effects they bring, refer to the descriptions of the corresponding implementations of the first aspect; details are not repeated here.
In a third aspect, the present application provides a vehicle, which may include a processor, a memory coupled to the processor, and a program stored in the memory, where the program stored in the memory is executed by the processor to implement the steps performed by the vehicle in the image capturing method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and when the program runs on a computer, the computer executes the steps executed by the vehicle in the image acquisition method according to the first aspect.
In a fifth aspect, the present application provides a circuit system including a processing circuit configured to perform the steps performed by the vehicle in the image acquisition method according to the first aspect.
In a sixth aspect, the present application provides a computer program, which when run on a computer, causes the computer to execute the steps performed by the vehicle in the image acquisition method according to the first aspect.
In a seventh aspect, an embodiment of the present application provides a chip system, which includes a processor, and is configured to implement the functions recited in the foregoing aspects, for example, sending or processing data and/or information recited in the foregoing methods. In one possible design, the system-on-chip further includes a memory for storing program instructions and data necessary for the server or the communication device. The chip system may be formed by a chip, or may include a chip and other discrete devices.
Drawings
FIG. 1a is a schematic structural diagram of a vehicle in an image acquisition method according to an embodiment of the present application;
FIG. 1b is a schematic flowchart of an image acquisition method according to an embodiment of the present application;
FIG. 2 is another schematic flowchart of an image acquisition method according to an embodiment of the present application;
FIG. 3 is a schematic interface diagram of triggering the function of shooting the surrounding environment in an image acquisition method according to an embodiment of the present application;
FIG. 4 is a schematic interface diagram of acquiring a keyword in an image acquisition method according to an embodiment of the present application;
FIG. 5 is another schematic flowchart of an image acquisition method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a target image and a first moment in an image acquisition method according to an embodiment of the present application;
FIG. 7 is a schematic interface diagram of outputting a target image in an image acquisition method according to an embodiment of the present application;
FIG. 8 is another schematic flowchart of an image acquisition method according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an image acquisition apparatus according to an embodiment of the present application;
FIG. 10 is another schematic structural diagram of an image acquisition apparatus according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a vehicle according to an embodiment of the present application;
FIG. 12 is another schematic structural diagram of a vehicle according to an embodiment of the present application.
Detailed Description
The terms "first," "second," and the like in the description and claims of the present application and in the foregoing drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the manner in which objects of the same nature are distinguished in the embodiments of the application. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The embodiments of this application can be applied to any scenario in which photographing is needed while a vehicle is running, including but not limited to cars, trucks, motorcycles, buses, ships, airplanes, helicopters, recreational vehicles, amusement-park vehicles, construction equipment, trams, golf carts, trains, and the like. Specifically, when a user (the driver or a passenger in the vehicle) wants to photograph the environment around the vehicle while it is running, the driver has no opportunity to take a photograph because he or she must drive, a passenger may miss the scene because the vehicle is moving quickly, and photographing the surroundings by hand from a running vehicle is unsafe.
To solve the above problems, an embodiment of this application provides an image acquisition method applied to a vehicle with one or more image capturing devices configured on its exterior; an image capturing device may be a video camera, a camera, or another type of device. To give a more intuitive understanding of the vehicle used in the embodiments of this application, refer to FIG. 1a, which is a schematic structural diagram of the vehicle in the image acquisition method provided by an embodiment of this application. FIG. 1a takes a car as an example; the black dots in FIG. 1a represent the positions of the image capturing devices on the exterior of the vehicle. A plurality of image capturing devices are configured on the exterior (FIG. 1a shows 6 as an example) at different positions of the vehicle, so that the shooting ranges of different devices do not overlap or only partially overlap. It should be understood that FIG. 1a is only provided for ease of understanding and is not intended to limit this solution.
Specifically, refer to FIG. 1b, which is a schematic flowchart of the image acquisition method according to an embodiment of this application.
S1: The vehicle photographs the environment around the vehicle, continuously or discontinuously, through the external image capturing devices to acquire first image data. The first image data corresponds to the environment around the vehicle during a first time period, where the first time period is the period over which the image capturing devices photograph; the first time period comprises a plurality of moments, i.e., the first image data includes images of the environment around the vehicle at a plurality of moments.
S2: The vehicle may detect, in real time, a photographing instruction input by a user and, in response to the acquired photographing instruction, obtain a first moment corresponding to the instruction. The photographing instruction may be triggered by voice or a gesture input by the user. Alternatively, the vehicle may measure the user's blink frequency and, when it is greater than or equal to a preset threshold, determine that a photographing instruction input by the user has been acquired; or the vehicle may be configured with a sensor on the steering wheel to measure the user's heart rate and, when it is greater than or equal to a preset threshold, determine that a photographing instruction has been acquired; the vehicle may also acquire other types of physiological information of the user for this purpose, and the forms of the photographing instruction are not exhaustively listed here.
S3: After determining the first moment, the vehicle acquires a target image from the first image data according to the first moment and outputs the target image, where the first moment falls within the first time period and the interval between the shooting moment of the target image and the first moment is less than the target threshold. The target image is taken from the first image data, which was captured by the image capturing devices configured on the vehicle; that is, the user is not required to capture the image, which resolves both the problem that the driver or a passenger cannot take a photograph and the safety hazard of photographing from a moving vehicle. In addition, the first image data corresponds to the environment around the vehicle throughout the first time period: the image capturing devices keep photographing the surroundings, and the image collected at the first moment is then selected, so the scenery the user wanted to capture is not missed.
It should be noted that, since personal information gathered while driving, such as voice, gestures, blink frequency, and heart rate, may involve the user's privacy, in one implementation the user may input a first operation to the vehicle (a specific implementation of the first operation is described in detail in subsequent steps), and the vehicle, in response to the first operation, triggers the start of acquiring the aforementioned information used to generate a photographing instruction.
In another implementation, the vehicle may output query information to the user to determine whether one or more pieces of the user's information may be collected. Specifically, the vehicle may output the query as voice, text, or in another manner; for example, the vehicle asks by voice "May we collect the voice information you produce?", and if the user replies "OK", it is determined that the vehicle may collect the user's voice.
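Returning to S1, the continuous capture can be pictured as a bounded rolling buffer so that, when a photographing instruction arrives, the frames around the first moment are still available. The sketch below reuses the Frame record from the earlier sketch; the capture rate, retention window, and camera.grab() call are illustrative assumptions, not part of the text.

```python
import time
from collections import deque

CAPTURE_HZ = 10        # illustrative capture rate
BUFFER_SECONDS = 60    # illustrative retention window
# Frame is the timestamped record defined in the earlier sketch.
frame_buffer = deque(maxlen=CAPTURE_HZ * BUFFER_SECONDS)

def capture_loop(camera):
    """S1: continuously photograph the surroundings into a rolling buffer.
    `camera.grab()` is a hypothetical stand-in for the real device API."""
    while True:
        frame_buffer.append(Frame(timestamp=time.time(), image=camera.grab()))
        time.sleep(1.0 / CAPTURE_HZ)
```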
As can be seen from the above description, the photographing instruction may be embodied in various forms, and in the following embodiments, the image obtaining method provided in the embodiments of the present application is described in detail only by taking a voice instruction and a gesture instruction as examples of the photographing instruction.
First, the photographing instruction is a voice instruction
Specifically, referring to fig. 2, fig. 2 is a schematic flowchart of a method for acquiring an image according to an embodiment of the present application, where the method for acquiring an image according to the embodiment of the present application may include:
201. The vehicle controls a first camera device to shoot the environment around the vehicle to acquire first image data.
In this embodiment of the application, S first camera devices may be configured on the exterior of the vehicle, and the environment around the vehicle is photographed by the S first camera devices to obtain the first image data, where S is an integer greater than or equal to 1; when S is greater than 1, the shooting ranges of different first camera devices do not overlap or only partially overlap. The first image data corresponds to the environment around the vehicle during the first time period; the concepts of the first time period and the first image data can be understood with reference to the above description.
Specifically, in one case, the vehicle may record the environment around the vehicle through the S first image capturing devices to obtain video data corresponding to the environment around the vehicle in the first period of time. Further, in one implementation, the vehicle may directly determine the aforementioned video data as the first image data, that is, the first image data may be specifically represented as a video obtained by shooting the environment around the vehicle in the first time period.
In another implementation manner, the vehicle may perform a video frame extraction operation according to video data corresponding to an environment around the vehicle in a first time period to acquire first image data, where the first image data includes a plurality of first video frames (i.e., images), and the plurality of first video frames correspond to respective moments in the first time period.
In another case, in the first time period, the vehicle may photograph the environment around the vehicle through the first camera device according to the target frequency to obtain the first image data, that is, the first image data includes a plurality of first images, and the plurality of first images are used for showing the environment around the vehicle at each time in the first time period.
More specifically, in one implementation, after the vehicle is started, the plurality of first image capturing devices on the exterior may be triggered to begin automatically and continuously photographing the environment around the vehicle. It should be noted that the vehicle may photograph its surroundings through the external first image capturing devices not only to make it convenient for the user to capture the surroundings, but also to assist the vehicle in route planning and the like; this is not limited here.
In another implementation manner, after the vehicle detects the first operation input by the user, the vehicle starts a photographing function of the vehicle in response to the detected first operation, and triggers the first camera device outside the vehicle to continuously photograph the environment around the vehicle. In another implementation, since a plurality of external first image capturing devices are pre-configured in the vehicle, after the vehicle is started, the vehicle continuously captures the environment around the vehicle through a part of the external first image capturing devices, and after the vehicle detects a first operation input by a user, the continuous capturing of the environment around the vehicle through all the first image capturing devices outside the vehicle is triggered in response to the detected first operation.
Further, in one case, the first operation may be a voice instruction input by the user, for example, when the user utters a voice of "turn on the vehicle auxiliary photographing function", it is considered that the vehicle detects the first operation input by the user. In another case, a button for turning on the "vehicle-assisted photographing function" may be provided in the vehicle in advance, and when the user presses the aforementioned button, it is regarded that the vehicle detects the first operation input by the user. In another case, one or more touch screens may be configured in advance in the vehicle, a first icon for receiving a first operation is displayed on the touch screen, and a user may perform a touch operation on the first icon to input the first operation, where the touch operation may be a single click, a double click, a long press, or the like. It should be understood that the examples are only for convenience of understanding the manner in which the user inputs the first operation, and are not intended to limit the present solution.
For a more intuitive understanding of this solution, refer to FIG. 3, which is a schematic interface diagram of triggering the function of shooting the surrounding environment in the image acquisition method according to an embodiment of this application. FIG. 3 includes two sub-diagrams, (a) and (b). Sub-diagram (a) of FIG. 3 takes as an example a first icon (i.e., A1) arranged in the central control screen of the vehicle: the user may click A1 to input a first operation, thereby triggering the vehicle to start shooting the environment around the vehicle through the external first camera device. In sub-diagram (b) of FIG. 3, the vehicle is also provided with a first icon in the rear row (i.e., A2), and the user may click A2 to input a first operation to the same effect. It should be noted that FIG. 3 is only an example for ease of understanding; the vehicle may also provide the first icon on the central control screen and a rear-row touch screen at the same time, and this is not limited here.
202. The vehicle acquires the target voice and generates a photographing instruction in response to the received target voice.
In some embodiments of the application, the photographing instruction may be triggered by voice input by the user. A model for performing voice recognition may be preconfigured in the vehicle; after the vehicle acquires any voice information input by the user (for ease of description, hereinafter referred to as "first voice information"), the first voice information may be converted into text content by the voice-recognition model, and whether the user has the intention to take a photograph, that is, whether the first voice information is the target voice, is then determined from the text content corresponding to the first voice information. If it is, the vehicle determines that a photographing instruction input by the user has been acquired.
Specifically, the vehicle may further be configured with a model for executing a natural language processing (NLP) task (which may also be referred to as a model for executing a natural language understanding (NLU) task) and a semantic library, and the vehicle may input the text content corresponding to the first voice information, together with the semantic library, into the model to determine, through the model, whether the intention corresponding to the first voice information is to take a photograph.
The intention (intent) refers to the user's purpose and indicates the user's demand. The vehicle may recognize the user's intention from the voice information the user inputs. For example, if the voice information input by the user is "the surface of that black car is velvety and beautiful", the vehicle can recognize the user's intention to "take a photograph" from that input. The intention-recognition model can be trained on a large number of corpora that express the intention in different ways.
The model for executing the NLP task may be implemented by a neural network or by a non-neural-network model. For example, it may be Bidirectional Encoder Representations from Transformers (BERT), a recurrent neural network (RNN), a question-answering network (QANet), or another model used for machine reading comprehension (MRC); other types of models may also be used.
In one case, the semantic library may include description information of the photographing intention, and this description information may support flexible forms of expression; correspondingly, this embodiment of the application can support any natural-language expression matching the user's habits, so the target voice input by the user may be quite direct. For example, the user may employ a more standardized, formatted expression such as "shoot the sunset", "shoot the building on the left", or "shoot the black car on the right"; the target voice may also be implicit, for example "the sunset is so beautiful today", "the flowers at the roadside are beautiful", or "which model is the car in front", and the like; this is not limited here.
In another case, the semantic library may include a plurality of words, and when the vehicle detects that a word from the semantic library appears in the voice information input by the user (who may be the driver or a passenger in the vehicle), it determines that the user has a photographing intention, determines that voice information as the target voice, and treats the target voice as a photographing instruction. As examples, the words included in the semantic library may be: shoot, photograph, beautiful, good-looking, or other words; these are not exhaustively listed here. It should be understood that these words are merely provided for ease of understanding and are not intended to limit this solution.
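The simple word-matching variant just described might look like the following sketch. The word list is an illustrative assumption; a fuller system would use the NLP model described above rather than substring matching.

```python
# Illustrative trigger words; the actual semantic library is not enumerated
# in the text beyond a few examples.
PHOTO_INTENT_WORDS = ("shoot", "photograph", "take a picture",
                      "beautiful", "good-looking")

def is_target_voice(transcript: str) -> bool:
    """Treat the recognized text as a photographing instruction if any word
    from the semantic library appears in it."""
    text = transcript.lower()
    return any(word in text for word in PHOTO_INTENT_WORDS)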
More specifically, in one implementation, if the vehicle starts the "vehicle auxiliary photographing" function based on a first operation input by the user, the vehicle detects the voice information input by the user in real time after the user actively starts that function, so as to acquire the target voice. In another implementation, the "vehicle auxiliary photographing" function starts automatically when the vehicle is started, and the vehicle then detects the voice information input by the user in real time to acquire the target voice.
Optionally, the vehicle may also obtain the target keyword from the target voice. The keyword is specific information of the intended content, and is also key information for triggering a specific service. The keyword is, for example, a keyword in the user input information. As an example, for example, "right" and "black car" in the user input information "take a picture of a black car on the right" are keywords of the input information; as another example, the "sunset" in the "sunset beauty of today" input information by the user is the keyword of the input information.
Specifically, the semantic library may further include slot information, the slot information is description information of a keyword, the slot description information also supports a flexible description mode, in one implementation, a description mode with similar attributes may be used, for example, the description information of the slot of the "type of the photographic subject" may be a description mode of a "noun" or the like, in another implementation, a description mode of a keyword type may also be used, for example, the slot information may be a description mode of a "photographic direction", "type of the photographic subject", "shape of the photographic subject" or the like. It should be understood that the examples are only for convenience of understanding the concept of slot position information and are not intended to limit the present solution.
For any voice information input by the user (the "first voice information"), the vehicle may input the text content corresponding to the first voice information, together with the semantic library, into the model for performing the NLP task; the model determines the intention corresponding to the first voice information, extracts the keywords in it, and outputs both the intention and the extracted keywords.
Further, all slot information in the semantic library may be optional; alternatively, the semantic library may include one or more items of mandatory slot information, and if the vehicle does not acquire a keyword corresponding to a mandatory slot from the target voice, it may output query information instructing the user to input the keyword corresponding to that mandatory slot. As an example, the mandatory slot information may include "type of the photographic subject"; as another example, the optional slot information may include "type of the photographic subject" and "shooting direction". What slot information is configured, and of which kind, may be determined according to the actual situation and is not limited here.
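A sketch of the mandatory-slot check follows; the slot names and the split between mandatory and optional slots are assumptions made for illustration.

```python
from typing import Dict, List

MANDATORY_SLOTS = ("subject_type",)       # assumed mandatory slot
OPTIONAL_SLOTS = ("shooting_direction",)  # assumed optional slot

def missing_mandatory_slots(extracted: Dict[str, str]) -> List[str]:
    """Return the mandatory slots the NLP model failed to fill from the
    target voice; the vehicle should query the user for each of these."""
    return [slot for slot in MANDATORY_SLOTS if slot not in extracted]

# e.g. missing_mandatory_slots({"shooting_direction": "right"})
#      -> ["subject_type"], so the vehicle asks what to photograph
```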
The vehicle may output the query information as voice, as text, as both voice and text, or in another form; this is not limited here.
For a more intuitive understanding of this disclosure, refer to FIG. 4, which is a schematic interface diagram of acquiring a keyword in the image acquisition method according to an embodiment of this application. As shown in FIG. 4, query information in text form is output on the display screen in the rear row of the vehicle; FIG. 4 takes as an example the vehicle outputting the query information as text and voice at the same time, where B1 indicates that the vehicle plays the query information as voice.
203. The vehicle responds to the acquired photographing instruction and acquires a first moment corresponding to the photographing instruction.
In this embodiment of the application, after the vehicle detects the photographing instruction, it may, in response to the acquired instruction, obtain the first moment corresponding to it. Specifically, in one implementation, the vehicle determines a third moment corresponding to the photographing instruction in response to the acquired instruction. The third moment is the generation moment of the photographing instruction, that is, the moment at which the vehicle, having received the target voice input by the user, generates the photographing instruction in response to it. The vehicle then obtains the first moment from the third moment. In one case, the vehicle may directly determine the third moment as the first moment, i.e., the first moment is the generation moment of the photographing instruction. In another case, the vehicle takes as the first moment a time point that precedes the third moment by a first duration; the first duration may be 0.5 seconds, 1 second, 2 seconds, 3 seconds, 5 seconds, or another value, and may be set with reference to factors such as the speed at which the vehicle processes voice information.
In this embodiment of the application, while the vehicle is receiving the voice information input by the user and processing it to determine whether it is the target voice, the vehicle keeps moving forward, so the moment at which the photographing instruction is generated is slightly later than the moment the user actually wanted to photograph. Determining the first moment to be before the third moment brings it closer to the moment the user wanted, and taking the first moment as the reference point for retrieving the image yields an image closer to what the user actually intended to capture.
In another implementation, in response to the acquired photographing instruction, the vehicle obtains a fourth moment corresponding to the target voice, where the fourth moment is the acquisition moment of the target voice. Because the target voice is received over a period of time rather than at an instant, its acquisition moment may be any one of the following: the moment acquisition of the target voice started, the moment it ended (which may also be called the moment the target voice was successfully received), the midpoint of its acquisition (the time corresponding to the midpoint of the reception duration of the whole target voice), or another time point during its acquisition; this is not limited here.
Further, the vehicle may directly determine the fourth moment as the first moment, or may take as the first moment a time point that precedes the fourth moment by a second duration; the second duration may be 0.5 seconds, 1 second, 2 seconds, 3 seconds, or another value, and may be set with reference to factors such as which type of fourth moment is used.
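Both derivations of the first moment reduce to simple time arithmetic, sketched here with illustrative offsets (the text allows several values for each duration):

```python
FIRST_DURATION_S = 1.0   # illustrative; the text allows 0.5 s, 1 s, 2 s, 3 s, 5 s, ...
SECOND_DURATION_S = 1.0  # illustrative offset for the voice-time variant

def first_moment_from_generation(third_moment: float) -> float:
    """Variant 1: step back from the instruction's generation moment to
    compensate for speech-processing latency."""
    return third_moment - FIRST_DURATION_S

def first_moment_from_voice(fourth_moment: float, step_back: bool = True) -> float:
    """Variant 2: use the target voice's acquisition moment, either directly
    or stepped back by a second duration."""
    return fourth_moment - SECOND_DURATION_S if step_back else fourth_moment
```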
204. The vehicle acquires a first direction, and selects a target camera device with a shooting range covering the first direction from at least two first camera devices.
In some embodiments of the application, at least two first camera devices may be configured on the vehicle, with the shooting ranges of different first camera devices not overlapping or only partially overlapping; the vehicle may then acquire the first direction and select, from the at least two first camera devices, at least one target camera device whose shooting range covers the first direction. The first direction is determined according to any one or more of the following: the gaze direction of the first user, the first user's facial orientation, the first user's body orientation, or other directions; this is not limited here. Further, the first user may be any one or a combination of the following: the driver, a user at a preset position in the vehicle, the user who issued the photographing instruction, or another type of user; which type of user is selected may be determined according to the actual situation.
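One plausible way to realize "shooting range covers the first direction" is to describe each camera's range as an angular interval around the vehicle and test membership. The angular representation below is an assumption, since the text does not fix one.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Camera:
    cam_id: int
    fov_start_deg: float  # start of shooting range, clockwise from vehicle heading
    fov_end_deg: float    # end of shooting range

def covers(cam: Camera, direction_deg: float) -> bool:
    """True if the camera's shooting range covers the first direction;
    handles ranges that wrap past 360 degrees (e.g. 300 to 60)."""
    d = direction_deg % 360.0
    start = cam.fov_start_deg % 360.0
    end = cam.fov_end_deg % 360.0
    if start <= end:
        return start <= d <= end
    return d >= start or d <= end

def select_target_cameras(cams: List[Camera],
                          first_direction_deg: float) -> List[Camera]:
    """Select the M target devices among the S first camera devices."""
    return [c for c in cams if covers(c, first_direction_deg)]
```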
Specifically, if the first direction is the user's gaze direction, then in one implementation at least one second camera device may also be configured inside the vehicle. In response to the acquired photographing instruction, the vehicle captures an image of the first user through the second camera device, performs face detection on that image to determine the face region, and performs key-point localization on the face region to determine the eye region within it. The key-point localization may be completed by a preset algorithm, including but not limited to the Roberts edge-detection operator, the Sobel operator, and the like, or by a preset model, such as an active contour (snake) model, or by a neural network for facial key-point detection; the methods for facial key-point detection are not exhaustively listed here. The vehicle crops the eye-region image from the image of the first user and generates the gaze direction corresponding to the eye-region image through a neural network, thereby obtaining the gaze direction of the first user.
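The gaze pipeline in the preceding paragraph can be sketched as below. The face_detector, keypoint_model, and gaze_model arguments are hypothetical placeholders for whatever detector, key-point localizer, and gaze network the vehicle actually uses.

```python
import numpy as np

def estimate_gaze_direction(user_image: np.ndarray,
                            face_detector, keypoint_model, gaze_model) -> float:
    """Schematic version of the pipeline above; the three model arguments are
    hypothetical placeholders, and boxes are (x0, y0, x1, y1) pixel tuples."""
    x0, y0, x1, y1 = face_detector(user_image)        # 1. face detection
    face_region = user_image[y0:y1, x0:x1]            # 2. face region
    ex0, ey0, ex1, ey1 = keypoint_model(face_region)  # 3. key points -> eye box
    eye_region = face_region[ey0:ey1, ex0:ex1]        # 4. crop the eye region
    return gaze_model(eye_region)                     # 5. gaze direction (degrees)
```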
In another implementation, an eye tracker may be configured inside the vehicle, and the vehicle acquires the gaze direction of the first user through it. The technology used by the eye tracker may be pupil center corneal reflection (PCCR), visual tracking based on a three-dimensional (3D) eyeball model, or other techniques. It should be noted that the vehicle may also acquire the gaze direction of the first user by other means, which are not exhaustively listed here.
If the first direction is the user's facial orientation, the vehicle may control the second camera device to capture an image of the first user and generate the first user's facial orientation from that image through a neural network for facial-orientation recognition. As an example, the neural network for facial-orientation recognition may be a learning vector quantization (LVQ) neural network, a BP neural network, or another type of neural network; these are not exhaustively listed here.
If the first direction is the user's body orientation, then in one implementation a sensor for acquiring point-cloud data of the user may be configured inside the vehicle, and the vehicle may generate the first user's body orientation from the point-cloud data corresponding to the first user's current posture. In another implementation, the vehicle may capture an image of the first user through the second camera device and generate the first user's body orientation through a neural network; the ways of generating the body orientation are not exhaustively listed here. It should be noted that the first direction may also be another type of direction, for example the first user's gesture direction, and so on.
It should be noted that, since the process of obtaining the first direction may involve the personal privacy of the user, in one implementation, the vehicle may output query information to the user to determine whether the first direction may be collected (a minimal sketch of this gate follows this paragraph). Specifically, the vehicle may output the query information by voice, by text, or in another manner; for example, the vehicle may output by voice "May the vehicle acquire your sight line direction?", and if the user replies "OK", it is determined that the vehicle may acquire the sight line direction of the user. In another implementation, the user may input a second operation to the vehicle, and the vehicle, in response to the second operation, triggers the operation of acquiring the first direction. Specifically, in one case, the second operation may be that the user turns on a device in the vehicle for acquiring the first direction; for example, the eye tracker is off by default, and when the user actively turns it on, the second operation is considered to have been input. In another case, the user may input the second operation through a center control screen configured in the vehicle. It should be understood that these examples are only for ease of understanding and are not intended to limit the present solution.
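For ease of understanding only, the consent gate can be sketched as follows; `ask_user` is a hypothetical stand-in for the vehicle's voice or center-screen prompt, and the accepted replies are illustrative assumptions.

```python
def may_collect_first_direction(ask_user) -> bool:
    """Ask before collecting the user's gaze/face/body direction."""
    reply = ask_user("May the vehicle acquire your sight line direction?")
    return reply.strip().lower() in {"ok", "yes", "sure"}
```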
205. The vehicle acquires a target image from the first image data according to the first time, and outputs the target image.
In this embodiment, after the vehicle determines the first time through step 203, one or more target images may be obtained from the first image data according to the first time and output. The first time is included in the first time period, and the interval duration between the shooting time of each target image and the first time is less than or equal to a target threshold; the value of the target threshold may be 5 seconds, 8 seconds, 10 seconds, 15 seconds, or another value, which is not limited here.
The following first describes the process in which the vehicle acquires the target image from the first image data. Step 204 is an optional step. If step 204 is not executed and no target keyword is obtained from the target voice in step 202, step 205 may include: if the first image data is embodied as video data, the vehicle acquires, from the first image data, one or more target video frames whose shooting time is the first time, and determines each acquired target video frame as a target image; that is, the shooting time of the target image is the first time.
If the first image data specifically includes a plurality of first images, in one implementation, a number threshold of target images may be preconfigured in the vehicle (for ease of description, the value of this threshold is denoted as N below). The vehicle acquires, from the first image data, the N images whose shooting times are closest to the first time, and determines them as the N target images. The value of N may be 3, 4, 5, 6, 8, 9, or another value, and the specific value of N may be set flexibly in combination with the actual situation.
In another implementation, a value of the target threshold may be preconfigured in the vehicle; the vehicle selects, from the first image data, all images whose interval duration between the shooting time and the first time is less than or equal to the target threshold, and determines all the acquired images as target images.
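For ease of understanding only, both selection strategies reduce to simple filtering on timestamps. The sketch below assumes each image record carries a `time` field in seconds; the data layout is an illustrative assumption.

```python
def nearest_n_images(images, first_time, n):
    """Strategy 1: the N images whose shooting times are closest to the first time."""
    return sorted(images, key=lambda img: abs(img["time"] - first_time))[:n]

def images_within_threshold(images, first_time, threshold):
    """Strategy 2: every image shot within the target threshold of the first time."""
    return [img for img in images if abs(img["time"] - first_time) <= threshold]

# One frame every 2 seconds over a 60-second first time period.
frames = [{"time": t} for t in range(0, 60, 2)]
print(len(nearest_n_images(frames, first_time=30, n=5)))                 # 5
print(len(images_within_threshold(frames, first_time=30, threshold=5)))  # 5
```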
If step 204 is not executed and at least one target keyword is obtained from the target voice in step 202, step 205 may include: the vehicle acquires the target image from the first image data according to the first time and the at least one target keyword. If a keyword for describing the photographic object exists among the at least one target keyword, an object indicated by that keyword exists in the target image; a keyword for describing the photographic object may describe the name, type, color, shape, or other description information of the photographic object. Alternatively, if a keyword for describing the shooting direction exists among the at least one target keyword, the shooting direction of the target image is the direction pointed to by that keyword.
Further, assume the vehicle is provided with S first camera devices in total. In one case, among the at least one target keyword, only a keyword for describing the photographic object exists and no keyword for describing the shooting direction exists. As an example, if the voice information input by the user is "the sunset is so beautiful today", the target keyword "sunset" may be acquired, which is a keyword describing the type of the photographic object. If the first image data is video data, the first image data includes S first videos. The vehicle acquires S second videos from the first image data, where each of the S second videos is a video that starts at a fifth time and ends at a sixth time, the interval duration between the fifth time and the first time is equal to the target threshold, and the interval duration between the sixth time and the first time is equal to the target threshold. The vehicle may acquire at least one target video frame from the S second videos and determine each target video frame as a target image, where the object indicated by the target keyword exists in the target video frame, and therefore in the target image.
If the first image data includes a plurality of images, that is, the first image data includes S groups of first images corresponding one-to-one to the S first camera devices, the vehicle acquires S groups of second images from the first image data, where the shooting time of the earliest captured image in the S groups of second images is the fifth time, and the shooting time of the latest captured image is the sixth time. The vehicle may then acquire at least one target image from the S groups of second images, where the object indicated by the target keyword exists in the target image.
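For ease of understanding only, both the video case and the multi-image case reduce to clipping a window from the fifth time to the sixth time around the first time and then keeping the frames that contain the object named by the keyword. In the sketch below, the `time` and `image` fields and the `detect_labels` object-detection stand-in are illustrative assumptions.

```python
def clip_window(frames, first_time, threshold):
    """Keep frames shot between the fifth time (first_time - threshold)
    and the sixth time (first_time + threshold)."""
    return [f for f in frames
            if first_time - threshold <= f["time"] <= first_time + threshold]

def select_by_object(frames, keyword, detect_labels):
    """Keep frames in which an object matching the keyword is detected;
    detect_labels stands in for any model returning the labels in an image."""
    return [f for f in frames if keyword in detect_labels(f["image"])]

# e.g. target_images = select_by_object(
#          clip_window(all_frames, first_time=30.0, threshold=5.0),
#          "sunset", detect_labels)
```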
In another case, among the at least one target keyword, only at least one keyword for describing the shooting direction exists and no keyword for describing the photographic object exists; the at least one keyword for describing the shooting direction indicates one or more second directions. As an example, if the voice information input by the user is "wow, the view in front is so beautiful", the target keyword "front" may be acquired, which is a keyword describing the shooting direction. If the first image data includes S first videos, then because the S first videos correspond one-to-one to the S first camera devices and the shooting ranges of different first camera devices do not overlap or only partially overlap, the vehicle selects, from the S first camera devices, the N first camera devices corresponding to all the second directions, so as to acquire N first videos from the S first videos. For each of the N first videos, the vehicle acquires the video frame whose shooting time is the first time and determines it as a target image, thereby acquiring N target images from the N first videos, where the shooting time of each target image is the first time.
If the first image data comprises S groups of first images, the S groups of first images correspond one-to-one to the S first camera devices, and the shooting ranges of different first camera devices do not overlap or only partially overlap. The vehicle selects, from the S first camera devices, the N first camera devices corresponding to all the second directions, so as to acquire N groups of first images from the S groups of first images; it then acquires, from the N groups of first images, one or more images whose shooting time is the first time and determines each acquired image as a target image, where the shooting time of each target image is the first time.
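For ease of understanding only, one simple realization of the direction-keyword case is a lookup from each spoken second direction to the first camera devices whose shooting ranges cover it (equivalently, the field-of-view test sketched earlier could be applied to an angle per keyword). The mapping and field names below are illustrative assumptions.

```python
# Illustrative mapping from direction keywords to covering camera devices.
DIRECTION_TO_CAMERAS = {
    "front": ["front"], "rear": ["rear"], "left": ["left"], "right": ["right"],
    "front right": ["front", "right"], "front left": ["front", "left"],
}

def select_videos_for_directions(first_videos, direction_keywords):
    """first_videos maps camera name -> first video; return the N first
    videos captured by cameras covering any of the second directions."""
    wanted = {cam for kw in direction_keywords
              for cam in DIRECTION_TO_CAMERAS.get(kw, [])}
    return {cam: vid for cam, vid in first_videos.items() if cam in wanted}
```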
In another case, both a keyword for describing the shooting direction and a keyword for describing the photographic object exist among the at least one target keyword. If the first image data includes S first videos, the vehicle may acquire N first videos from the S first videos, acquire N second videos from the N first videos according to the first time, and further acquire the target image from the N second videos according to the target keyword for describing the photographic object (for the specific implementation of each step, refer to the foregoing description).
If the first image data includes S groups of first images, the vehicle may acquire N groups of first images from the S groups of first images, acquire N groups of second images from the N groups of first images according to the first time, and acquire the target image from the N groups of second images according to the target keyword for describing the photographic object (for the specific implementation of each step, refer to the foregoing description).
For a more intuitive understanding of the embodiments of the present application, refer to fig. 5, which is a schematic flowchart of an image acquisition method according to an embodiment of the present application. C1: after the vehicle is started, four first camera devices outside the vehicle are triggered to shoot the environment around the vehicle to acquire first image data, where the first image data are videos corresponding to the environment around the vehicle in a first time period, and the four first camera devices are located at the front, left, right, and rear of the vehicle, respectively. C2: in response to a first operation input by the user, the vehicle starts the "vehicle auxiliary photographing" function and begins to acquire in real time the first voice information input by the user (that is, any voice information input by the user), and detects whether that voice information is a target voice (that is, determines whether it is a photographing instruction). If the vehicle detects a target voice input by the user, it acquires target keywords from the target voice; for example, if the input target voice is "photograph the black car at the front right", the vehicle acquires two target keywords, "front right" and "black car". C3: the vehicle, in response to the obtained photographing instruction, obtains the first time corresponding to the photographing instruction. C4: the vehicle obtains the target image from the first image data according to the first time and the target keywords. It should be understood that the example in fig. 5 is only for ease of understanding and is not intended to limit the present solution.
For a more intuitive understanding, refer to fig. 6, which is a schematic diagram of the target image and the first time in the image acquisition method according to an embodiment of the present application. In fig. 6, taking as an example that the first image data includes a plurality of images, 20 images exist in the S groups of first images, arranged in order of shooting time from earliest to latest, with one rectangle representing one image. D1 is the image shot at the first time, D2 the image shot at the fifth time, and D3 the image shot at the sixth time. D4, D5, and D6 represent the target images, in each of which the object indicated by the target keyword exists, and the interval between the shooting time of each of the three target images and the first time is smaller than the target threshold. It should be understood that the example in fig. 6 is only intended to illustrate the relationship between the shooting time of the target image and the first time, and is not intended to limit the present solution.
In this embodiment of the application, the photographing instruction is input in voice form. After the target voice for triggering photographing is obtained, the target keyword is obtained from the target voice; the target keyword points to the object the user wants to photograph, or to the direction in which the user wants to photograph. That is, the vehicle can further learn what image the user wants, which improves the accuracy of the output target image, outputs images that meet the user's expectation, and further improves the user stickiness of this solution.
If step 204 is executed and no target keyword is obtained from the target voice in step 202, step 205 may include: after the vehicle selects, in step 204, M target camera devices corresponding to the first direction from the S first camera devices, it selects second image data from the first image data, where the second image data is a subset of the first image data collected by the M target camera devices, and then acquires the target image from the second image data. It should be noted that the specific implementation in which the vehicle acquires the target image from the second image data according to the first time is similar to the case "if step 204 is not executed and no target keyword is obtained from the target voice in step 202" described above; the difference is that the first image data in that case is replaced with the second image data here, which can be understood with reference to the foregoing description and is not repeated.
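For ease of understanding only, the selection of the second image data is a filter on which camera device produced each record; the `camera` field below is an illustrative assumption.

```python
def select_second_image_data(first_image_data, target_camera_names):
    """Keep only the subset of the first image data collected by the
    M target camera devices."""
    names = set(target_camera_names)
    return [rec for rec in first_image_data if rec["camera"] in names]
```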
In this embodiment of the application, after the first image data corresponding to the plurality of camera devices is acquired, because the shooting ranges of the plurality of camera devices do not overlap or only partially overlap, the first direction is also acquired; the target camera devices whose shooting ranges cover the first direction are selected from the plurality of camera devices, the second image data shot by the target camera devices is selected from the first image data, and the target image is then acquired from the second image data. Further, the first direction is determined according to any one or more of: the user's sight line direction, face orientation, body orientation, or gesture direction. A user generally faces, looks at, or points toward the region of interest, so using the first direction to filter the captured image data helps select the images the user expects, thereby improving the user stickiness of this solution.
If step 204 is executed and at least one target keyword is obtained from the target voice in step 202, then in one case, only a keyword for indicating the photographic object exists among the at least one target keyword and no keyword for indicating the shooting direction exists. The vehicle may select second image data from the first image data according to the first direction, where the second image data is collected by the M target camera devices, and then acquire the target image from the second image data according to the keyword for indicating the photographic object and the first time. For the specific implementation, refer to the description above of acquiring the target image from the first image data for the case "if step 204 is not executed, at least one target keyword is obtained from the target voice in step 202, only a keyword for describing the photographic object exists among the at least one target keyword, and no keyword for describing the shooting direction exists"; the difference is that the first image data there is replaced with the second image data here, and details are not repeated.
In another case, a keyword for indicating the shooting direction exists among the at least one target keyword obtained in step 202. Since the reliability of a second direction directly input by the user through voice information is higher than that of the first direction, regardless of whether a keyword for indicating the photographic object also exists, the vehicle may skip step 204. The specific implementation in which the vehicle then acquires the target image from the first image data is described above for the case "if step 204 is not executed and a keyword for indicating the shooting direction exists among the at least one target keyword obtained from the target voice in step 202", and is not repeated here.
The following describes the process in which the vehicle outputs the target image. After the vehicle acquires one or more target images, in one implementation, a display screen configured in the vehicle may directly display the acquired target images. The display screen may be the center control screen of the vehicle or the touch screen for receiving the first operation; which display screen is selected may be set flexibly in combination with the actual product form, and is not limited here.
In another implementation, a wireless communication connection is pre-established between the vehicle and a terminal device carried by the user, and the vehicle may directly send the acquired target images to the terminal device, so that they are displayed through the terminal device carried by the user. The vehicle may also output the target image to the user in other manners, which are not limited here.
Optionally, the target image may also carry the shooting time of the target image.
For a more intuitive understanding, refer to fig. 7, which is a schematic interface diagram of outputting the target image in the image acquisition method according to an embodiment of the present application. Fig. 7 includes two sub-diagrams (a) and (b), both of which take the output of 6 target images as an example. Sub-diagram (a) takes outputting the target images through the display screen of the vehicle as an example; sub-diagram (b) takes outputting the target images through a terminal device carried by the user as an example, with the target images displayed in an application program such as "Album". It should be understood that the example in fig. 7 is only for ease of understanding and is not intended to limit the present solution.
Second, the photographing instruction is a gesture instruction
Specifically, refer to fig. 8, which is a schematic flowchart of an image acquisition method according to an embodiment of the present application; the method may include:
801. The vehicle shoots the environment around the vehicle through the first camera device to acquire first image data.
In the embodiment of the present application, a specific implementation manner of step 801 is similar to that of step 201 in the embodiment corresponding to fig. 2, and can be directly understood by referring to the description, which is not repeated herein.
802. The vehicle acquires a preset gesture, and generates a photographing instruction in response to the acquired preset gesture.
In some embodiments of the application, the photographing instruction preset in the vehicle may be a gesture instruction; that is, the user may input the photographing instruction by making a preset gesture. Similar to step 202 in the embodiment corresponding to fig. 2, in one case, the vehicle may start to collect gesture information of a second user in real time after being started, and when it determines, according to the gesture information, that the second user has input the preset gesture, it determines that the user has input the photographing instruction. In another case, the vehicle starts the "vehicle auxiliary photographing" function based on a first operation input by the user, and may start to collect gesture information of the second user in real time after the user actively starts that function, so as to acquire the preset gesture input by the user.
For the specific implementation of the first operation input by the user, refer to the description in the embodiment corresponding to fig. 2, which is not repeated here. The preset gesture may be a static gesture or a dynamic gesture. The second user may be any user in the vehicle, or may be limited to a passenger at a fixed position; for example, only a passenger in the front passenger seat may be allowed to input the preset gesture. It should be noted that which users in the vehicle may act as the second user can be determined in combination with the actual product form, and is not limited here.
The following describes the process in which the vehicle collects gesture information of the user. The preset gesture may be a static gesture, for example, a fist-making gesture, a five-finger-open gesture, or another static gesture; or it may be a dynamic gesture, for example, first making a fist and then opening all five fingers. In one implementation, a second camera device may be configured inside the vehicle; the vehicle acquires a gesture image of the second user in real time through the second camera device, analyzes the gesture image through a computer vision algorithm, and compares it with the image corresponding to the preset gesture. If the two point to the same type of gesture, it is determined that the vehicle has acquired the preset gesture input by the second user; if they point to different types of gestures, it is determined that the vehicle has not acquired the preset gesture input by the second user.
In another implementation, the vehicle interior may be configured with a sensor for collecting gesture information of the user, which may be a laser, a radar, or another type of sensor. The vehicle may acquire, through the sensor, point cloud data corresponding to the gesture of the second user, and then determine, according to that point cloud data, whether the gesture input by the user is the preset gesture.
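For ease of understanding only, both implementations reduce to classifying each captured sample and comparing the result with the preset gesture. The sketch below assumes a stream of (timestamp, frame) pairs; `classify_gesture` is a hypothetical stand-in for either the computer-vision comparison or a point-cloud classifier, and the preset label is illustrative.

```python
def watch_for_preset_gesture(frame_stream, classify_gesture, preset="fist"):
    """Emit a photographing instruction, tagged with the frame's capture
    time (the second time), whenever the classified gesture matches the
    preset gesture."""
    for timestamp, frame in frame_stream:
        if classify_gesture(frame) == preset:
            yield {"type": "photograph", "second_time": timestamp}
```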
803. The vehicle acquires, in response to the acquired photographing instruction, a first time corresponding to the photographing instruction.
In some embodiments of the application, after acquiring the photographing instruction, the vehicle responds to it by acquiring the first time corresponding to the photographing instruction.
Specifically, similar to the specific implementation of step 203 in the embodiment corresponding to fig. 2, in one implementation, the vehicle, in response to the obtained photographing instruction, determines a third time corresponding to the photographing instruction, where the third time is the generation time of the photographing instruction, and then acquires the first time corresponding to the photographing instruction according to the third time. For a specific implementation, refer to the foregoing description, which is not repeated here.
In another implementation, the vehicle, in response to the acquired photographing instruction, acquires a second time corresponding to the gesture instruction, where the second time is the shooting time of the gesture image corresponding to the gesture instruction. Further, if the preset gesture is a static gesture, the second time is a single determined time. If the preset gesture is a dynamic gesture, the second time may be any one of the following: the time at which collection of the preset gesture started, the time at which collection of the preset gesture ended, any time during the collection of the preset gesture, or the like, which is not limited here.
The vehicle then determines the first time according to the second time. Further, if the preset gesture is a static gesture, the vehicle may directly determine the second time as the first time. If the preset gesture is a dynamic gesture, a time that precedes the second time by a third interval duration may be determined as the first time; the value of this third duration may be the same as or different from the value of the duration used in the implementation based on the third time.
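For ease of understanding only, the cases above can be combined into a single resolution function; the default offsets below are illustrative assumptions, not values specified by this embodiment.

```python
def resolve_first_time(instruction, generation_offset=5.0, third_duration=5.0):
    """Map a photographing instruction to the first time.

    Static gesture:  use the gesture's capture time (the second time) directly.
    Dynamic gesture: step back a third duration from the second time.
    Otherwise:       step back from the instruction's generation time (the third time).
    """
    if instruction.get("gesture") == "static":
        return instruction["second_time"]
    if instruction.get("gesture") == "dynamic":
        return instruction["second_time"] - third_duration
    return instruction["third_time"] - generation_offset
```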
In this embodiment of the application, when the preset gesture is a static gesture instruction, the shooting time of the static gesture can be acquired directly and determined as the first time, which provides another way of acquiring the first time and improves the implementation flexibility of this solution. In addition, since the time interval between the user seeing the object to be photographed and making the preset static gesture is generally short, directly determining the shooting time of the preset static gesture as the first time also makes the first time closely match the shooting time the user actually desires.
804. The vehicle acquires a first direction, and selects a target camera device with a shooting range covering the first direction from at least two first camera devices.
In this embodiment of the application, the specific implementation of step 804 is similar to that of step 204 in the embodiment corresponding to fig. 2, and can be understood directly with reference to that description; details are not repeated here.
805. The vehicle acquires a target image from the first image data according to the first time, and outputs the target image.
In this embodiment of the application, for the specific implementation of step 805, refer to the description of step 205 in the embodiment corresponding to fig. 2. It should be noted that, in this embodiment, a preset gesture is input to trigger the vehicle to generate the photographing instruction, so the specific implementation of step 805 covers only those cases of step 205 in which no target keyword is obtained from the target voice in step 202.
In this embodiment of the application, the user can trigger the vehicle to generate the photographing instruction by voice or gesture input; the operation is simple and easy to implement.
In this embodiment of the application, the environment around the vehicle is shot by the camera device configured on the vehicle to obtain the first image data. When a photographing instruction input by the user is received, the first time corresponding to the photographing instruction can be determined, and a target image whose shooting time is the first time is selected from the first image data. That is, the user is not required to shoot the image, which solves both the problem that the driver cannot take pictures and the safety hazard of the user taking pictures from the vehicle. In addition, the first image data corresponds to the environment around the vehicle in the first time period; that is, the camera device configured on the vehicle continuously shoots the environment around the vehicle, and the image collected at the first time is then selected, which avoids missing the scenery the user wants to shoot.
On the basis of the embodiments corresponding to fig. 1a to fig. 8, to better implement the foregoing solutions of the embodiments of the present application, related devices for implementing those solutions are provided below. Specifically, refer to fig. 9, which is a schematic structural diagram of an image acquisition apparatus according to an embodiment of the present application. The image acquisition apparatus 900 is applied to a vehicle provided with a camera device, and includes: a shooting module 901, configured to control the camera device to shoot the environment around the vehicle to obtain first image data, where the first image data corresponds to the environment around the vehicle in a first time period; an obtaining module 902, configured to obtain, in response to an obtained photographing instruction, a first time corresponding to the photographing instruction; the obtaining module 902 is further configured to obtain a target image from the first image data according to the first time and output the target image, where the first time is included in the first time period, and the interval duration between the shooting time of the target image and the first time is less than or equal to a target threshold.
In a possible design, refer to fig. 10, which is a schematic structural diagram of an image acquisition apparatus according to an embodiment of the present application; the image acquisition apparatus 900 further includes: a generating module 903, configured to generate a photographing instruction in response to a received target voice, where the intention corresponding to the target voice is to photograph; or the generating module 903 is configured to generate a photographing instruction in response to an acquired preset gesture.
In a possible design, the obtaining module 902 is further configured to obtain a target keyword from the target voice, where the target keyword is description information of a shooting object and/or a shooting direction; the obtaining module 902 is specifically configured to obtain a target image from the first image data according to the target keyword, where an object indicated by the target keyword exists in the target image, and/or a shooting direction of the target image is a direction pointed by the target keyword.
In one possible design, the preset gesture is a preset static gesture; the obtaining module 902 is specifically configured to acquire, in response to the obtained photographing instruction, a second time corresponding to the preset gesture, and determine the second time as the first time, where the second time is the shooting time of the preset static gesture.
In one possible design, the obtaining module 902 is specifically configured to: determine, in response to the obtained photographing instruction, a third time corresponding to the photographing instruction, where the third time is the generation time of the photographing instruction; and determine the first time according to the third time, where the first time is before the third time.
In one possible design, referring to fig. 10, the vehicle is equipped with at least two camera devices, and the shooting ranges of different camera devices do not overlap or only partially overlap. The obtaining module 902 is further configured to obtain a first direction, where the first direction is determined according to any one or more of the following: sight line direction, face orientation, and body orientation. The image acquisition apparatus 900 further includes: a selecting module 904, configured to select, from the at least two camera devices, a target camera device whose shooting range covers the first direction. The obtaining module 902 is specifically configured to select second image data from the first image data, where the second image data is collected by the target camera device, and to obtain the target image from the second image data.
It should be noted that the information interaction and execution processes among the modules/units in the image acquisition apparatus 900 are based on the same concept as the method embodiments corresponding to fig. 1a to fig. 8 in the present application; for specific content, refer to the descriptions in the foregoing method embodiments, which are not repeated here.
Refer to fig. 11, which is a schematic structural diagram of a vehicle according to an embodiment of the present application. The image acquisition apparatus 900 described in the embodiment corresponding to fig. 9 may be deployed on the vehicle 1100 to implement the functions of the vehicle in the embodiments corresponding to fig. 1b to fig. 8. Specifically, the vehicle 1100 includes: a receiver 1101, a transmitter 1102, a processor 1103, and a memory 1104 (the number of processors 1103 in the vehicle 1100 may be one or more, and one processor is taken as an example in fig. 11), where the processor 1103 may include an application processor 11031 and a communication processor 11032. In some embodiments of the application, the receiver 1101, the transmitter 1102, the processor 1103, and the memory 1104 may be connected by a bus or in another manner.
The memory 1104, which may include a read-only memory and a random access memory, provides instructions and data to the processor 1103. A portion of the memory 1104 may also include a non-volatile random access memory (NVRAM). The memory 1104 stores operating instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
The processor 1103 controls the operation of the vehicle. In a particular application, the various components of the vehicle are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the embodiments of the present application can be applied to the processor 1103 or implemented by the processor 1103. The processor 1103 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method can be implemented by integrated logic circuits in hardware or instructions in software in the processor 1103. The processor 1103 may be a general-purpose processor, a Digital Signal Processor (DSP), a microprocessor or a microcontroller, and may further include an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The processor 1103 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a RAM, flash memory, ROM, PROM, EPROM, register, or other storage medium known in the art. The storage medium is located in the memory 1104, and the processor 1103 reads the information in the memory 1104 and completes the steps of the method in combination with the hardware.
The receiver 1101 may be used to receive input numeric or character information and to generate signal inputs relating to relevant settings and functional control of the vehicle. The transmitter 1102 may be used to output numeric or character information through a first interface; the transmitter 1102 is also operable to send instructions to the disk groups via the first interface to modify data in the disk groups; the transmitter 1102 may also include a display device such as a display screen.
In this embodiment, the processor 1103 is configured to execute the method for acquiring an image performed by a vehicle in the embodiment corresponding to fig. 1b to 8. It should be noted that, a specific manner in which the application processor 11031 in the processor 1103 executes the above steps is based on the same concept as that of each method embodiment corresponding to fig. 1b to fig. 8 in the present application, and the technical effects brought by the method embodiment are the same as that of each method embodiment corresponding to fig. 1b to fig. 8 in the present application, and specific contents may refer to descriptions in the foregoing method embodiments in the present application, and are not repeated here.
Refer to fig. 12, which is a schematic structural diagram of a vehicle according to an embodiment of the present application. When the vehicle 1100 is embodied as an autonomous vehicle, it may be configured in a fully or partially autonomous driving mode. For example, while in the autonomous driving mode, the vehicle 1100 may control itself without human operation: it may determine the current state of the vehicle and of its surrounding environment, determine the possible behavior of at least one other vehicle in the surrounding environment, determine a confidence level corresponding to the possibility of the other vehicle performing that behavior, and control the vehicle 1100 based on the determined information. While in the autonomous driving mode, the vehicle 1100 may also be placed into operation without human interaction.
The vehicle 1100 may include various subsystems, such as a travel system 102, a sensor system 104, a control system 106, one or more peripherals 108, as well as a power supply 110, a computer system 112, and a user interface 116. Alternatively, vehicle 1100 may include more or fewer subsystems, and each subsystem may include multiple components. In addition, each of the sub-systems and components of the vehicle 1100 may be interconnected by wires or wirelessly.
The travel system 102 may include components that provide powered motion for the vehicle 1100. In one embodiment, the travel system 102 may include an engine 118, an energy source 119, a transmission 120, and wheels/tires 121.
The engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or other types of engine combinations, such as a hybrid engine composed of a gasoline engine and an electric motor, and a hybrid engine composed of an internal combustion engine and an air compression engine. The engine 118 converts the energy source 119 into mechanical energy. Examples of energy sources 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electrical power. The energy source 119 may also provide energy to other systems of the vehicle 1100. The transmission 120 may transmit mechanical power from the engine 118 to the wheels 121. The transmission 120 may include a gearbox, a differential, and a driveshaft. In one embodiment, the transmission 120 may also include other devices, such as a clutch. Wherein the drive shaft may comprise one or more axles that may be coupled to one or more wheels 121.
The sensor system 104 may include several sensors that sense information about the environment surrounding the vehicle 1100, and is used to obtain point cloud data and images corresponding to the environment at each time. For example, the sensor system 104 may include a positioning system 122 (which may be a global positioning system (GPS), a COMPASS system, or another positioning system), an inertial measurement unit (IMU) 124, a radar 126, a laser range finder 128, and a camera 130. The sensor system 104 may also include sensors that monitor internal systems of the vehicle 1100 (e.g., an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge). Sensing data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, orientation, velocity, etc.). Such detection and identification is a critical function for the safe operation of the autonomous vehicle 1100.
The positioning system 122 may be used, among other things, to estimate the geographic location of the vehicle 1100. The IMU 124 is used to sense position and orientation changes of the vehicle 1100 based on inertial acceleration. In one embodiment, IMU 124 may be a combination of an accelerometer and a gyroscope. The radar 126 may utilize radio signals to sense objects within the surrounding environment of the vehicle 1100, which may be embodied as a millimeter wave radar or a lidar. In some embodiments, in addition to sensing objects, radar 126 may also be used to sense the speed and/or heading of an object. Laser rangefinder 128 may utilize a laser to sense objects in the environment in which vehicle 1100 is located. In some embodiments, the laser rangefinder 128 may include one or more laser sources, laser scanners, and one or more detectors, among other system components. The camera 130 may be used to capture multiple images of the surrounding environment of the vehicle 1100. The camera 130 may be a still camera or a video camera.
The control system 106 is for controlling the operation of the vehicle 1100 and its components. The control system 106 may include various components, including a steering system 132, a throttle 134, a braking unit 136, a computer vision system 140, a route control system 142, and an obstacle avoidance system 144.
The steering system 132 is operable to adjust the forward direction of the vehicle 1100; for example, in one embodiment it may be a steering wheel system. The throttle 134 is used to control the operating speed of the engine 118 and thus the speed of the vehicle 1100. The braking unit 136 is used to control the deceleration of the vehicle 1100; it may use friction to slow the wheels 121, or, in other embodiments, convert the kinetic energy of the wheels 121 into an electric current, and it may also take other forms to slow the rotational speed of the wheels 121 so as to control the speed of the vehicle 1100. The computer vision system 140 may process and analyze images captured by the camera 130 to identify objects and/or features in the environment surrounding the vehicle 1100, where the objects and/or features may include traffic signals, road boundaries, and obstacles. The computer vision system 140 may use object recognition algorithms, structure from motion (SFM) algorithms, video tracking, and other computer vision techniques; in some embodiments, it may be used to map the environment, track objects, estimate the speed of objects, and so forth. The route control system 142 is used to determine a travel route and a travel speed of the vehicle 1100; in some embodiments, the route control system 142 may include a lateral planning module 1421 and a longitudinal planning module 1422, which determine the travel route and the travel speed, respectively, in combination with data from the obstacle avoidance system 144, the positioning system 122, and one or more predetermined maps. The obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise negotiate obstacles in the environment of the vehicle 1100, where the obstacles may be embodied as actual obstacles or virtual moving objects that may collide with the vehicle 1100. In one example, the control system 106 may additionally or alternatively include components other than those shown and described, or some of the components shown above may be removed.
The vehicle 1100 interacts with external sensors, other vehicles, other computer systems, or users through the peripheral devices 108. The peripheral devices 108 may include a wireless communication system 146, an in-vehicle computer 148, a microphone 150, and/or speakers 152. In some embodiments, the peripheral devices 108 provide a means for a user of the vehicle 1100 to interact with the user interface 116. For example, the in-vehicle computer 148 may provide information to a user of the vehicle 1100, the user interface 116 may also operate the in-vehicle computer 148 to receive user input, and the in-vehicle computer 148 may be operated via a touch screen. In other cases, the peripheral devices 108 may provide a means for the vehicle 1100 to communicate with other devices located within the vehicle. For example, the microphone 150 may receive audio (e.g., voice commands or other audio input) from a user of the vehicle 1100, and the speaker 152 may output audio to a user of the vehicle 1100. The wireless communication system 146 may include the receiver 1101 and the transmitter 1102 shown in fig. 11.
The power supply 110 may provide power to various components of the vehicle 1100. In one embodiment, power source 110 may be a rechargeable lithium ion or lead acid battery. One or more battery packs of such batteries may be configured as a power source to provide power to various components of the vehicle 1100. In some embodiments, the power source 110 and the energy source 119 may be implemented together, such as in some all-electric vehicles.
Some or all of the functions of the vehicle 1100 are controlled by the computer system 112. The computer system 112 may include at least one processor 1103 and a memory 1104; for a description of the functions of the processor 1103 and the memory 1104, refer to the description of fig. 11, which is not repeated here.
The computer system 112 may control the functions of the vehicle 1100 based on inputs received from various subsystems (e.g., the travel system 102, the sensor system 104, and the control system 106) and from the user interface 116. For example, the computer system 112 may utilize input from the control system 106 in order to control the steering system 132 to avoid obstacles detected by the sensor system 104 and the obstacle avoidance system 144. In some embodiments, the computer system 112 is operable to provide control over many aspects of the vehicle 1100 and its subsystems.
Optionally, one or more of these components described above may be mounted or associated separately from the vehicle 1100. For example, the memory 1104 may exist partially or completely separate from the vehicle 1100. The above components may be communicatively coupled together in a wired and/or wireless manner.
Optionally, the foregoing components are only an example; in actual application, components in the foregoing modules may be added or deleted according to actual needs, and fig. 12 should not be construed as a limitation on this embodiment of the application. An autonomous vehicle traveling on a road, such as the vehicle 1100 above, may identify objects within its surrounding environment to determine an adjustment to its current speed. The objects may be other vehicles, traffic control devices, or other types of objects. In some examples, each identified object may be considered independently, and the object's respective characteristics, such as its current speed, acceleration, and separation from the vehicle, may be used to determine the speed to which the autonomous vehicle is to adjust.
Optionally, the vehicle 1100, or a computing device associated with it (such as the computer system 112, the computer vision system 140, or the memory 1104 of fig. 12), may predict the behavior of an identified object based on the characteristics of the object and the state of the surrounding environment (e.g., traffic, rain, ice on the road). Optionally, the identified objects depend on each other's behavior, so all of them may also be considered together to predict the behavior of a single identified object. The vehicle 1100 can adjust its speed based on the predicted behavior of the identified object; in other words, it can determine, based on that predicted behavior, the stable state to which it needs to adjust (e.g., accelerate, decelerate, or stop). Other factors may also be considered in determining the speed of the vehicle 1100, such as its lateral position on the road being traveled, the curvature of the road, and the proximity of static and dynamic objects. In addition to providing instructions to adjust the speed of the autonomous vehicle, the computing device may also provide instructions to modify the steering angle of the vehicle 1100, so that the vehicle follows a given trajectory and/or maintains safe lateral and longitudinal distances from objects in its vicinity (e.g., cars in adjacent lanes on the road).
The vehicle 1100 may be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, an amusement park vehicle, construction equipment, a tram, a golf cart, a train, a handcart, or the like, which is not particularly limited in this embodiment of the application.
Embodiments of the present application also provide a computer program product, which when executed on a computer causes the computer to perform the steps performed by the vehicle in the method described in the foregoing embodiments shown in fig. 1b to 8.
In an embodiment of the present application, a computer-readable storage medium is further provided, in which a program for performing signal processing is stored, and when the program runs on a computer, the computer is enabled to perform the steps performed by the vehicle in the method described in the foregoing embodiments shown in fig. 1b to 8.
The image acquisition apparatus provided in the embodiments of the present application may specifically be a chip, and the chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip performs the image acquisition method described in the embodiments shown in fig. 1b to fig. 8. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Wherein any of the aforementioned processors may be a general purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control the execution of the programs of the method of the first aspect.
It should be noted that the above-described embodiments of the apparatus are merely illustrative, where the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
From the above description of the embodiments, a person skilled in the art can clearly understand that the present application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, functions performed by computer programs can easily be implemented by corresponding hardware, and the specific hardware structures for implementing the same function may be various, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, a software implementation is usually preferable. Based on such an understanding, the technical solutions of the present application, or the portions that contribute to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and include several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to execute the methods described in the embodiments of the present application.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a training device or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)).

Claims (12)

1. An image acquisition method, characterized in that the method is applied to a vehicle provided with a camera device, the method comprising:
controlling the camera device to shoot the environment around the vehicle to acquire first image data, wherein the first image data corresponds to the environment around the vehicle in a first time period;
in response to the obtained photographing instruction, determining a third time corresponding to the photographing instruction, wherein the third time is the generation time of the photographing instruction;
determining the first time according to the third time, wherein the first time is before the third time;
and acquiring a target image from the first image data according to the first time, wherein the first time is included in the first time period, and the interval duration between the shooting time of the target image and the first time is less than or equal to a target threshold.
2. The method of claim 1, further comprising:
responding to the received target voice, and generating the photographing instruction, wherein the intention corresponding to the target voice is photographing; or,
and responding to the acquired preset gesture, and generating the photographing instruction.
3. The method of claim 2, further comprising:
acquiring a target keyword from the target voice, wherein the target keyword comprises description information of a shooting object and/or a shooting direction;
the acquiring a target image from the first image data comprises:
and acquiring the target image from the first image data according to the target keyword, wherein an object indicated by the target keyword exists in the target image, and/or the shooting direction of the target image is the direction indicated by the target keyword.
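The keyword-based selection can be pictured as a filter, sketched below under the assumption of external object-detection and shooting-direction lookups (detect_objects and direction_of are hypothetical callables); the function only illustrates the and/or matching condition of claim 3.

    from typing import Callable, Iterable, List, Optional, Set, TypeVar

    F = TypeVar("F")  # any frame/image record type

    def filter_by_keyword(frames: Iterable[F],
                          target_object: Optional[str],
                          target_direction: Optional[str],
                          detect_objects: Callable[[F], Set[str]],
                          direction_of: Callable[[F], str]) -> List[F]:
        """Keep only frames that contain the object the keyword describes
        and/or were shot in the direction the keyword indicates."""
        kept = []
        for frame in frames:
            if target_object is not None and target_object not in detect_objects(frame):
                continue
            if target_direction is not None and direction_of(frame) != target_direction:
                continue
            kept.append(frame)
        return kept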
4. The method of claim 2, wherein the preset gesture comprises a preset static gesture, and the acquiring of a first time corresponding to the photographing instruction in response to the obtained photographing instruction comprises:
in response to the obtained photographing instruction, acquiring a second time corresponding to the preset gesture, and determining the second time as the first time, wherein the second time is the time at which the preset static gesture was captured.
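One possible reading of claim 4, sketched with an assumed gesture classifier: the capture time of the first frame in which the preset static gesture appears is taken as the second time, and hence as the first time.

    from typing import Callable, Iterable, Optional, Tuple

    def first_time_from_static_gesture(
            frames: Iterable[Tuple[float, bytes]],
            is_preset_gesture: Callable[[bytes], bool]) -> Optional[float]:
        """Return the capture time of the first frame showing the preset
        static gesture; this 'second time' then serves as the first time."""
        for timestamp, pixels in frames:
            if is_preset_gesture(pixels):
                return timestamp
        return None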
5. The method according to claim 1 or 2, wherein the vehicle is provided with at least two camera devices, the shooting ranges of the different camera devices being non-overlapping or partially overlapping, the method further comprising:
acquiring a first direction, wherein the first direction is determined according to any one or more of the following items: gaze direction, facial orientation, and body orientation;
selecting, from the at least two camera devices, a target camera device whose shooting range covers the first direction;
the acquiring a target image from the first image data comprises:
acquiring second image data from the first image data, wherein the second image data is acquired by the target camera device;
and acquiring the target image from the second image data.
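The camera selection of claim 5 can be sketched with a simple azimuth model; representing each device by a center azimuth and a horizontal field of view is an assumption of the example, not something the claim specifies.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Camera:
        camera_id: str
        center_azimuth: float   # degrees in the vehicle frame
        horizontal_fov: float   # horizontal field of view in degrees

    def covers(camera: Camera, azimuth: float) -> bool:
        """True if the azimuth lies inside the camera's horizontal field of view."""
        diff = (azimuth - camera.center_azimuth + 180.0) % 360.0 - 180.0
        return abs(diff) <= camera.horizontal_fov / 2.0

    def select_target_camera(cameras: List[Camera], first_direction: float) -> Optional[Camera]:
        """Pick a camera whose shooting range covers the first direction,
        which may come from gaze direction, facial orientation, or body
        orientation."""
        for cam in cameras:
            if covers(cam, first_direction):
                return cam
        return None

With, say, a front camera centered at 0 degrees with a 120-degree field of view, a gaze azimuth of 40 degrees would select it, while 170 degrees would fall to a rear-facing device.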
6. An image acquisition apparatus, characterized in that the image acquisition apparatus is applied to a vehicle provided with a camera device, the image acquisition apparatus comprising:
the shooting module is used for controlling the camera device to shoot the environment around the vehicle so as to obtain first image data, and the first image data corresponds to the environment around the vehicle in a first time period;
the acquisition module is used for responding to the obtained photographing instruction and acquiring a first time corresponding to the photographing instruction;
the acquisition module is further configured to acquire a target image from the first image data according to the first time, wherein the first time is included in the first time period, and the interval between the shooting time of the target image and the first time is less than or equal to a target threshold; wherein
the acquisition module is specifically configured to:
responding to the obtained photographing instruction, and determining a third time corresponding to the photographing instruction, wherein the third time is the generation time of the photographing instruction;
and determining the first time according to the third time, wherein the first time is before the third time.
7. The apparatus of claim 6, further comprising:
the generating module is used for responding to the received target voice and generating the photographing instruction, wherein the intention corresponding to the target voice is photographing; or
the generating module is used for responding to the acquired preset gesture and generating the photographing instruction.
8. The apparatus according to claim 7, wherein the obtaining module is further configured to obtain a target keyword from the target voice, where the target keyword includes description information of a shooting object and/or a shooting direction;
the obtaining module is specifically configured to obtain the target image from the first image data according to the target keyword, wherein an object indicated by the target keyword exists in the target image, and/or the shooting direction of the target image is the direction indicated by the target keyword.
9. The apparatus of claim 7, wherein the preset gesture comprises a preset static gesture;
the obtaining module is specifically configured to, in response to the obtained photographing instruction, obtain a second time corresponding to the preset gesture and determine the second time as the first time, wherein the second time is the time at which the preset static gesture was captured.
10. The apparatus according to claim 6 or 7, characterized in that the vehicle is provided with at least two camera devices, the shooting ranges of the different camera devices being non-overlapping or partially overlapping;
the obtaining module is further configured to obtain a first direction, where the first direction is determined according to any one or more of the following: gaze direction, facial orientation, and body orientation;
the apparatus further comprises: a selecting module, used for selecting, from the at least two camera devices, a target camera device whose shooting range covers the first direction;
the acquisition module is specifically configured to select second image data from the first image data, where the second image data is acquired by the target camera device, and acquire the target image from the second image data.
11. A computer-readable storage medium, comprising a computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 5.
12. A vehicle, comprising a processor and a memory, the processor being coupled to the memory, wherein
the memory is used for storing a program;
and the processor is used for executing the program in the memory to cause the vehicle to perform the method of any one of claims 1 to 5.
CN202180000814.3A 2021-03-30 2021-03-30 Image acquisition method and related equipment Active CN113228620B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/083874 WO2022204925A1 (en) 2021-03-30 2021-03-30 Image obtaining method and related equipment

Publications (2)

Publication Number Publication Date
CN113228620A (en) 2021-08-06
CN113228620B (en) 2022-07-22

Family

ID=77081270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180000814.3A Active CN113228620B (en) 2021-03-30 2021-03-30 Image acquisition method and related equipment

Country Status (2)

Country Link
CN (1) CN113228620B (en)
WO (1) WO2022204925A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113640013A (en) * 2021-08-12 2021-11-12 安徽江淮汽车集团股份有限公司 Road test data processing method for driving assistance
CN115802146B (en) * 2021-09-07 2024-04-02 荣耀终端有限公司 Method for capturing images in video and electronic equipment
CN114040107B (en) * 2021-11-19 2024-04-16 智己汽车科技有限公司 Intelligent automobile image shooting system, intelligent automobile image shooting method, intelligent automobile image shooting vehicle and intelligent automobile image shooting medium
CN114201225A (en) * 2021-12-14 2022-03-18 阿波罗智联(北京)科技有限公司 Method and device for awakening function of vehicle machine
CN114760417A (en) * 2022-04-25 2022-07-15 北京地平线信息技术有限公司 Image shooting method and device, electronic equipment and storage medium
US20240114236A1 (en) * 2022-09-29 2024-04-04 Samsung Electronics Co., Ltd. Apparatus and method for controlling a robot photographer with semantic intelligence
CN116300092A (en) * 2023-03-09 2023-06-23 北京百度网讯科技有限公司 Control method, device and equipment of intelligent glasses and storage medium
CN117041627B (en) * 2023-09-25 2024-03-19 宁波均联智行科技股份有限公司 Vlog video generation method and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN204559720U (en) * 2015-04-23 2015-08-12 宁波树袋熊汽车智能科技有限公司 A kind of device can taking landscape image in automobile is advanced
CN105869233A (en) * 2016-03-25 2016-08-17 奇瑞汽车股份有限公司 Travel recorder for realizing intelligent interaction, and control method thereof
CN106131413A (en) * 2016-07-19 2016-11-16 纳恩博(北京)科技有限公司 The control method of a kind of capture apparatus and capture apparatus
CN108375986A (en) * 2018-03-30 2018-08-07 深圳市道通智能航空技术有限公司 Control method, device and the terminal of unmanned plane
CN108712610A (en) * 2018-05-18 2018-10-26 北京京东尚科信息技术有限公司 Intelligent camera
CN111899518A (en) * 2020-07-13 2020-11-06 深圳市多威尔科技有限公司 Electric bicycle violation management system based on Internet of things
CN112182256A (en) * 2020-09-28 2021-01-05 长城汽车股份有限公司 Object identification method and device and vehicle
CN112544071A (en) * 2020-07-27 2021-03-23 华为技术有限公司 Video splicing method, device and system

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017001532A (en) * 2015-06-10 2017-01-05 富士重工業株式会社 Vehicular travel control device
US10218837B1 (en) * 2018-02-12 2019-02-26 Benjamin J. Michael Dweck Systems and methods for preventing concurrent driving and use of a mobile phone
CN108495071A (en) * 2018-02-26 2018-09-04 浙江吉利汽车研究院有限公司 A kind of urgent image pickup method of automobile data recorder and system
CN110876011B (en) * 2018-08-30 2023-05-26 博泰车联网科技(上海)股份有限公司 Driving shooting method based on image recognition technology and vehicle
JP2020147066A (en) * 2019-03-11 2020-09-17 本田技研工業株式会社 Vehicle control system, vehicle control method, and program
CN110313174B (en) * 2019-05-15 2021-09-28 深圳市大疆创新科技有限公司 Shooting control method and device, control equipment and shooting equipment
CN112109729B (en) * 2019-06-19 2023-06-06 宝马股份公司 Man-machine interaction method, device and system for vehicle-mounted system
CN111277755B (en) * 2020-02-12 2021-12-07 广州小鹏汽车科技有限公司 Photographing control method and system and vehicle
CN111385475B (en) * 2020-03-11 2021-09-10 Oppo广东移动通信有限公司 Image acquisition method, photographing device, electronic equipment and readable storage medium
CN112513787B (en) * 2020-07-03 2022-04-12 华为技术有限公司 Interaction method, electronic device and system for in-vehicle isolation gesture
CN112380922B (en) * 2020-10-23 2024-03-22 岭东核电有限公司 Method, device, computer equipment and storage medium for determining multiple video frames


Also Published As

Publication number Publication date
WO2022204925A1 (en) 2022-10-06
CN113228620A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN113228620B (en) Image acquisition method and related equipment
CN107776574B (en) Driving mode switching method and device for automatic driving vehicle
WO2021052213A1 (en) Method and device for adjusting accelerator pedal characteristic
WO2022000448A1 (en) In-vehicle air gesture interaction method, electronic device, and system
US20200017123A1 (en) Drive mode switch controller, method, and program
EP4134949A1 (en) In-vehicle user positioning method, on-board interaction method, on-board device, and vehicle
CN113631452B (en) Lane change area acquisition method and device
US20230046258A1 (en) Method and apparatus for identifying object of interest of user
EP4137914A1 (en) Air gesture-based control method and apparatus, and system
US20220224860A1 (en) Method for Presenting Face In Video Call, Video Call Apparatus, and Vehicle
EP4207731A1 (en) Method and apparatus for controlling light supplementing time of camera module
CN112810603A (en) Positioning method and related product
JP2022047580A (en) Information processing device
CN115170630B (en) Map generation method, map generation device, electronic equipment, vehicle and storage medium
CN114771539B (en) Vehicle lane change decision method and device, storage medium and vehicle
WO2022061702A1 (en) Method, apparatus, and system for driving alerts
CN114880408A (en) Scene construction method, device, medium and chip
CN114973178A (en) Model training method, object recognition method, device, vehicle and storage medium
CN115675504A (en) Vehicle warning method and related equipment
CN114572219B (en) Automatic overtaking method and device, vehicle, storage medium and chip
CN114842454B (en) Obstacle detection method, device, equipment, storage medium, chip and vehicle
CN115115822B (en) Vehicle-end image processing method and device, vehicle, storage medium and chip
CN108528331A (en) Vehicle, ultrasonic system and device and its information interacting method
CN115164910B (en) Travel route generation method, travel route generation device, vehicle, storage medium, and chip
CN115223122A (en) Method and device for determining three-dimensional information of object, vehicle and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant