WO2024099238A1 - Assisted voice navigation method and apparatus, electronic device, and storage medium - Google Patents

Assisted voice navigation method and apparatus, electronic device, and storage medium

Info

Publication number
WO2024099238A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
visual positioning
target
positioning model
voice
Prior art date
Application number
PCT/CN2023/129805
Other languages
English (en)
French (fr)
Inventor
张健龙
王明远
罗莉舒
傅依
龙超
王丽月
胡佩涛
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2024099238A1 publication Critical patent/WO2024099238A1/zh


Definitions

  • the embodiments of the present disclosure relate to the technical field of intelligent terminals, and in particular to an auxiliary voice navigation method, device, electronic device and storage medium.
  • the embodiments of the present disclosure provide an auxiliary voice navigation method, device, electronic device and storage medium to overcome the problem that the perception range of smart terminal devices is limited and long-distance target navigation cannot be achieved.
  • an embodiment of the present disclosure provides an assisted voice navigation method, comprising:
  • in response to a first instruction indicating a target object within a first range, the following steps are executed in a loop: acquiring a first image, and determining a target path according to the first image and a visual positioning model, wherein the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a moving path from the current position corresponding to the first image to the position where the target object is located; and playing a navigation voice corresponding to the current position according to the target path, where the navigation voice represents the moving direction and the corresponding moving distance.
  • an auxiliary voice navigation device including:
  • the interaction module is configured to cyclically call the following modules in response to a first instruction indicating a target object within a first range:
  • a processing module configured to acquire a first image, and determine a target path according to the first image and a visual positioning model, wherein the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a moving path from a current position corresponding to the first image to a position of the target object;
  • the playing module is used to play the navigation voice corresponding to the current position according to the target path, and the navigation voice represents the moving direction and the corresponding moving distance.
  • an electronic device including:
  • a processor and a memory communicatively connected to the processor
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory to implement the assisted voice navigation method as described in the first aspect and various possible designs of the first aspect.
  • an embodiment of the present disclosure provides a computer-readable storage medium, in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the assisted voice navigation method as described in the first aspect and various possible designs of the first aspect is implemented.
  • an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the assisted voice navigation method as described in the first aspect and various possible designs of the first aspect.
  • the auxiliary voice navigation method, device, electronic device and storage medium provided by the embodiments of the present disclosure, in response to a first instruction indicating a target object within a first range, loop through the following steps: acquiring a first image, and determining a target path based on the first image and a visual positioning model, wherein the visual positioning model characterizes the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a moving path from the current position corresponding to the first image to the position of the target object; and playing a navigation voice corresponding to the current position based on the target path, wherein the navigation voice characterizes the moving direction and the corresponding moving distance.
  • In this way, a moving path from the current position to the position of the target object is determined and converted into voice for playback, so that users can reach the location of target objects outside the field of view of image acquisition according to the played voice prompts, thereby extending the perception and navigation range of terminal devices and realizing beyond-line-of-sight, long-distance target navigation.
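  • To make the above loop concrete, the following minimal Python sketch summarizes the acquire-localize-plan-speak cycle; all names (vpm, capture_image, plan_path, speak) and the 1-meter arrival threshold are hypothetical stand-ins, not an API defined by the disclosure.

```python
# Minimal sketch of the disclosed navigation loop (hypothetical names
# throughout; the disclosure does not define a concrete API).

ARRIVAL_THRESHOLD_M = 1.0  # the "first preset distance" (1 m in the examples)

def assisted_voice_navigation(target_id, vpm, capture_image, plan_path, speak):
    """Loop until arrival: localize from the latest first image, re-plan
    the target path, and play the corresponding navigation voice.

    vpm           -- visual positioning model (3D map of the first range)
    capture_image -- callable returning the current first image
    plan_path     -- callable (vpm, start, goal) -> path object
    speak         -- callable that plays a navigation voice string
    """
    target_pos = vpm.lookup_object(target_id)          # second spatial position
    while True:
        first_image = capture_image()
        current_pos = vpm.localize(first_image)        # first spatial position
        path = plan_path(vpm, current_pos, target_pos) # target path
        if path.distance() <= ARRIVAL_THRESHOLD_M:
            break                                      # target position reached
        direction, distance_m = path.first_leg()
        speak(f"Move {distance_m:.0f} meters {direction}")
```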
  • FIG. 1 is a diagram of an application scenario of the auxiliary voice navigation method provided by an embodiment of the present disclosure;
  • FIG. 2 is a first flow chart of the auxiliary voice navigation method provided by an embodiment of the present disclosure;
  • FIG. 3 is a flowchart of specific implementation steps of step S102 in the embodiment shown in FIG. 2;
  • FIG. 4 is a schematic diagram of a process of generating a target path provided by an embodiment of the present disclosure;
  • FIG. 5 is a second flow chart of the auxiliary voice navigation method provided by an embodiment of the present disclosure;
  • FIG. 6 is a schematic diagram of a process of playing directional voice provided by an embodiment of the present disclosure;
  • FIG. 7 is a third flow chart of the auxiliary voice navigation method provided by an embodiment of the present disclosure;
  • FIG. 8 is a flowchart of specific implementation steps of step S302 in the embodiment shown in FIG. 7;
  • FIG. 9 is a flowchart of specific implementation steps of step S303 in the embodiment shown in FIG. 7;
  • FIG. 10 is a flowchart of specific implementation steps of step S3033 in the embodiment shown in FIG. 9;
  • FIG. 11 is a structural block diagram of an auxiliary voice navigation device provided by an embodiment of the present disclosure;
  • FIG. 12 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure;
  • FIG. 13 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 1 is an application scenario diagram of the auxiliary voice navigation method provided by an embodiment of the present disclosure.
  • the auxiliary voice navigation method provided by the embodiment of the present disclosure can be applied to the voice travel navigation scenario for visually impaired users, more specifically, for example, to the application scenario of indoor target object navigation for visually impaired users.
  • the method provided by the embodiment of the present disclosure can be applied to terminal devices, such as smart phones, wearable devices, etc.
  • the terminal device is connected to the cloud service communication and exchanges data with the cloud server.
  • the terminal device collects the environment image and converts the environment image into the corresponding navigation voice for broadcasting.
  • the content of the navigation voice is, for example, "go straight forward 10 meters".
  • the visually impaired user can walk according to the voice broadcast and finally reach the location of the target object, thereby realizing the target object navigation based on auxiliary voice.
  • the application scenario of the indoor target object navigation is, for example, a scene of finding a specific book in a library and a scene of finding a specific item in a supermarket.
  • handheld intelligent terminal devices are used to collect images of the surrounding environment to achieve environmental perception, and the results of environmental perception are converted into voice for broadcasting, so that users in the visually impaired group can determine the situation of the surrounding environment based on the content of the voice broadcast.
  • However, the above scheme is based on recognition of the real-time collected environmental image, with the generated voice broadcast after conversion; objects outside the environmental image cannot be perceived. Therefore, the voice generated by the above scheme can only achieve a general prompting effect: it can neither perceive and broadcast objects outside the environmental image, nor achieve navigation to such objects.
  • the embodiments of the present disclosure provide an auxiliary voice navigation method to solve the above problems.
  • FIG. 2 is a flow chart of an auxiliary voice navigation method provided by an embodiment of the present disclosure.
  • the method of this embodiment can be applied in a terminal device, and the auxiliary voice navigation method includes:
  • Step S101 receiving a first instruction input by a user, where the first instruction is used to represent a target object within a first range.
  • the execution subject in this embodiment is a terminal device, such as a smart wearable device.
  • the first instruction is a voice instruction issued by the user, and the terminal device detects voice signals at a preset frequency.
  • The terminal device can, for example, first detect a wake-up voice at a low sampling rate, the content of the wake-up voice being, for example, "Hello, Xiao A"; after being woken up, the terminal device detects the command voice issued by the user at a high sampling rate.
  • The terminal device obtains the corresponding first instruction, i.e., information indicating "fruit shelf", by recognizing the command voice; in another possible implementation, the first instruction is generated based on the user's gesture operation or key operation on the terminal device.
  • For example, the terminal device is provided with a Button_1 button, which can be a software button or a physical button. After the user triggers the Button_1 button, the terminal device generates a corresponding first instruction, which corresponds to a preset target object, such as "room door", i.e., the first instruction is information representing the "room door".
  • Step S102 Acquire a first image, and determine a target path based on the first image and a visual positioning model, wherein the visual positioning model characterizes the position distribution of objects within a first range in a three-dimensional simulation space, and the target path is a moving path from a current position corresponding to the first image to a position of the target object.
  • the terminal device obtains an image in the current environment, i.e., a first image, through an image acquisition unit provided on the terminal device.
  • the first image may be a frame of a picture taken by the image acquisition unit, or a mosaic or superimposed image of multiple frames of pictures taken by the image acquisition unit.
  • A mosaic image refers to stitching together multiple frames of pictures, based on the image fields of view of the multiple frames taken, to form a picture with a larger image field of view.
  • A superimposed image refers to superimposing multiple frames of pictures with the same or similar image fields of view to obtain a picture with higher contrast and clarity.
  • the specific implementation method of stitching and superimposing multiple frames of pictures to obtain a mosaic image and a superimposed image will not be repeated here.
  • Exemplarily, after obtaining the first image, the first image is processed by the visual positioning model to obtain a moving path from the current position corresponding to the first image to the position of the target object, that is, the target path.
  • the visual positioning model is a model that characterizes the position distribution of objects within the first range in the three-dimensional simulation space.
  • the three-dimensional simulation space is a simulation of the real environment of the first range
  • the visual positioning model is a model that describes the three-dimensional simulation space.
  • the visual positioning model can be regarded as three-dimensional map data for the first range.
  • the first range corresponds to an indoor range Zoom_1 of a supermarket
  • the three-dimensional simulation space is a virtual space that characterizes the environment and objects within the indoor range Zoom_1 of the supermarket.
  • the three-dimensional simulation space includes, for example, shelves, goods and roads in the supermarket; and the visual positioning model is a description of the three-dimensional simulation space, which includes, for example: the identification, volume, position and other information of the shelves, goods and roads in the supermarket.
  • the visual positioning model can be implemented by a three-dimensional pixel matrix and corresponding item labels, or by a configuration table to describe the label, position, volume and other information corresponding to each object.
  • The specific implementation of the visual positioning model can be set as needed, and the examples will not be enumerated here one by one.
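  • As one illustration of the configuration-table option mentioned above, the sketch below stores a label, position and volume per object; the schema is an assumption, while the "#0021"/fruit-shelf entry and the range identifier Zoom_1 reuse examples from this text.

```python
# Hypothetical configuration-table encoding of the visual positioning
# model: one record per object in the three-dimensional simulation space.
from dataclasses import dataclass

@dataclass
class ObjectRecord:
    label: str                             # e.g. "fruit shelf"
    position: tuple[float, float, float]   # coordinates in the simulation space
    size: tuple[float, float, float]       # bounding volume (width, depth, height)

# Supermarket indoor range Zoom_1; "#0021" is the identifier used in the text.
visual_positioning_model = {
    "#0021": ObjectRecord("fruit shelf", (12.0, 3.5, 0.0), (4.0, 1.0, 2.0)),
}
```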
  • the visual positioning model can be a model deployed locally on the terminal device, or a model deployed on a cloud server that communicates with the terminal device.
  • the visual positioning model can be a visual positioning service (Visual Positioning Service, VPS) deployed in a cloud server that communicates with the terminal device.
  • the visual positioning model is searched through the first image and the target object respectively, so as to obtain the position corresponding to the first image and the position corresponding to the target object, and then combined with the preset navigation algorithm to generate the target path.
  • As shown in FIG. 3, the specific implementation steps of step S102 include:
  • Step S1021 input the first image into the visual positioning model to determine a first spatial position, where the first spatial position represents a mapping of an image capture point when the first image is captured in a three-dimensional simulation space.
  • Step S1022 Search the visual positioning model to obtain a second spatial position corresponding to the target object, where the second spatial position represents a mapping of the target position where the target object is located in the three-dimensional simulation space.
  • Step S1023 Generate a target path based on the first spatial position and the second spatial position.
  • Specifically, the first image is input into the visual positioning model for comparative search, so as to determine the position of the virtual environment area in the three-dimensional simulation space that is consistent with or similar to the area depicted by the first image, that is, the first spatial position, i.e., the current position of the user (terminal device). In short, the first spatial position is the mapping, in the three-dimensional simulation space, of the real environment area depicted by the first image, and the first spatial position is expressed based on the visual positioning model, that is, it is represented in the coordinate system of the three-dimensional simulation space represented by the visual positioning model.
  • the object identification corresponding to the target object is obtained.
  • For example, the target object identified based on the first instruction is a "fruit shelf", and the corresponding object identification is "#0021".
  • A search is performed in the visual positioning model based on the object identification to obtain the position coordinates of the target object "fruit shelf", that is, the second spatial position.
  • the second spatial position is also expressed based on the visual positioning model, that is, it is represented by the coordinate system in the three-dimensional simulation space represented by the visual positioning model.
  • Afterwards, a navigation path from the first spatial position to the second spatial position, i.e., the target path, is generated.
  • The algorithm for path planning based on map data (the visual positioning model), a starting point (the first spatial position) and a target point (the second spatial position) is prior art known to those skilled in the art and will not be described in detail here.
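  • Because the disclosure leaves the concrete navigation algorithm open, the following sketch uses a plain breadth-first search over a 2D occupancy grid as one possible stand-in; any standard planner (A*, Dijkstra) would serve equally.

```python
# Stand-in path planner: BFS over an occupancy grid (True = blocked).
# The disclosure only requires *some* preset navigation algorithm.
from collections import deque

def plan_grid_path(occupancy, start, goal):
    """Return a list of (row, col) cells from start to goal, or None."""
    rows, cols = len(occupancy), len(occupancy[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:                      # reconstruct the target path
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols \
                    and not occupancy[nr][nc] and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None                               # goal unreachable
```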
  • Figure 4 is a schematic diagram of a process for generating a target path provided by an embodiment of the present disclosure.
  • the first image Pic_1 and the object identification Ob_01 of the target object are respectively input into the visual positioning model.
  • the visual positioning model recognizes the image content in the first image Pic_1, determines the mapping area of the image content in the three-dimensional simulation space, and then determines the positioning point P1 of the image shooting point corresponding to the first image Pic_1 in the three-dimensional simulation space based on the mapping area;
  • On the other hand, the visual positioning model searches based on the object identification Ob_01 to obtain the positioning point P2 corresponding to the object identification Ob_01; the positioning point P1 and the positioning point P2 are then input into the navigation planning algorithm to generate the target path, wherein the navigation planning algorithm may be a capability provided by the visual positioning model.
  • Step S103 Play the navigation voice corresponding to the current position according to the target path, where the navigation voice represents the moving direction and the corresponding moving distance.
  • Exemplarily, the moving direction and moving distance corresponding to movement along the target path are determined according to the current position of the terminal device, that is, the first spatial position obtained in the previous step; for example, the moving direction is "north" and the moving distance is "10 meters". Then, based on a preset voice generation template, the above moving direction and moving distance information is converted into a corresponding navigation voice, for example, "Move 10 meters north".
  • Further, in order to allow visually impaired users to determine the moving direction, the terminal device can convert the absolute direction into relative directions such as "left" and "right".
  • A specific conversion method includes, for example, identifying through the first image and the visual positioning model the direction the user is currently facing, thereby realizing the conversion from absolute direction to relative direction. Afterwards, by playing the navigation voice, the user is guided to move along the target path from the current position and finally reach the target position where the target object is located, thereby achieving the purpose of navigating to the target object.
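  • One possible form of this absolute-to-relative conversion is sketched below: the signed angular difference between the facing direction estimated from the first image and the bearing of the next path segment selects a spoken phrase; the 30/150-degree sector boundaries are illustrative assumptions.

```python
def to_relative_direction(heading_deg: float, bearing_deg: float) -> str:
    """Convert an absolute path bearing into a relative spoken direction,
    given the user's current facing direction (both in compass degrees)."""
    # Signed difference wrapped into [-180, 180)
    delta = (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    if -30.0 <= delta <= 30.0:
        return "straight ahead"
    if delta > 0:
        return "to the right" if delta < 150.0 else "behind you, on the right"
    return "to the left" if delta > -150.0 else "behind you, on the left"
```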
  • Step S104 If the current position reaches the target position, the loop ends; if the current position does not reach the target position, the loop returns to step S102.
  • the latest current position can be obtained based on the current position obtained in the previous step, or an additional position measurement can be performed, and the visual positioning model can be used to detect whether the current position coincides with the target position. If the two coincide, it means that the user (terminal device) has arrived at the destination, and the navigation process is ended; if the two do not coincide, return to step S102, re-acquire the real-time first image, and repeat the above steps to continue voice navigation until the target position is reached.
  • In the method provided by this embodiment, in response to a first instruction indicating a target object within a first range, the following steps are executed cyclically: acquiring a first image, and determining a target path according to the first image and a visual positioning model, wherein the visual positioning model represents the position distribution of objects within the first range in the three-dimensional simulation space, and the target path is the moving path from the current position corresponding to the first image to the position of the target object; and, according to the target path, playing the navigation voice corresponding to the current position, the navigation voice representing the moving direction and the corresponding moving distance.
  • the visual positioning model is used to characterize the position distribution of objects within the first range in the three-dimensional simulation space, and the moving path from the current position to the position of the target object is determined, and converted into voice for playback, so that the user can reach the position of the target object outside the image acquisition field of view according to the played voice prompt, thereby improving the perception and navigation range of the terminal device and realizing beyond-visual-range and long-distance target navigation.
  • FIG5 is a second flow chart of the auxiliary voice navigation method provided by an embodiment of the present disclosure. Based on the embodiment shown in FIG2 , this embodiment adds a step of indicating the direction of the second spatial position, and the auxiliary voice navigation method includes:
  • Step S201 receiving a first instruction input by a user, where the first instruction is used to represent a target object within a first range.
  • Step S202 Acquire a first image, and determine a target path based on the first image and a visual positioning model, wherein the visual positioning model characterizes the position distribution of objects within a first range in a three-dimensional simulation space, and the target path is a moving path from a current position corresponding to the first image to a position of the target object.
  • Step S203 Obtain the path distance between the first spatial position and the second spatial position according to the target path.
  • Step S204 When the path distance is greater than the first preset distance, play the navigation voice corresponding to the current position, where the navigation voice represents the moving direction and the corresponding moving distance, and return to step S202.
  • Step S205 when the path distance is less than the first preset distance, obtaining orientation information, the orientation information representing the spatial orientation of the second spatial position relative to the first spatial position.
  • Step S206 Play the directional voice corresponding to the directional information.
  • this embodiment further adds a step of playing a directional voice when it is determined that the path distance is less than the first preset distance, thereby achieving accurate voice instructions for the target object.
  • Specifically, the path distance between the first spatial position and the second spatial position is calculated, wherein the first spatial position represents the current position of the terminal device, and the second spatial position represents the target position where the target object is located; the path distance between the first spatial position and the second spatial position is thus the distance between the user (terminal device) and the target position where the target object is located.
  • the size of the virtual object in the three-dimensional simulation space represented by the visual positioning model and the distance between the virtual objects are set based on the size of the object in the first range in the real environment and the distance between the objects, for example, set at a ratio of 1 to 1.
  • Therefore, based on the target path, a value representing the path distance between the current position and the target position can be obtained.
  • In the path distance judgment, when the path distance is less than or equal to the first preset distance, for example, 1 meter, it means that the user is very close to the target object and can be considered to have reached the target position.
  • the first image or other reference information can be used for identification to obtain the spatial orientation representing the second spatial position (target position) relative to the first spatial position (current position).
  • the orientation information can be an angle value with a direction mark, such as 30 degrees in front and 20 degrees on the left.
  • the orientation information is converted and the orientation voice is generated for broadcasting, so that the user can further determine the orientation relationship between the target position and the current position of the target object, and accurately locate the target object.
  • When the path distance is greater than the first preset distance, it means that the target object is still far away and there is no need to indicate the orientation of the target object; therefore, the navigation voice corresponding to the current position is played.
  • the specific implementation process has been introduced in the embodiment shown in Figure 2 and will not be repeated here.
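  • The branch between steps S204 and S205/S206 can be summarized as in the following sketch; the 1-meter threshold reuses the example from this text, while the helper names and message templates are assumptions.

```python
FIRST_PRESET_DISTANCE_M = 1.0  # example value from the text

def guidance(path_distance_m, move_dir, move_dist_m, orientation_phrase):
    """Far from the target: navigation voice (direction + distance).
    Near the target: directional voice with the spatial orientation,
    e.g. '30 degrees in front'."""
    if path_distance_m > FIRST_PRESET_DISTANCE_M:
        return f"Move {move_dist_m:.0f} meters {move_dir}"   # step S204
    return f"The target is {orientation_phrase}"             # steps S205-S206
```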
  • In a possible implementation, after step S203, the method further includes:
  • Step S207 determining corresponding vibration parameters according to the path distance, where the vibration parameters represent the vibration frequency and/or vibration amplitude.
  • Step S208 controlling the vibration unit to vibrate based on the vibration parameters.
  • Exemplarily, a vibration unit for generating vibrations perceivable by the user is provided on the terminal device, wherein the vibration frequency and/or vibration amplitude of the vibration emitted by the vibration unit is related to the path distance.
  • Specifically, the corresponding vibration parameters are set based on the path distance: the smaller the path distance, the higher the vibration frequency and/or the larger the vibration amplitude; alternatively, when the path distance is less than the first preset distance, the vibration unit is activated, or the vibration frequency and/or vibration amplitude is increased.
  • This is because the visually impaired user may still be moving during the directional voice broadcast (i.e., upon reaching the target position), resulting in the user "passing by" the target: the user's actual position no longer matches the position corresponding to the information indicated by the directional voice, which in turn causes the visually impaired user to be unable to accurately locate the target according to the directional voice. As a result, when the user reaches in the direction indicated by the voice to pick up the item, he or she cannot get the target item.
  • In contrast, the vibration prompt has good real-time performance and can change continuously, so that the visually impaired user can anticipate whether the target position is about to be reached based on the continuously changing vibration characteristics (vibration frequency and/or vibration amplitude) of the vibration unit; when the target position is reached (when the path distance is less than the first preset distance), the vibration characteristics of the vibration unit are controlled to change abruptly, so that the user receives the arrival indication immediately and stops moving, and then, in combination with the directional voice, accurately picks up the target object.
  • FIG6 is a schematic diagram of a process of playing directional voice provided by an embodiment of the present disclosure.
  • the terminal device is, for example, a smart phone, corresponding to an application scenario of indoor navigation in a supermarket.
  • the target object is, for example, a "fruit shelf".
  • the terminal device determines the first spatial position by real-time acquisition of the first image, calculates the path distance between the first spatial position and the second spatial position corresponding to the target position, and adjusts the vibration amplitude of the vibration unit based on the path distance. The shorter the path distance, the greater the vibration amplitude.
  • For example, when the user (terminal device) is at position A, the vibration amplitude of the vibration emitted by the vibration unit is p millimeters per second (mm/s); when the user moves closer, to position B, the vibration amplitude becomes 2p mm/s. During this process, the amplitude of the vibration emitted by the vibration unit changes continuously, but the vibration frequencies corresponding to positions A and B are consistent, both being f Hertz (Hz). When the user (terminal device) reaches position C corresponding to the target position (the path distance is less than the first preset distance), the vibration amplitude becomes 3p mm/s and the vibration frequency jumps to 2f Hz; this sudden change in vibration frequency prompts the user that the target position has been reached and that movement should stop.
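  • The A/B/C example above amounts to a distance-to-parameter mapping such as the sketch below; only the growing amplitude (p, 2p, 3p) and the f-to-2f frequency jump come from the text, while the base values and the distance scaling are illustrative assumptions.

```python
FIRST_PRESET_DISTANCE_M = 1.0   # arrival threshold (position C)
BASE_FREQ_HZ = 10.0             # f  (assumed value)
BASE_AMP_MM_S = 1.0             # p  (assumed value)

def vibration_params(path_distance_m: float) -> tuple[float, float]:
    """Return (frequency_hz, amplitude_mm_s) for the vibration unit:
    amplitude grows continuously as the target nears (p, 2p, ...),
    and the frequency jumps from f to 2f on arrival (3p amplitude)."""
    if path_distance_m <= FIRST_PRESET_DISTANCE_M:
        return 2 * BASE_FREQ_HZ, 3 * BASE_AMP_MM_S      # sudden change at C
    # Continuous amplitude growth, capped below the arrival level.
    amp = BASE_AMP_MM_S * min(2.5, 5.0 / path_distance_m)
    return BASE_FREQ_HZ, max(BASE_AMP_MM_S, amp)
```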
  • At the same time, the terminal device generates and plays the directional voice based on the orientation information calculated from the current frame of the first image, to indicate the position of the target object, so that the user can accurately pick up the target object under the guidance of the directional voice.
  • Steps S201-S202 are consistent with steps S101-S102 in the embodiment shown in FIG. 2; for their specific implementation, please refer to the discussion of steps S101-S102, which will not be repeated here.
  • FIG. 7 is a flow chart of the third embodiment of the assisted voice navigation method provided by the embodiment of the present disclosure. Based on the embodiment shown in FIG. 2 , this embodiment adds a step of updating the visual positioning model.
  • the assisted voice navigation method includes:
  • Step S301 receiving a first instruction input by a user, where the first instruction is used to represent a target object within a first range.
  • Step S302 Acquire a first image, and set an update frequency of a visual positioning model according to the first image, wherein the visual positioning model represents position distribution of objects within a first range in a three-dimensional simulation space.
  • the visual positioning model is a model that characterizes the position distribution of objects within the first range in the three-dimensional simulation space.
  • When objects within the first range change, the visual positioning model needs to be updated synchronously to ensure its accuracy, thereby ensuring the accuracy of the target path generated based on the visual positioning model and avoiding the problem that an outdated model produces a target path that leads visually impaired users into collisions. However, since the visual positioning model involves many objects and a large amount of data, especially when the first range is large, frequent updates of the visual positioning model will lead to unnecessary overhead and waste of resources.
  • the update frequency of the corresponding visual positioning model is determined by detecting the change of the first image.
  • When the first image changes greatly, it means that the objects in the current environment, that is, the objects within the first range, change frequently.
  • a higher update frequency is set for the visual positioning model to improve the accuracy of the visual positioning model; otherwise, a lower update frequency is set for the visual positioning model to reduce the consumption of various resources.
  • As shown in FIG. 8, the specific implementation steps of step S302 include:
  • Step S3021 Acquire a second image, where the second image is the image N frames before the first image, and N is an integer greater than 0.
  • Step S3022 determining image difference information according to the first image and the second image, where the image difference information represents a displacement of a reference object in the second image relative to a reference object in the first image.
  • Step S3023 Setting the update frequency of the visual positioning model according to the image difference information.
  • Specifically, the first images from the most recent N acquisitions are saved as historical environment pictures.
  • When image difference detection is performed, the image N frames before the current first image is extracted as the second image, where N is an integer greater than 0, such as 30. That is, the first image currently collected in real time is compared with the first image collected 30 frames earlier (the second image) to obtain the corresponding image difference information, which represents the displacement of the reference object in the second image relative to the reference object in the first image.
  • the reference object in the second image is the same object as the reference object in the first image, such as a pedestrian or a vehicle.
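  • As a simplified stand-in for measuring the displacement of a reference object, the following sketch scores the change between the current first image and the second image with a mean absolute pixel difference and maps it to an update interval; the normalization and interval bounds are assumptions.

```python
import numpy as np

def model_update_interval(first_image: np.ndarray, second_image: np.ndarray,
                          max_interval_frames: int = 300,
                          min_interval_frames: int = 30) -> int:
    """Large inter-frame change -> short update interval (frequent updates);
    small change -> long interval, saving resources. Mean absolute pixel
    difference stands in for the reference-object displacement."""
    diff = np.abs(first_image.astype(np.float32) - second_image.astype(np.float32))
    change = float(diff.mean()) / 255.0     # normalized to [0, 1]
    span = max_interval_frames - min_interval_frames
    return max_interval_frames - int(round(change * span))
```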
  • Step S303 Update the visual positioning model based on the update frequency.
  • the visual positioning model is updated based on the update frequency, for example, every 30 frames or every minute.
  • the visual positioning model corresponds to multiple spatial regions.
  • the data corresponding to all spatial regions in the visual positioning model can be updated based on the update frequency, or only the data corresponding to the spatial region corresponding to the current position (first spatial position) can be updated, thereby improving resource utilization.
  • As shown in FIG. 9, the specific implementation steps of step S303 include:
  • Step S3031 Acquire a region identifier corresponding to the target object, where the region identifier represents an image acquisition region within the first range;
  • Step S3032 Based on the region identifier corresponding to the target object, calling the corresponding image acquisition device to acquire an image to obtain a second image;
  • Step S3033 Update the visual positioning model according to the second image.
  • the terminal device is directly or indirectly connected to the image acquisition device, and the image acquisition device is, for example, a distributed intelligent camera based on the Internet of Things.
  • the image acquisition device communicates with the terminal device or communicates with the cloud server to receive the image acquisition instruction sent from the terminal device or the cloud server to perform image acquisition.
  • Each distributed intelligent camera corresponds to an image acquisition area, and the visual positioning model is updated by acquiring images of the corresponding image acquisition area.
  • the target object is an object with mobility, such as a service robot in a library or a supermarket. Therefore, the position of the target object will change randomly.
  • The terminal device determines the image acquisition area corresponding to the target object by querying the visual positioning model, and obtains the area identifier corresponding to the target object. After that, the image acquisition device corresponding to the area identifier is called and, based on the update frequency determined in the previous step, a second image is acquired; the visual positioning model is then updated based on the second image, so that the position information of the target object stored in the visual positioning model is more accurate and up to date.
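  • Steps S3031-S3033 might be glued together as in the sketch below; region_of, capture and the injected recognize_position are hypothetical names for the region lookup, the IoT camera call and the image-recognition step.

```python
def update_model_for_target(vpm, target_id, cameras, recognize_position):
    """Directionally refresh only the region containing a mobile target:
    look up its region identifier, trigger that region's distributed
    camera, re-recognize the target, and write back its new position."""
    region_id = vpm.region_of(target_id)          # step S3031
    second_image = cameras[region_id].capture()   # step S3032
    new_pos = recognize_position(second_image, target_id)  # step S3033A
    vpm.set_position(target_id, new_pos)          # step S3033B
```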
  • As shown in FIG. 10, the specific implementation steps of step S3033 include:
  • Step S3033A Perform image recognition on the second image to determine the current position of the target object.
  • Step S3033B Update the visual positioning model based on the current position of the target object.
  • In the above steps of this embodiment, the region identifier corresponding to the target object is obtained, and based on the region identifier, the corresponding distributed image acquisition devices are called to perform regional image acquisition, achieving directional updates for dynamic target objects, thereby ensuring that the generated target path is accurate and reasonable while avoiding the waste of resources caused by excessive updating of the visual positioning model.
  • Step S304 determining a target path according to the first image and the visual positioning model, wherein the target path is a moving path from a current position corresponding to the first image to a position where the target object is located.
  • Step S305 Play the navigation voice corresponding to the current location according to the target path.
  • Step S306 If the current position reaches the target position, the loop ends; if the current position does not reach the target position, the loop returns to step S302.
  • Steps S301, S304, and S305 are consistent with steps S101-S103 in the embodiment shown in FIG. 2; for their specific implementation, please refer to the discussion of steps S101-S103 in the embodiment shown in FIG. 2, which will not be repeated here.
  • The auxiliary voice navigation method provided in this embodiment can also be implemented on the basis of the embodiment shown in FIG. 5; that is, on the basis of this embodiment, the technical features of controlling the vibration unit based on the path distance in the embodiment shown in FIG. 5 (steps S203-S208) can be combined to achieve vibration unit control and directional voice playback, which will not be repeated here.
  • FIG11 is a structural block diagram of an auxiliary voice navigation device provided by an embodiment of the present disclosure.
  • the auxiliary voice navigation device 4 includes:
  • the interaction module 41 is configured to cyclically call the following modules in response to a first instruction indicating a target object within a first range:
  • a processing module 42 is used to obtain a first image, and determine a target path according to the first image and a visual positioning model, wherein the visual positioning model represents the position distribution of objects within the first range in the three-dimensional simulation space, and the target path is a moving path from a current position corresponding to the first image to a position where the target object is located;
  • the playing module 43 is used to play the navigation voice corresponding to the current position according to the target path, and the navigation voice represents the moving direction and the corresponding moving distance.
  • In a possible implementation, when determining the target path according to the first image and the visual positioning model, the processing module 42 is specifically used to: input the first image into the visual positioning model to determine the first spatial position, the first spatial position representing a mapping, in the three-dimensional simulation space, of the image shooting point when the first image is shot; search the visual positioning model to obtain the second spatial position corresponding to the target object, the second spatial position representing a mapping of the target position where the target object is located in the three-dimensional simulation space; and generate the target path based on the first spatial position and the second spatial position.
  • the processing module 42 is also used to: obtain the path distance between the first spatial position and the second spatial position based on the target path; when the path distance is less than the first preset distance, obtain the orientation information, the orientation information represents the spatial orientation of the second spatial position relative to the first spatial position; the playback module 43 is also used to: play the orientation voice corresponding to the orientation information.
  • the processing module 42 is further used to: determine corresponding vibration parameters according to the path distance, where the vibration parameters represent the vibration frequency and/or vibration amplitude; and control the vibration of the vibration unit based on the vibration parameters.
  • In a possible implementation, the processing module 42 is further used to: acquire a second image, where the second image is the image N frames before the first image, and N is an integer greater than 0; determine image difference information based on the first image and the second image, where the image difference information represents the displacement of a reference object in the second image relative to the reference object in the first image; set an update frequency of the visual positioning model based on the image difference information; and update the visual positioning model based on the update frequency.
  • the processing module 42 is also used to: obtain an area identifier corresponding to the target object, the area identifier represents an image acquisition area within the first range; based on the area identifier corresponding to the target object, call a corresponding image acquisition device to perform image acquisition to obtain a second image; and update the visual positioning model according to the second image.
  • In a possible implementation, when the processing module 42 updates the visual positioning model according to the second image, it is specifically used to: perform image recognition on the second image to determine the current position of the target object; and update the visual positioning model based on the current position of the target object.
  • The interaction module 41, the processing module 42 and the playing module 43 are connected in sequence.
  • the auxiliary voice navigation device 4 provided in this embodiment can implement the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, which will not be described in detail in this embodiment.
  • FIG12 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure. As shown in FIG12 , the electronic device 5 includes:
  • the memory 52 stores computer executable instructions
  • the processor 51 executes the computer-executable instructions stored in the memory 52 to implement the assisted voice navigation method in the embodiments shown in Figures 2 to 10.
  • processor 51 and the memory 52 are connected via a bus 53 .
  • Referring to FIG. 13, it shows a schematic diagram of the structure of an electronic device 900 suitable for implementing the embodiments of the present disclosure.
  • the electronic device 900 may be a terminal device or a server.
  • the terminal device may include but is not limited to mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (PMPs), vehicle terminals (such as vehicle navigation terminals), etc., and fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG. 13 is only an example and should not bring any limitation to the functions and scope of use of the embodiment of the present disclosure.
  • the electronic device 900 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 to a random access memory (RAM) 903.
  • Various programs and data required for the operation of the electronic device 900 are also stored in the RAM 903.
  • the processing device 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904.
  • An input/output (I/O) interface 905 is also connected to the bus 904.
  • the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 907 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 908 including, for example, a magnetic tape, a hard disk, etc.; and communication devices 909.
  • the communication device 909 may allow the electronic device 900 to communicate with other devices wirelessly or by wire to exchange data.
  • Although FIG. 13 shows an electronic device 900 having various devices, it should be understood that it is not required to implement or have all of the devices shown; more or fewer devices may alternatively be implemented or provided.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • In such an embodiment, the computer program can be downloaded and installed from the network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902. When the computer program is executed by the processing device 901, the above functions defined in the method of the embodiments of the present disclosure are performed.
  • the computer-readable medium disclosed above may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in combination with an instruction execution system, device or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which a computer-readable program code is carried.
  • This propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above.
  • the computer readable signal medium may also be any computer readable medium other than a computer readable storage medium, which may send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device.
  • the program code contained on the computer readable medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.
  • The computer-readable medium carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device executes the method shown in the above embodiments.
  • Computer program code for performing operations of the present disclosure may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • The program code may be executed entirely on the user's computer, partially on the user's computer, as a stand-alone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • each square box in the flow chart or block diagram can represent a module, a program segment or a part of a code, and the module, the program segment or a part of the code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the square box can also occur in a sequence different from that marked in the accompanying drawings. For example, two square boxes represented in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each square box in the block diagram and/or flow chart, and the combination of the square boxes in the block diagram and/or flow chart can be implemented with a dedicated hardware-based system that performs a specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or hardware.
  • The name of a unit does not limit the unit itself in some cases; for example, the first acquisition unit may also be described as "a unit for acquiring at least two Internet Protocol addresses".
  • exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • an assisted voice navigation method comprising:
  • a first image is acquired, and a target path is determined based on the first image and a visual positioning model, wherein the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a moving path from a current position corresponding to the first image to a position of the target object; and a navigation voice corresponding to the current position is played based on the target path, wherein the navigation voice represents a moving direction and a corresponding moving distance.
  • determining the target path based on the first image and the visual positioning model includes: inputting the first image into the visual positioning model to determine a first spatial position, the first spatial position representing a mapping, in the three-dimensional simulation space, of the image capturing point when the first image was captured; acquiring a second spatial position corresponding to the target object according to the visual positioning model, the second spatial position representing a mapping of the target position where the target object is located in the three-dimensional simulation space; and generating the target path based on the first spatial position and the second spatial position.
  • the method further includes: based on the target path, obtaining the path distance between the first spatial position and the second spatial position; when the path distance is less than a first preset distance, obtaining orientation information, the orientation information representing the spatial orientation of the second spatial position relative to the first spatial position; and playing the orientation voice corresponding to the orientation information.
  • the method further includes: determining corresponding vibration parameters according to the path distance, wherein the vibration parameters characterize the vibration frequency and/or vibration amplitude; and controlling the vibration of the vibration unit based on the vibration parameters.
  • the method further includes: acquiring a second image, where the second image is the image N frames before the first image, and N is an integer greater than 0; determining image difference information based on the first image and the second image, where the image difference information represents the displacement of a reference object in the second image relative to the reference object in the first image; setting an update frequency of the visual positioning model based on the image difference information; and updating the visual positioning model based on the update frequency.
  • the method further includes: obtaining an area identifier corresponding to the target object, the area identifier representing an image acquisition area within the first range; based on the area identifier corresponding to the target object, calling a corresponding image acquisition device to perform image acquisition to obtain a second image; and updating the visual positioning model based on the second image.
  • updating the visual positioning model according to the second image includes: performing image recognition on the second image to determine the current position of the target object; and updating the visual positioning model based on the current position of the target object.
  • an auxiliary voice navigation device comprising:
  • the interaction module is configured to cyclically call the following modules in response to a first instruction indicating a target object within a first range:
  • a processing module configured to acquire a first image, and determine a target path according to the first image and a visual positioning model, wherein the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a moving path from a current position corresponding to the first image to a position of the target object;
  • the playing module is used to play the navigation voice corresponding to the current position according to the target path, and the navigation voice represents the moving direction and the corresponding moving distance.
  • when the processing module determines the target path based on the first image and the visual positioning model, it is specifically used to: input the first image into the visual positioning model to determine a first spatial position, the first spatial position representing a mapping, in the three-dimensional simulation space, of the image shooting point when the first image is shot; obtain, according to the visual positioning model, a second spatial position corresponding to the target object, the second spatial position representing a mapping of the target position where the target object is located in the three-dimensional simulation space; and generate the target path based on the first spatial position and the second spatial position.
  • the processing module is further used to: based on the target path, obtain the path distance between the first spatial position and the second spatial position; and, when the path distance is less than a first preset distance, obtain orientation information, the orientation information representing the spatial orientation of the second spatial position relative to the first spatial position; the playing module is further used to: play the directional voice corresponding to the orientation information.
  • the processing module is further used to: determine corresponding vibration parameters according to the path distance, wherein the vibration parameters represent the vibration frequency and/or vibration amplitude; and control the vibration of the vibration unit based on the vibration parameters.
  • the processing module is further configured to: acquire a second image, where the second image is an image captured N frames before the first image, N being an integer greater than 0; determine image difference information based on the first image and the second image, where the image difference information represents the displacement of a reference object in the second image relative to the same reference object in the first image; set an update frequency of the visual positioning model based on the image difference information; and update the visual positioning model based on the update frequency.
  • the processing module is further used to: obtain an area identifier corresponding to the target object, the area identifier representing an image acquisition area within the first range; based on the area identifier corresponding to the target object, call a corresponding image acquisition device to perform image acquisition to obtain a second image; and update the visual positioning model based on the second image.
  • when the processing module updates the visual positioning model based on the second image, it is specifically configured to: perform image recognition on the second image to determine the current position of the target object; and update the visual positioning model based on the current position of the target object.
  • an electronic device comprising: a processor, and a memory communicatively connected to the processor;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory to implement the assisted voice navigation method as described in the first aspect and various possible designs of the first aspect.
  • a computer-readable storage medium stores computer-executable instructions; when a processor executes the computer-executable instructions, the assisted voice navigation method as described in the first aspect and the various possible designs of the first aspect is implemented.
  • an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the assisted voice navigation method as described in the first aspect and various possible designs of the first aspect.

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Navigation (AREA)
  • Multimedia (AREA)

Abstract

Embodiments of the present disclosure provide an assisted voice navigation method and apparatus, an electronic device, and a storage medium. In response to a first instruction indicating a target object within a first range, the following steps are executed in a loop: a first image is acquired, and a target path is determined according to the first image and a visual positioning model, where the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a movement path from the current position corresponding to the first image to the position of the target object; according to the target path, a navigation voice corresponding to the current position is played, the navigation voice representing a movement direction and a corresponding movement distance. This enables the user to reach, guided by the played voice prompts, a target object located outside the image-acquisition field of view, extends the perception and navigation range of the terminal device, and achieves beyond-line-of-sight, long-distance target navigation.

Description

Assisted voice navigation method and apparatus, electronic device, and storage medium
This application claims priority to Chinese invention patent application No. 202211415769.0, filed on November 11, 2022 and titled "Assisted voice navigation method and apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the technical field of intelligent terminals, and in particular to an assisted voice navigation method and apparatus, an electronic device, and a storage medium.
Background
At present, China has a large visually impaired population. Varying degrees of visual impairment make independent travel extremely inconvenient for this group. To address the travel problems of visually impaired people, in the related art, a handheld intelligent terminal device captures images of the surrounding environment to achieve environment perception, and converts the perception results into voice broadcasts, so that visually impaired users can determine their surroundings based on the broadcast content.
However, the solutions in the prior art suffer from a limited perception range and cannot achieve long-distance target navigation.
Summary
Embodiments of the present disclosure provide an assisted voice navigation method and apparatus, an electronic device, and a storage medium, to overcome the problem that the perception range of intelligent terminal devices is limited and long-distance target navigation cannot be achieved.
In a first aspect, an embodiment of the present disclosure provides an assisted voice navigation method, including:
in response to a first instruction indicating a target object within a first range, cyclically executing the following steps: acquiring a first image, and determining a target path according to the first image and a visual positioning model, where the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a movement path from the current position corresponding to the first image to the position of the target object; and playing, according to the target path, a navigation voice corresponding to the current position, the navigation voice representing a movement direction and a corresponding movement distance.
In a second aspect, an embodiment of the present disclosure provides an assisted voice navigation apparatus, including:
an interaction module, configured to cyclically invoke the following modules in response to a first instruction indicating a target object within a first range:
a processing module, configured to acquire a first image and determine a target path according to the first image and a visual positioning model, where the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a movement path from the current position corresponding to the first image to the position of the target object;
a playing module, configured to play, according to the target path, a navigation voice corresponding to the current position, the navigation voice representing a movement direction and a corresponding movement distance.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a processor, and a memory communicatively connected to the processor;
where the memory stores computer-executable instructions;
and the processor executes the computer-executable instructions stored in the memory to implement the assisted voice navigation method as described in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the assisted voice navigation method as described in the first aspect and the various possible designs of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including a computer program which, when executed by a processor, implements the assisted voice navigation method as described in the first aspect and the various possible designs of the first aspect.
According to the assisted voice navigation method and apparatus, electronic device, and storage medium provided by the embodiments, in response to a first instruction indicating a target object within a first range, the following steps are executed in a loop: a first image is acquired, and a target path is determined according to the first image and a visual positioning model, where the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a movement path from the current position corresponding to the first image to the position of the target object; and a navigation voice corresponding to the current position is played according to the target path, the navigation voice representing a movement direction and a corresponding movement distance. By capturing the first image and exploiting the ability of the visual positioning model to represent the position distribution of objects within the first range in the three-dimensional simulation space, a movement path from the current position to the position of the target object is determined and converted into voice for playback, so that the user can reach, guided by the played voice prompts, a target object located outside the image-acquisition field of view. This extends the perception and navigation range of the terminal device and achieves beyond-line-of-sight, long-distance target navigation.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are some embodiments of the present disclosure, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of the assisted voice navigation method provided by an embodiment of the present disclosure;
FIG. 2 is a first schematic flowchart of the assisted voice navigation method provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of the specific implementation steps of step S102 in the embodiment shown in FIG. 2;
FIG. 4 is a schematic diagram of a process of generating a target path provided by an embodiment of the present disclosure;
FIG. 5 is a second schematic flowchart of the assisted voice navigation method provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a process of playing an orientation voice provided by an embodiment of the present disclosure;
FIG. 7 is a third schematic flowchart of the assisted voice navigation method provided by an embodiment of the present disclosure;
FIG. 8 is a flowchart of the specific implementation steps of step S302 in the embodiment shown in FIG. 7;
FIG. 9 is a flowchart of the specific implementation steps of step S303 in the embodiment shown in FIG. 7;
FIG. 10 is a flowchart of the specific implementation steps of step S3033 in the embodiment shown in FIG. 9;
FIG. 11 is a structural block diagram of the assisted voice navigation apparatus provided by an embodiment of the present disclosure;
FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
The application scenarios of the embodiments of the present disclosure are explained below:
FIG. 1 is a diagram of an application scenario of the assisted voice navigation method provided by an embodiment of the present disclosure. The method can be applied to voice travel-navigation scenarios for visually impaired users, and more specifically, for example, to indoor target-object navigation for visually impaired users. As shown in FIG. 1, the method provided by the embodiments of the present disclosure can be applied to a terminal device, such as a smartphone or a wearable device. Exemplarily, the terminal device is communicatively connected to a cloud service and exchanges data with the cloud server. In, for example, an indoor target-object navigation scenario for visually impaired users, after receiving an instruction from the visually impaired user to find a target object, the terminal device captures environment images and converts them into corresponding navigation voice for broadcast; as shown in the figure, the content of the navigation voice is "Go straight ahead 10 meters". The visually impaired user can walk according to the voice broadcast and finally reach the position of the target object, thereby achieving target-object navigation based on assisted voice. More specifically, this indoor target-object navigation scenario is, for example, finding a specific book in a library or a specific item in a supermarket.
In the related art, to address the travel problems of the visually impaired, a handheld intelligent terminal device captures images of the surrounding environment to achieve environment perception, and converts the perception results into voice broadcasts, so that visually impaired users can determine the state of their surroundings based on the broadcast content. However, this scheme recognizes environment images captured in real time and converts them into voice for broadcast; it cannot perceive objects outside the environment image. Therefore, the generated voice can only serve as a general prompt: it can neither perceive and broadcast objects outside the environment image nor navigate toward them.
Embodiments of the present disclosure provide an assisted voice navigation method to solve the above problem.
Referring to FIG. 2, FIG. 2 is a first schematic flowchart of the assisted voice navigation method provided by an embodiment of the present disclosure. The method of this embodiment can be applied in a terminal device, and the assisted voice navigation method includes:
Step S101: receiving a first instruction input by a user, the first instruction indicating a target object within a first range.
Exemplarily, referring to the application scenario shown in FIG. 1, the execution subject of this embodiment is a terminal device, such as a smart wearable device. In one possible implementation, the first instruction is a voice instruction issued by the user: the terminal device performs voice-signal detection at a preset frequency, and when speech with specific content is detected and recognized, the corresponding first instruction is obtained from the speech content. More specifically, the terminal device may, for example, first detect a wake-up phrase at a low sampling rate, the content of the wake-up phrase being, for example, "Hello, Xiao A"; after the wake-up phrase is detected, the terminal device then detects the user's instruction speech at a high sampling rate, for example "Help me find the fruit shelf", and obtains the corresponding first instruction, i.e., information indicating the "fruit shelf", by recognizing the instruction speech. In another possible implementation, the first instruction is generated based on the user's gesture or button operations on the terminal device. For example, the terminal device is provided with a Button_1 key, which may be a software or physical button; after the user triggers Button_1, the terminal device generates a corresponding first instruction that corresponds to a preset target object, for example a "room door", i.e., the first instruction is information indicating the "room door".
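As a rough illustration of the two-stage voice trigger described above, the following Python sketch wraps the low-sampling-rate wake-phrase loop and the high-sampling-rate command capture; `listen` and `recognize` are hypothetical stand-ins for a real audio pipeline, not calls from any actual library:

```python
WAKE_PHRASE = "Hello, Xiao A"   # example wake-up phrase from the scenario above

def wait_for_first_instruction(listen, recognize):
    # Stage 1: cheap wake-phrase detection at a low sampling rate.
    while recognize(listen(sample_rate=8_000)) != WAKE_PHRASE:
        pass
    # Stage 2: capture the instruction speech at a higher sampling rate.
    command = recognize(listen(sample_rate=16_000))
    # e.g. "Help me find the fruit shelf" -> target object "fruit shelf"
    return command.removeprefix("Help me find ").strip()
```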
Step S102: acquiring a first image, and determining a target path according to the first image and a visual positioning model, where the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a movement path from the current position corresponding to the first image to the position of the target object.
Further, after or while obtaining the first instruction, the terminal device acquires an image of the current environment, i.e., the first image, through an image-acquisition unit provided on the terminal device. Exemplarily, the first image may be a single frame shot by the image-acquisition unit, or a stitched or superimposed image of multiple frames shot by the image-acquisition unit. A stitched image is formed by stitching multiple frames together, based on their image fields of view, into one picture with a larger field of view; a superimposed image is obtained by superimposing multiple frames with the same or similar fields of view to obtain a picture with higher contrast and clarity. The specific implementations of stitching and superimposing multiple frames are not described here.
Exemplarily, after the first image is obtained, it is processed by the visual positioning model to obtain the movement path from the current position corresponding to the first image to the position of the target object, i.e., the target path. Specifically, the visual positioning model is a model that represents the position distribution of objects within the first range in a three-dimensional simulation space. Exemplarily, the three-dimensional simulation space is a simulation of the real environment of the first range, and the visual positioning model is a model describing that space; in short, the visual positioning model can be regarded as three-dimensional map data for the first range. More specifically, for example, the first range corresponds to the indoor range Zoom_1 of a supermarket; the three-dimensional simulation space is then a virtual space representing the environment and objects within Zoom_1, including, for example, the shelves, goods, and aisles in the supermarket. The visual positioning model is a description of this space, including, for example, the identifiers, volumes, and positions of the shelves, goods, and aisles. The visual positioning model can be implemented in many ways: through a three-dimensional voxel matrix with corresponding object labels, or through a configuration table describing the label, position, volume, and other information of each object. The specific implementation can be set as needed and is not enumerated here.
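As a minimal sketch of the configuration-table implementation mentioned above, the visual positioning model can be represented as a mapping from object identifiers to labels, positions, and volumes in the simulation space; the identifiers, coordinates, and region field below are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class ModelEntry:
    label: str                             # human-readable object label
    position: tuple[float, float, float]   # coordinates in the 3D simulation space
    size: tuple[float, float, float]       # bounding volume of the object
    area_id: str = ""                      # image-acquisition region (used later)

# Hypothetical model for the supermarket range Zoom_1.
visual_positioning_model = {
    "#0021": ModelEntry("fruit shelf", (12.0, 3.5, 0.0), (2.0, 0.8, 1.8), "area_03"),
    "#0034": ModelEntry("checkout",    (2.0, 1.0, 0.0),  (1.2, 1.2, 1.1), "area_01"),
}
```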
Further, the visual positioning model may be a model deployed locally on the terminal device, or a model deployed on a cloud server communicating with the terminal device. In one possible implementation, the visual positioning model may be a Visual Positioning Service (VPS) deployed in the cloud server communicating with the terminal device.
After the visual positioning model is obtained, the model is searched with the first image and with the target object, respectively, to obtain the position corresponding to the first image and the position corresponding to the target object; the target path is then generated in combination with a preset navigation algorithm.
In a possible implementation, as shown in FIG. 3, the specific implementation steps of step S102 include:
Step S1021: inputting the first image into the visual positioning model to determine a first spatial position, the first spatial position representing the mapping, in the three-dimensional simulation space, of the image shooting point at which the first image was shot.
Step S1022: searching the visual positioning model to obtain a second spatial position corresponding to the target object, the second spatial position representing the mapping, in the three-dimensional simulation space, of the target position where the target object is located.
Step S1023: generating the target path based on the first spatial position and the second spatial position.
Exemplarily, the first image is input into the visual positioning model for comparison and search, to determine the position of the virtual environment region in the three-dimensional simulation space that is identical or similar to the region depicted by the first image, i.e., the first spatial position, which is also the current position (of the terminal device). In short, the first spatial position is the mapping, in the three-dimensional simulation space, of the real environment region depicted by the first image; it is expressed in terms of the visual positioning model, i.e., in the coordinate system of the three-dimensional simulation space represented by the model. After the target object is recognized from the first instruction, the object identifier corresponding to the target object is obtained; for example, the target object recognized from the first instruction is the "fruit shelf", whose identifier is "#0021". The visual positioning model is then searched with this identifier to obtain the position coordinates of the "fruit shelf", i.e., the second spatial position. Similarly, the second spatial position is also expressed in the coordinate system of the three-dimensional simulation space represented by the model.
Then, based on the roads in the three-dimensional simulation space represented by the visual positioning model and a preset navigation planning algorithm, a navigation path from the first spatial position to the second spatial position, i.e., the target path, is obtained. Path-planning algorithms based on map data (the visual positioning model), a starting point (the first spatial position), and a target point (the second spatial position) are prior art known to those skilled in the art and are not described here.
FIG. 4 is a schematic diagram of a process of generating a target path provided by an embodiment of the present disclosure. As shown in FIG. 4, the first image Pic_1 and the object identifier Ob_01 of the target object are separately input into the visual positioning model. On one hand, the model recognizes the image content of Pic_1, determines the region in the three-dimensional simulation space to which that content maps, and then determines, based on that region, the positioning point P1 in the three-dimensional simulation space of the image shooting point corresponding to Pic_1. On the other hand, the model searches by the object identifier Ob_01 to obtain the corresponding positioning point P2. The positioning points P1 and P2 are then input into the navigation planning algorithm to generate the target path, where the navigation planning algorithm may be a capability provided by the visual positioning model.
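Putting the two lookups of FIG. 4 together, target-path generation might be sketched as follows; `locate_image` and `plan_route` are hypothetical stand-ins for the localization and navigation-planning capabilities of the visual positioning model, not real APIs:

```python
def generate_target_path(first_image, object_id, model, locate_image, plan_route):
    # P1: map the image shooting point of the first image into the 3D space.
    p1 = locate_image(model, first_image)
    # P2: look up the target object's position by its identifier (e.g. "#0021").
    p2 = model[object_id].position
    # Plan a walkable route from P1 to P2 over the roads in the model.
    return plan_route(model, start=p1, goal=p2)
```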
Step S103: playing, according to the target path, a navigation voice corresponding to the current position, the navigation voice representing a movement direction and a corresponding movement distance.
After the target path is obtained, the movement direction and movement distance for moving along the target path are determined according to the current position of the terminal device, i.e., the first spatial position obtained in the previous steps; for example, the movement direction is "north" and the movement distance is "10 meters". Based on a preset voice-generation template, the movement direction and distance information is then converted into the corresponding navigation voice, for example, "Move 10 meters north". In one possible implementation, to help the visually impaired user determine the movement direction, the terminal device may convert absolute directions into relative directions such as "left" and "right". A specific conversion method includes, for example: performing recognition with the first image and the visual positioning model to determine the direction the user is currently facing, thereby converting absolute directions into relative ones. The navigation voice is then played to guide the user to move from the current position along the target path and finally reach the target position of the target object, achieving target-object navigation.
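The absolute-to-relative direction conversion can be sketched as below, assuming headings are measured in degrees clockwise from north and that the user's facing direction has already been recognized from the first image; the 30-degree "straight ahead" tolerance is an invented parameter:

```python
def relative_instruction(move_heading: float, facing_heading: float,
                         distance_m: float) -> str:
    # Signed difference between movement direction and facing, in (-180, 180].
    delta = (move_heading - facing_heading + 180.0) % 360.0 - 180.0
    if abs(delta) < 30.0:
        direction = "straight ahead"
    elif delta > 0:
        direction = "to your right"
    else:
        direction = "to your left"
    return f"Move {distance_m:.0f} meters {direction}"

# relative_instruction(0.0, 90.0, 10) -> "Move 10 meters to your left"
# (moving north while facing east means the next leg is on the user's left)
```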
Step S104: if the current position reaches the target position, ending the loop; if the current position has not reached the target position, returning to step S102.
Exemplarily, after the navigation voice is played, the current position obtained in the previous steps may be used, or an additional position measurement may be performed to obtain the latest current position, and the visual positioning model is used to check whether the current position coincides with the target position. If they coincide, the user (terminal device) has reached the destination and the navigation process ends; if not, the process returns to step S102, the real-time first image is re-acquired, and the above steps are repeated to continue the voice navigation until the target position is reached.
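The loop of steps S102-S104 can then be summarized as follows; `capture_image`, `locate_image`, `plan_route`, `voice_for`, and `speak` are the hypothetical building blocks of the sketches above, and the arrival tolerance is an assumption rather than a value from the disclosure:

```python
import math

ARRIVAL_EPS = 0.5  # assumed arrival tolerance in meters

def navigate(object_id, model, capture_image, locate_image,
             plan_route, voice_for, speak):
    while True:
        image = capture_image()                    # step S102: first image
        p1 = locate_image(model, image)            # current position
        p2 = model[object_id].position             # target position
        if math.dist(p1, p2) <= ARRIVAL_EPS:       # step S104: arrival check
            break
        path = plan_route(model, start=p1, goal=p2)
        speak(voice_for(path))                     # step S103: navigation voice
```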
In this embodiment, in response to a first instruction indicating a target object within a first range, the following steps are executed in a loop: a first image is acquired, and a target path is determined according to the first image and a visual positioning model, where the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a movement path from the current position corresponding to the first image to the position of the target object; a navigation voice corresponding to the current position is played according to the target path, the navigation voice representing a movement direction and a corresponding movement distance. By capturing the first image and exploiting the ability of the visual positioning model to represent the position distribution of objects within the first range in the three-dimensional simulation space, a movement path from the current position to the position of the target object is determined and converted into voice for playback, so that the user can reach, guided by the played voice prompts, a target object located outside the image-acquisition field of view. This extends the perception and navigation range of the terminal device and achieves beyond-line-of-sight, long-distance target navigation.
Referring to FIG. 5, FIG. 5 is a second schematic flowchart of the assisted voice navigation method provided by an embodiment of the present disclosure. On the basis of the embodiment shown in FIG. 2, this embodiment adds a step of indicating the orientation of the second spatial position. The assisted voice navigation method includes:
Step S201: receiving a first instruction input by a user, the first instruction indicating a target object within a first range.
Step S202: acquiring a first image, and determining a target path according to the first image and a visual positioning model, where the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a movement path from the current position corresponding to the first image to the position of the target object.
Step S203: obtaining, according to the target path, the path distance between the first spatial position and the second spatial position.
Step S204: when the path distance is greater than a first preset distance, playing the navigation voice corresponding to the current position, the navigation voice representing a movement direction and a corresponding movement distance, and returning to step S202.
Step S205: when the path distance is less than the first preset distance, obtaining orientation information, the orientation information representing the spatial orientation of the second spatial position relative to the first spatial position.
Step S206: playing the orientation voice corresponding to the orientation information.
Exemplarily, for visually impaired users in indoor target-object navigation scenarios, even if the navigation voice guides the user to the target position, the user may still be unable to locate the specific position of the target object. To address this problem, this embodiment adds a step of playing an orientation voice when the path distance is determined to be less than the first preset distance, thereby achieving a precise voice indication of the target object.
Specifically and exemplarily, after the target path is determined, the path distance between the first spatial position and the second spatial position is calculated, where the first spatial position represents the current position of the terminal device and the second spatial position represents the target position of the target object; the path distance between them is thus the distance the user (terminal device) currently has to travel to reach the target position. The sizes of the virtual objects in the three-dimensional simulation space represented by the visual positioning model, and the distances between them, are set based on the sizes of, and distances between, the objects within the first range in the real environment, for example at a 1:1 scale. Therefore, based on the target path and the first and second spatial positions in the visual positioning model, a value representing the path distance between the current position and the target position can be obtained. A judgment is then made based on this path distance: when the path distance is less than or equal to the first preset distance, for example 1 meter, the user is already very close to the target object and can be considered to have reached the target position. At this point, recognition can be performed on the first image or other reference information to obtain orientation information representing the spatial orientation of the second spatial position (target position) relative to the first spatial position (current position). The orientation information may be an angle value with a direction indication, for example 30 degrees ahead or 20 degrees to the left. The orientation information is then converted into an orientation voice for broadcast, so that the user can further determine the orientation relationship between the target position and the current position, achieving accurate localization of the target object.
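A hedged sketch of deriving the orientation information follows, assuming a coordinate system with x pointing east and y pointing north, headings in degrees clockwise from north, and an invented 15-degree "directly ahead" tolerance:

```python
import math

def orientation_voice(p1, p2, facing_heading: float) -> str:
    # Bearing from the current position p1 to the target position p2.
    bearing = math.degrees(math.atan2(p2[0] - p1[0], p2[1] - p1[1])) % 360.0
    delta = (bearing - facing_heading + 180.0) % 360.0 - 180.0
    if abs(delta) < 15.0:
        return "The target is directly ahead"
    side = "right" if delta > 0 else "left"
    return f"The target is {abs(delta):.0f} degrees to your {side}"
```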
On the other hand, if the path distance is greater than the first preset distance, the user is still far from the target object and there is no need to determine its orientation; therefore, the navigation voice corresponding to the current position is played. The specific implementation has been introduced in the embodiment shown in FIG. 2 and is not repeated here.
Further, after step S203, the method further includes:
Step S207: determining, according to the path distance, corresponding vibration parameters, the vibration parameters representing a vibration frequency and/or a vibration amplitude.
Step S208: controlling a vibration unit to vibrate based on the vibration parameters.
Exemplarily, the terminal device is provided with a vibration unit for generating vibration, where the vibration frequency and/or vibration amplitude of the vibration emitted by the unit are related to the path distance. In one possible implementation, after the real-time path distance is determined, corresponding vibration parameters are set based on it: the smaller the path distance, the higher the vibration frequency and/or the larger the vibration amplitude; alternatively, when the path distance is less than the first preset distance, the vibration unit is activated, or its vibration frequency and/or vibration amplitude are increased.
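One way to realize the distance-to-vibration mapping is sketched below; the base frequency, base amplitude, and scaling are invented values chosen to reproduce the behavior described here (amplitude grows continuously as the distance shrinks, frequency jumps once the first preset distance is crossed):

```python
def vibration_params(path_distance_m: float,
                     first_preset_distance_m: float = 1.0):
    base_f, base_a = 5.0, 1.0   # assumed base frequency (Hz) / amplitude (mm/s)
    # Amplitude increases continuously as the user approaches the target.
    amplitude = base_a * min(3.0, 3.0 / max(path_distance_m, 1.0))
    # Frequency doubles abruptly once the target position is reached.
    frequency = base_f * (2.0 if path_distance_m < first_preset_distance_m else 1.0)
    return frequency, amplitude
```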
When the orientation of the target object is announced through the orientation voice, the poor real-time responsiveness of voice broadcasting means that, while the orientation voice is being played (i.e., upon reaching the target position), the visually impaired user may still be moving, resulting in overshooting the target. The user's actual position then no longer matches the current position corresponding to the orientation information indicated by the orientation voice, so the user cannot reach the target item when following the indicated orientation. In this embodiment, by contrast, the good real-time responsiveness and continuous variability of vibration prompts are exploited: the visually impaired user can anticipate, from the continuously varying vibration characteristics (vibration frequency and/or amplitude) of the vibration unit, whether the target position is about to be reached; and when the target position is reached (the path distance is less than the first preset distance), the vibration characteristics of the vibration unit are changed, so that the user immediately receives the corresponding indication and stops moving. Combined with the orientation voice, this enables the user to accurately pick up the target object.
A specific embodiment is described below.
FIG. 6 is a schematic diagram of a process of playing an orientation voice provided by an embodiment of the present disclosure. As shown in FIG. 6, exemplarily, the terminal device is, for example, a smartphone, corresponding to an indoor supermarket navigation scenario; the target object is, for example, the "fruit shelf". While the user moves toward the target position of the target object following the navigation voice, the terminal device determines the first spatial position by capturing the first image in real time, calculates the path distance between the first spatial position and the second spatial position corresponding to the target position, and adjusts the vibration amplitude of the vibration unit based on the path distance: the shorter the path distance, the larger the vibration amplitude. For example, as shown in the figure, when the user (terminal device) is at position A on the target path, the vibration amplitude of the vibration unit is p millimeters per second (mm/s); when the user (terminal device) is at position B, closer to the target position, the vibration amplitude is 2p mm/s. During this process the amplitude of the vibration varies continuously, but the vibration frequency at positions A and B is the same, namely f hertz (Hz). When the user (terminal device) reaches position C corresponding to the target position (the path distance is less than the first preset distance), the vibration amplitude is 3p mm/s and the vibration frequency changes abruptly to 2f Hz, prompting the user that the target position has been reached and to stop moving. The terminal device then generates and plays an orientation voice based on the orientation information calculated from the same frame of the first image, indicating the orientation of the target object, so that the user can accurately pick up the target object guided by the orientation voice.
In this embodiment, steps S201-S202 are consistent with steps S101-S102 in the embodiment shown in FIG. 2. For a detailed discussion, refer to the discussion of steps S101-S102, which is not repeated here.
Referring to FIG. 7, FIG. 7 is a third schematic flowchart of the assisted voice navigation method provided by an embodiment of the present disclosure. On the basis of the embodiment shown in FIG. 2, this embodiment adds steps for updating the visual positioning model. The assisted voice navigation method includes:
Step S301: receiving a first instruction input by a user, the first instruction indicating a target object within a first range.
Step S302: acquiring a first image, and setting an update frequency of the visual positioning model according to the first image, the visual positioning model representing the position distribution of objects within the first range in a three-dimensional simulation space.
Exemplarily, the visual positioning model is a model representing the position distribution of objects within the first range in the three-dimensional simulation space. In some specific application scenarios, when the objects within the first range change, the visual positioning model needs to be updated synchronously to guarantee its accuracy and, in turn, the accuracy of the target path generated from it, avoiding collisions of the visually impaired user caused by a path generated from a stale model. However, since the visual positioning model involves many target objects and a large amount of data, especially when the first range is large, frequently updating it incurs unnecessary overhead and wastes resources. In one possible implementation, the update frequency of the visual positioning model is determined by detecting changes in the first image: when the first image changes significantly, the objects in the current environment, i.e., within the first range, change frequently, and a higher update frequency is set for the model to improve its accuracy; conversely, a lower update frequency is set to reduce the consumption of various resources.
In a possible implementation, as shown in FIG. 8, the specific implementation steps of step S302 include:
Step S3021: acquiring a second image, the second image being an image N frames before the first image, N being an integer greater than 0.
Step S3022: determining image difference information according to the first image and the second image, the image difference information representing the displacement of a reference object in the second image relative to the same reference object in the first image.
Step S3023: setting the update frequency of the visual positioning model according to the image difference information.
Exemplarily, during the cyclic acquisition of the first image, the first images of the most recent N acquisitions are saved as historical environment pictures. Then, each time a first image is acquired, the image N frames before it is extracted as the second image, N being an integer greater than 0, for example 30; that is, the first image currently acquired in real time is compared with the first image acquired 30 frames earlier (the second image) to obtain the image difference information representing the displacement of the reference object in the second image relative to the same reference object in the first image. The reference object in the second image and that in the first image are the same object, for example a pedestrian or a vehicle. When the displacement of the reference object between the two images is large, the objects in the current environment change frequently, and a correspondingly higher update frequency is set; conversely, when the displacement is small, the objects in the current environment do not change frequently, and a correspondingly lower update frequency is set, thereby improving the utilization of computing and network resources.
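Dense optical flow is one plausible way to quantify the displacement of a reference object between the second image and the first image; the sketch below uses OpenCV's Farneback flow on BGR frames, with the pixel threshold and the two frame intervals as invented assumptions:

```python
import cv2
import numpy as np

def update_interval_frames(first_image, second_image,
                           slow: int = 300, fast: int = 30) -> int:
    """Pick how often (in frames) to refresh the visual positioning model."""
    prev = cv2.cvtColor(second_image, cv2.COLOR_BGR2GRAY)
    curr = cv2.cvtColor(first_image, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mean_shift = float(np.linalg.norm(flow, axis=2).mean())  # pixels over N frames
    # Large displacement -> the scene changes often -> update more frequently.
    return fast if mean_shift > 2.0 else slow
```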
Step S303: updating the visual positioning model based on the update frequency.
Exemplarily, after the update frequency is obtained, the visual positioning model is updated based on it, for example every 30 frames or every minute. In one possible implementation, the visual positioning model corresponds to multiple spatial regions; after the update frequency is obtained, the data corresponding to all spatial regions in the model may be updated based on it, or only the data corresponding to the spatial region of the current position (the first spatial position), thereby improving resource utilization.
In another possible implementation, as shown in FIG. 9, the specific implementation steps of step S303 include:
Step S3031: obtaining a region identifier corresponding to the target object, the region identifier representing an image-acquisition region within the first range;
Step S3032: invoking, based on the region identifier corresponding to the target object, a corresponding image-acquisition device to perform image acquisition to obtain a second image;
Step S3033: updating the visual positioning model according to the second image.
Exemplarily, in another implementation, the terminal device is directly or indirectly communicatively connected to image-acquisition devices, for example distributed smart cameras based on the Internet of Things. These devices receive image-acquisition instructions from the terminal device or the cloud server, by communicating with either, and perform image acquisition accordingly. Each distributed smart camera corresponds to one image-acquisition region, and the visual positioning model is updated by acquiring images of that region. In one possible application scenario, the target object is an object capable of moving, for example a service robot in a library or supermarket, so its position changes randomly. For this scenario, in this embodiment, after determining the target object, the terminal device queries the visual positioning model to determine the image-acquisition region corresponding to the target object and obtains the region identifier corresponding to the target object; it then invokes the image-acquisition device corresponding to that region identifier to acquire the second image based on the update frequency determined in the previous steps, and updates the visual positioning model based on the second image, so that the position information of the target object stored in the model is more accurate and up to date.
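The region-identifier dispatch of steps S3031-S3032 might be sketched as follows; the camera registry, its RTSP address, and `capture_from` are invented placeholders rather than a real device API:

```python
# Hypothetical registry of distributed cameras keyed by region identifier.
CAMERAS = {"area_03": "rtsp://192.0.2.10/stream"}   # placeholder address

def acquire_second_image(model, object_id, capture_from):
    area_id = model[object_id].area_id      # step S3031: region identifier
    return capture_from(CAMERAS[area_id])   # step S3032: targeted acquisition
```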
Exemplarily, as shown in FIG. 10, the specific implementation steps of step S3033 include:
Step S3033A: performing image recognition on the second image to determine the current position of the target object.
Step S3033B: updating the visual positioning model based on the current position of the target object.
In this embodiment, by obtaining the region identifier corresponding to the target object and invoking the corresponding distributed image-acquisition device to acquire images of that region based on the identifier, a directed update for the dynamic target object is achieved, ensuring that the generated target path is accurate and reasonable while avoiding the resource waste caused by over-updating the visual positioning model.
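Finally, steps S3033A-S3033B reduce to a small write-back once a recognizer is available; `detect_position` is a hypothetical function that returns the target's coordinates in the simulation space, or None when recognition fails:

```python
def update_model_from_image(model, object_id, second_image, detect_position):
    # Step S3033A: image recognition to find the target's current position.
    position = detect_position(second_image, object_id)
    # Step S3033B: write the new position back into the visual positioning model.
    if position is not None:
        model[object_id].position = position
```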
Step S304: determining a target path according to the first image and the visual positioning model, where the target path is a movement path from the current position corresponding to the first image to the position of the target object.
Step S305: playing, according to the target path, the navigation voice corresponding to the current position.
Step S306: if the current position reaches the target position, ending the loop; if the current position has not reached the target position, returning to step S302.
In this embodiment, the specific implementation of steps S301, S304, and S305 is consistent with steps S101-S103 in the embodiment shown in FIG. 2. For a detailed discussion, refer to the discussion of steps S101-S103 in the embodiment shown in FIG. 2, which is not repeated here.
It should be noted that the assisted voice navigation method provided by this embodiment can also be implemented on the basis of the embodiment shown in FIG. 5, i.e., by further combining, on the basis of this embodiment, the technical features of setting the vibration unit based on the path distance in the embodiment shown in FIG. 5 (steps S203-S208), thereby achieving vibration-unit control and orientation-voice playback, which is not repeated here.
Corresponding to the assisted voice navigation method of the above embodiments, FIG. 11 is a structural block diagram of the assisted voice navigation apparatus provided by an embodiment of the present disclosure. For ease of description, only the parts related to the embodiments of the present disclosure are shown. Referring to FIG. 11, the assisted voice navigation apparatus 4 includes:
an interaction module 41, configured to cyclically invoke the following modules in response to a first instruction indicating a target object within a first range:
a processing module 42, configured to acquire a first image and determine a target path according to the first image and a visual positioning model, where the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a movement path from the current position corresponding to the first image to the position of the target object;
a playing module 43, configured to play, according to the target path, a navigation voice corresponding to the current position, the navigation voice representing a movement direction and a corresponding movement distance.
In an embodiment of the present disclosure, when determining the target path according to the first image and the visual positioning model, the processing module 42 is specifically configured to: input the first image into the visual positioning model to determine a first spatial position, the first spatial position representing the mapping, in the three-dimensional simulation space, of the image shooting point at which the first image was shot; search the visual positioning model to obtain a second spatial position corresponding to the target object, the second spatial position representing the mapping, in the three-dimensional simulation space, of the target position where the target object is located; and generate the target path based on the first spatial position and the second spatial position.
In an embodiment of the present disclosure, the processing module 42 is further configured to: obtain, based on the target path, the path distance between the first spatial position and the second spatial position; and when the path distance is less than a first preset distance, obtain orientation information, the orientation information representing the spatial orientation of the second spatial position relative to the first spatial position; the playing module 43 is further configured to: play the orientation voice corresponding to the orientation information.
In an embodiment of the present disclosure, the processing module 42 is further configured to: determine, according to the path distance, corresponding vibration parameters, the vibration parameters representing a vibration frequency and/or a vibration amplitude; and control a vibration unit to vibrate based on the vibration parameters.
In an embodiment of the present disclosure, the processing module 42 is further configured to: acquire a second image, the second image being an image N frames before the first image, N being an integer greater than 0; determine image difference information according to the first image and the second image, the image difference information representing the displacement of a reference object in the second image relative to the same reference object in the first image; set an update frequency of the visual positioning model according to the image difference information; and update the visual positioning model based on the update frequency.
In an embodiment of the present disclosure, the processing module 42 is further configured to: obtain a region identifier corresponding to the target object, the region identifier representing an image-acquisition region within the first range; invoke, based on the region identifier corresponding to the target object, a corresponding image-acquisition device to perform image acquisition to obtain a second image; and update the visual positioning model according to the second image.
In an embodiment of the present disclosure, when updating the visual positioning model according to the second image, the processing module 42 is specifically configured to: perform image recognition on the second image to determine the current position of the target object; and update the visual positioning model based on the current position of the target object.
The interaction module 41, the processing module 42, and the playing module 43 are connected in sequence. The assisted voice navigation apparatus 4 provided by this embodiment can execute the technical solutions of the above method embodiments; its implementation principles and technical effects are similar and are not repeated here.
FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 12, the electronic device 5 includes:
a processor 51, and a memory 52 communicatively connected to the processor 51;
the memory 52 stores computer-executable instructions;
the processor 51 executes the computer-executable instructions stored in the memory 52 to implement the assisted voice navigation method in the embodiments shown in FIGS. 2-10.
Optionally, the processor 51 and the memory 52 are connected via a bus 53.
For related descriptions, refer to the descriptions and effects corresponding to the steps in the embodiments corresponding to FIGS. 2-10, which are not elaborated here.
Referring to FIG. 13, it shows a schematic structural diagram of an electronic device 900 suitable for implementing the embodiments of the present disclosure. The electronic device 900 may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (PMPs), and vehicle-mounted terminals (e.g., vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 13 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 13, the electronic device 900 may include a processing apparatus (e.g., a central processing unit, a graphics processing unit, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage apparatus 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data required for the operation of the electronic device 900. The processing apparatus 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Generally, the following apparatuses may be connected to the I/O interface 905: input apparatuses 906 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output apparatuses 907 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage apparatuses 908 including, for example, a magnetic tape and a hard disk; and a communication apparatus 909. The communication apparatus 909 may allow the electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 13 shows the electronic device 900 with various apparatuses, it should be understood that it is not required to implement or have all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network through the communication apparatus 909, or installed from the storage apparatus 908, or installed from the ROM 902. When the computer program is executed by the processing apparatus 901, the above functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: a wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device; or it may exist independently without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to execute the methods shown in the above embodiments.
Computer program code for executing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, a first acquisition unit may also be described as "a unit that acquires at least two Internet Protocol addresses".
The functions described herein above may be executed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
In a first aspect, according to one or more embodiments of the present disclosure, an assisted voice navigation method is provided, including:
in response to a first instruction indicating a target object within a first range, cyclically executing the following steps: acquiring a first image, and determining a target path according to the first image and a visual positioning model, where the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a movement path from the current position corresponding to the first image to the position of the target object; and playing, according to the target path, a navigation voice corresponding to the current position, the navigation voice representing a movement direction and a corresponding movement distance.
According to one or more embodiments of the present disclosure, determining the target path according to the first image and the visual positioning model includes: inputting the first image into the visual positioning model to determine a first spatial position, the first spatial position representing the mapping, in the three-dimensional simulation space, of the image shooting point at which the first image was shot; obtaining, according to the visual positioning model, a second spatial position corresponding to the target object, the second spatial position representing the mapping, in the three-dimensional simulation space, of the target position where the target object is located; and generating the target path based on the first spatial position and the second spatial position.
According to one or more embodiments of the present disclosure, the method further includes: obtaining, based on the target path, the path distance between the first spatial position and the second spatial position; when the path distance is less than a first preset distance, obtaining orientation information, the orientation information representing the spatial orientation of the second spatial position relative to the first spatial position; and playing the orientation voice corresponding to the orientation information.
According to one or more embodiments of the present disclosure, the method further includes: determining, according to the path distance, corresponding vibration parameters, the vibration parameters representing a vibration frequency and/or a vibration amplitude; and controlling a vibration unit to vibrate based on the vibration parameters.
According to one or more embodiments of the present disclosure, the method further includes: acquiring a second image, the second image being an image N frames before the first image, N being an integer greater than 0; determining image difference information according to the first image and the second image, the image difference information representing the displacement of a reference object in the second image relative to the same reference object in the first image; setting an update frequency of the visual positioning model according to the image difference information; and updating the visual positioning model based on the update frequency.
According to one or more embodiments of the present disclosure, the method further includes: obtaining a region identifier corresponding to the target object, the region identifier representing an image-acquisition region within the first range; invoking, based on the region identifier corresponding to the target object, a corresponding image-acquisition device to perform image acquisition to obtain a second image; and updating the visual positioning model according to the second image.
According to one or more embodiments of the present disclosure, updating the visual positioning model according to the second image includes: performing image recognition on the second image to determine the current position of the target object; and updating the visual positioning model based on the current position of the target object.
In a second aspect, according to one or more embodiments of the present disclosure, an assisted voice navigation apparatus is provided, including:
an interaction module, configured to cyclically invoke the following modules in response to a first instruction indicating a target object within a first range:
a processing module, configured to acquire a first image and determine a target path according to the first image and a visual positioning model, where the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a movement path from the current position corresponding to the first image to the position of the target object;
a playing module, configured to play, according to the target path, a navigation voice corresponding to the current position, the navigation voice representing a movement direction and a corresponding movement distance.
According to one or more embodiments of the present disclosure, when determining the target path according to the first image and the visual positioning model, the processing module is specifically configured to: input the first image into the visual positioning model to determine a first spatial position, the first spatial position representing the mapping, in the three-dimensional simulation space, of the image shooting point at which the first image was shot; obtain, according to the visual positioning model, a second spatial position corresponding to the target object, the second spatial position representing the mapping, in the three-dimensional simulation space, of the target position where the target object is located; and generate the target path based on the first spatial position and the second spatial position.
According to one or more embodiments of the present disclosure, the processing module is further configured to: obtain, based on the target path, the path distance between the first spatial position and the second spatial position; and when the path distance is less than a first preset distance, obtain orientation information, the orientation information representing the spatial orientation of the second spatial position relative to the first spatial position; the playing module is further configured to: play the orientation voice corresponding to the orientation information.
According to one or more embodiments of the present disclosure, the processing module is further configured to: determine, according to the path distance, corresponding vibration parameters, the vibration parameters representing a vibration frequency and/or a vibration amplitude; and control a vibration unit to vibrate based on the vibration parameters.
According to one or more embodiments of the present disclosure, the processing module is further configured to: acquire a second image, the second image being an image N frames before the first image, N being an integer greater than 0; determine image difference information according to the first image and the second image, the image difference information representing the displacement of a reference object in the second image relative to the same reference object in the first image; set an update frequency of the visual positioning model according to the image difference information; and update the visual positioning model based on the update frequency.
According to one or more embodiments of the present disclosure, the processing module is further configured to: obtain a region identifier corresponding to the target object, the region identifier representing an image-acquisition region within the first range; invoke, based on the region identifier corresponding to the target object, a corresponding image-acquisition device to perform image acquisition to obtain a second image; and update the visual positioning model according to the second image.
According to one or more embodiments of the present disclosure, when updating the visual positioning model according to the second image, the processing module is specifically configured to: perform image recognition on the second image to determine the current position of the target object; and update the visual positioning model based on the current position of the target object.
In a third aspect, according to one or more embodiments of the present disclosure, an electronic device is provided, including: a processor, and a memory communicatively connected to the processor;
where the memory stores computer-executable instructions;
and the processor executes the computer-executable instructions stored in the memory to implement the assisted voice navigation method as described in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, according to one or more embodiments of the present disclosure, a computer-readable storage medium is provided, the computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the assisted voice navigation method as described in the first aspect and the various possible designs of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including a computer program which, when executed by a processor, implements the assisted voice navigation method as described in the first aspect and the various possible designs of the first aspect.
The above description is merely a description of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the disclosure scope involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be executed in the specific order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims.

Claims (11)

  1. An assisted voice navigation method, characterized by comprising:
    in response to a first instruction indicating a target object within a first range, cyclically executing the following steps:
    acquiring a first image, and determining a target path according to the first image and a visual positioning model, wherein the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a movement path from the current position corresponding to the first image to the position of the target object;
    playing, according to the target path, a navigation voice corresponding to the current position, the navigation voice representing a movement direction and a corresponding movement distance.
  2. The method according to claim 1, characterized in that determining the target path according to the first image and the visual positioning model comprises:
    inputting the first image into the visual positioning model to determine a first spatial position, the first spatial position representing the mapping, in the three-dimensional simulation space, of the image shooting point at which the first image was shot;
    obtaining, according to the visual positioning model, a second spatial position corresponding to the target object, the second spatial position representing the mapping, in the three-dimensional simulation space, of the target position where the target object is located;
    generating the target path based on the first spatial position and the second spatial position.
  3. The method according to claim 2, characterized in that the method further comprises:
    obtaining, based on the target path, the path distance between the first spatial position and the second spatial position;
    when the path distance is less than a first preset distance, obtaining orientation information, the orientation information representing the spatial orientation of the second spatial position relative to the first spatial position;
    playing the orientation voice corresponding to the orientation information.
  4. The method according to claim 3, characterized in that the method further comprises:
    determining, according to the path distance, corresponding vibration parameters, the vibration parameters representing a vibration frequency and/or a vibration amplitude;
    controlling a vibration unit to vibrate based on the vibration parameters.
  5. The method according to claim 2, characterized in that the method further comprises:
    acquiring a second image, the second image being an image N frames before the first image, N being an integer greater than 0;
    determining image difference information according to the first image and the second image, the image difference information representing the displacement of a reference object in the second image relative to the same reference object in the first image;
    setting an update frequency of the visual positioning model according to the image difference information;
    updating the visual positioning model based on the update frequency.
  6. The method according to claim 1, characterized in that the method further comprises:
    obtaining a region identifier corresponding to the target object, the region identifier representing an image-acquisition region within the first range;
    invoking, based on the region identifier corresponding to the target object, a corresponding image-acquisition device to perform image acquisition to obtain a second image;
    updating the visual positioning model according to the second image.
  7. The method according to claim 6, characterized in that updating the visual positioning model according to the second image comprises:
    performing image recognition on the second image to determine the current position of the target object;
    updating the visual positioning model based on the current position of the target object.
  8. An assisted voice navigation apparatus, characterized by comprising:
    an interaction module, configured to cyclically invoke the following modules in response to a first instruction indicating a target object within a first range:
    a processing module, configured to acquire a first image and determine a target path according to the first image and a visual positioning model, wherein the visual positioning model represents the position distribution of objects within the first range in a three-dimensional simulation space, and the target path is a movement path from the current position corresponding to the first image to the position of the target object;
    a playing module, configured to play, according to the target path, a navigation voice corresponding to the current position, the navigation voice representing a movement direction and a corresponding movement distance.
  9. An electronic device, characterized by comprising: a processor, and a memory communicatively connected to the processor;
    wherein the memory stores computer-executable instructions;
    and the processor executes the computer-executable instructions stored in the memory to implement the assisted voice navigation method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the assisted voice navigation method according to any one of claims 1 to 7.
  11. A computer program product, characterized by comprising a computer program which, when executed by a processor, implements the assisted voice navigation method according to any one of claims 1 to 7.
PCT/CN2023/129805 2022-11-11 2023-11-03 Assisted voice navigation method and apparatus, electronic device, and storage medium WO2024099238A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211415769.0 2022-11-11
CN202211415769.0A CN115900713A (zh) Assisted voice navigation method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2024099238A1 (zh)

Family

ID=86470468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/129805 WO2024099238A1 (zh) 2023-11-03 Assisted voice navigation method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN115900713A (zh)
WO (1) WO2024099238A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115900713A (zh) * 2022-11-11 2023-04-04 Beijing Zitiao Network Technology Co., Ltd. Assisted voice navigation method and apparatus, electronic device, and storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105333878A (zh) * 2015-11-26 2016-02-17 深圳如果技术有限公司 Road-condition video navigation system and method
JP2016070737A (ja) * 2014-09-29 2016-05-09 ジェイアール東日本コンサルタンツ株式会社 Navigation system
CN107084736A (zh) * 2017-04-27 2017-08-22 Vivo Mobile Communication Co., Ltd. Navigation method and mobile terminal
CN107345812A (zh) * 2016-05-06 2017-11-14 湖北淦德智能消防科技有限公司 Image positioning method and apparatus, and mobile phone
CN108398133A (zh) * 2017-02-06 2018-08-14 Hangzhou Hikvision Digital Technology Co., Ltd. Navigation method, apparatus and system
CN108827307A (zh) * 2018-06-05 2018-11-16 OPPO (Chongqing) Intelligent Technology Co., Ltd. Navigation method, apparatus, terminal, and computer-readable storage medium
CN110686694A (zh) * 2019-10-25 2020-01-14 深圳市联谛信息无障碍有限责任公司 Navigation method, apparatus, wearable electronic device, and computer-readable storage medium
CN113063421A (zh) * 2021-03-19 2021-07-02 Shenzhen SenseTime Technology Co., Ltd. Navigation method and related apparatus, mobile terminal, and computer-readable storage medium
CN113532444A (zh) * 2021-09-16 2021-10-22 深圳市海清视讯科技有限公司 Navigation path processing method and apparatus, electronic device, and storage medium
CN114413904A (zh) * 2021-12-29 2022-04-29 瞰瞰技术(深圳)有限公司 Visual-simulation blind-guiding method and blind-guiding apparatus
CN114677603A (zh) * 2022-03-23 2022-06-28 平安普惠企业管理有限公司 Blind-guiding method, apparatus, computer device, and computer-readable storage medium
CN115169639A (zh) * 2022-05-27 2022-10-11 湖北文理学院 Self-service shopping cart shopping-guide method, apparatus, device, and storage medium
CN115218903A (zh) * 2022-05-12 2022-10-21 北京具身智能科技有限公司 Object-finding method and system for visually impaired people
CN115471637A (zh) * 2022-09-14 2022-12-13 北京河图联合创新科技有限公司 Augmented reality (AR)-based marker object-finding method and apparatus, and electronic device
CN115900713A (zh) * 2022-11-11 2023-04-04 Beijing Zitiao Network Technology Co., Ltd. Assisted voice navigation method and apparatus, electronic device, and storage medium


Also Published As

Publication number Publication date
CN115900713A (zh) 2023-04-04
