CN108875460B - Augmented reality processing method and device, display terminal and computer storage medium - Google Patents

Augmented reality processing method and device, display terminal and computer storage medium

Info

Publication number
CN108875460B
Authority
CN
China
Prior art keywords
information
target object
image
display
image frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710340898.0A
Other languages
Chinese (zh)
Other versions
CN108875460A (en)
Inventor
庞英明
魏扼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201710340898.0A
Priority to PCT/CN2018/080094 (WO2018210055A1)
Priority to TW107111026A (TWI669956B)
Publication of CN108875460A
Application granted
Publication of CN108875460B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95 Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/48 Matching video sequences
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the invention disclose an AR processing method and apparatus, a display terminal and a computer storage medium. The AR processing method includes: acquiring AR information of a target object in a video; tracking a display position of the target object in the currently displayed image frame of the video; and superimposing the AR information onto the current image frame according to the display position.

Description

Augmented reality processing method and device, display terminal and computer storage medium
Technical Field
The present invention relates to the field of information technologies, and in particular to an Augmented Reality (AR) processing method and apparatus, a display terminal, and a computer storage medium.
Background
AR is a display technology that superimposes various kinds of information onto images acquired from the real world in order to extend the displayed content. The information introduced by AR needs to be displayed near the corresponding graphic object. In the prior art, the AR information may deviate from the corresponding graphic object, and the current video may need to be kept still until the AR information has been acquired and can be superimposed. Both the deviation of the AR information and the suspension of the current video obviously degrade the user experience.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention provide an AR processing method and apparatus, a display terminal and a computer storage medium, so as to reduce the problem that AR information deviates from its corresponding graphic object and the problem that the video needs to be stopped.
To achieve the above purpose, the technical solutions of the present invention are implemented as follows:
A first aspect of an embodiment of the present invention provides an augmented reality (AR) processing method, applied to a display terminal, including:
displaying a video based on a video stream;
acquiring AR information of a target object in the video;
tracking a display position of the target object in the currently displayed image frame of the video;
and superimposing the AR information onto the current image frame according to the display position.
A second aspect of an embodiment of the present invention provides an augmented reality (AR) processing apparatus, applied to a display terminal, including:
a display unit, configured to display a video based on a video stream;
an acquisition unit, configured to acquire AR information of a target object in the video;
a tracking unit, configured to track a display position of the target object in the currently displayed image frame of the video;
wherein the display unit is further configured to superimpose the AR information onto the current image frame according to the display position.
A third aspect of an embodiment of the present invention provides a display terminal, including:
a display for displaying information;
a memory for storing a computer program;
and a processor, connected to the display and the memory, configured to control the display terminal to perform any one of the above AR processing methods by executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer storage medium in which a computer program is stored; the computer program is configured to be executed by a processor to implement any one of the above AR processing methods.
The embodiments of the present invention provide an AR processing method and apparatus, a display terminal and a computer storage medium. Display of the video is not stopped; instead, while the video is displayed, the display position of the target object in each image frame is tracked. After the AR information is obtained, it is superimposed in the current image frame according to the tracked display position, so that the AR information is superimposed on, or attached near, the target object. This reduces the phenomenon that the AR information deviates from the target object as the target object moves across different image frames of the video, and the video does not need to be paused while the AR information is being acquired, so the user experience is improved.
Drawings
FIG. 1 is a flowchart of a first AR processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram showing a reference point, feature points, offset vectors and a mean shift vector according to an embodiment of the present invention;
FIG. 3 is a flowchart of a second AR processing method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a video display effect according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another video display effect according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an AR processing apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a display terminal according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an AR processing system and its processing flow according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention are further elaborated below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, this embodiment provides an AR processing method applied to a display terminal, including:
step S110: displaying a video based on a video stream;
step S120: acquiring AR information of a target object in the video;
step S130: tracking a display position of the target object in the currently displayed image frame of the video;
step S140: superimposing the AR information onto the current image frame according to the display position.
This embodiment provides an AR processing method applied to a display terminal. The display terminal may be any of various terminals with a display screen, for example a personal terminal such as a mobile phone, a tablet computer or a wearable device, or any of various vehicle-mounted devices with a display screen. The display screen may be a liquid crystal display screen, an electronic ink display screen, a projection display screen, or the like.
In step S110, the display terminal displays a video based on a video stream. The video stream includes a data stream of multiple image frames having a display timing. The video stream may be acquired by the display terminal itself, received from another device, or stored in the display terminal in advance; in any case, it is a data stream from which the terminal forms and displays a video.
AR information of a target object in the video is acquired in step S120. The AR information here may include one or more of: identification information, category information, various attribute information, and position information of the target object. In fig. 4, a target object 1 and a target object 2 are labeled; obviously, the target object 1 is a person and the target object 2 is a vehicle. Fig. 5 shows an image with AR information superimposed; the AR information shown in fig. 5 includes the words "public transportation" and "star A". The superimposed AR information is displayed adjacent to the corresponding target object. For example, the word "public transportation" is displayed adjacent to the target object identified as a bus in the current image, thereby reducing the phenomenon that the AR information drifts far from the target object.
The identification information may be information such as the name or identification serial number of the real-world object corresponding to the target object. For example, if an image contains a graphical object of a vehicle, the AR information may include identification information indicating that the vehicle is a Benz; that is, the text "Benz" is one example of identification information.
The category information may be information indicating the category to which the target object belongs, for example text information indicating that the vehicle is a bus. The text "public transportation" may be used as one example of category information.
The attribute information may characterize various attributes of the target object. For example, if the target object is identified as a bus and its bus number is identified, the attribute information may include the route information of that bus. The bus number is another example of identification information.
The position information may be used to indicate the approximate location where the target object is currently located. For example, if the target object is a vehicle, its current position may be given by image recognition and/or by positioning such as Global Positioning System (GPS) positioning; it may then be displayed, for instance, that the vehicle is currently located in a particular district of Beijing.
The AR information in this embodiment may be text information or image information; in either case, it is information that can be superimposed and displayed in the current image.
In this embodiment, recognition of the image to be identified may also be performed by the display terminal itself, based on a correspondence between images stored in a local image recognition library and predetermined information, so that part of the predetermined information is extracted and displayed as the AR information.
The display position of the target object may differ between image frames of the video. In this embodiment, the position of the target object in each image frame of the video is tracked, so as to determine the display position of the target object in the current image frame.
In step S140, the AR information is superimposed on or around the target object according to the determined display position, which avoids the information offset that would occur if the AR information were superimposed without regard to the display position, thereby improving the user experience.
In other words, in step S140 the AR information is superimposed according to the display position of the target object in the current image frame rather than at an arbitrary position. The current image frame here is the image frame displayed at the current time.
The display position may include display position indication information such as the coordinates of the target object in the current image frame.
Because the display position of each target object is tracked across the image frames of the video stream, the AR information can be conveniently superimposed on or around the target object in step S140, which avoids information offset and improves the user experience.
The AR information may be superimposed into multiple image frames of the video stream until the target object disappears from the image frames. However, the position parameter of the target object in the current image frame needs to be re-determined for each superposition, so that the AR information moves as the position of the target object changes between image frames. This prevents the target object from moving within the image frames and becoming separated from its AR information, further improving the user experience.
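By way of illustration only, the following is a minimal Python/OpenCV sketch of the loop just described: the AR information, once obtained, is drawn at the tracked display position in every displayed frame. The tracker interface, the window name and the drawing style are assumptions introduced for this sketch and are not part of the claimed embodiment.

```python
import cv2

def run_ar_loop(capture, tracker, ar_info):
    """Superimpose AR information near the tracked target in each displayed frame.

    `capture` is a cv2.VideoCapture; `tracker` is assumed to expose
    update(frame) -> (x, y) or None, standing in for the tracking of step S130;
    `ar_info` is the text obtained in step S120. These interfaces are
    assumptions made only for this sketch.
    """
    while True:
        ok, frame = capture.read()           # current image frame of the video
        if not ok:
            break
        position = tracker.update(frame)     # display position in this frame
        if position is not None and ar_info is not None:
            x, y = position
            # Draw the AR information adjacent to the target object so it
            # follows the object as it moves between image frames (step S140).
            cv2.putText(frame, ar_info, (int(x), int(y) - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("video", frame)
        if cv2.waitKey(1) & 0xFF == 27:      # Esc stops the loop
            break
    capture.release()
    cv2.destroyAllWindows()
```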
In some embodiments, step S120 may include:
capturing one or more frames of an image to be identified that satisfy a preset sharpness condition;
and acquiring AR information corresponding to the recognition result of the target object in the one or more captured frames of the image to be identified.
In this embodiment, determining whether an image frame satisfies the preset sharpness condition may include: extracting contour information of the image frame; if the contour information is successfully extracted, the image frame is regarded as an image satisfying the preset sharpness condition.
Some embodiments may further include: calculating the gray-level difference between each pixel (or each of a set of preset pixels) and its surrounding pixels in the image frame; if the gray-level difference is larger than a preset threshold, the image frame may be regarded as satisfying the preset sharpness condition.
In some cases, the video stream is received from another device, or the image frames of the video stream are pre-stored in the display terminal and are divided into key frames and non-key frames; in this embodiment, one or more key frames may then be selected as the image to be identified that satisfies the preset sharpness condition.
In summary, there are many ways of determining which frames in the video stream satisfy the preset sharpness condition, and the method is not limited to any of the above.
Some embodiments further include:
sending the image to be identified to a service platform on the network side, so that the service platform performs the recognition.
In this embodiment, when image recognition is performed, any graphical object in the image to be identified may be treated as the target object for recognition, or only some of the graphical objects may be treated as target objects.
It should be noted that in some embodiments there is no fixed order between step S120 and step S130; step S120 may be performed after step S130, or synchronously with step S130.
Optionally, as shown in fig. 3, step S120 may include:
step S122: sending the image to be identified to a service platform, where the service platform performs image recognition on the image to obtain a recognition result;
step S123: receiving AR information returned by the service platform based on the recognition result.
In this embodiment, the service platform may be a platform formed by one or more servers that provide image recognition.
In this embodiment, the display terminal sends the image to be identified to the service platform; that is, the client captures one or more frames from the video stream and sends them to the service platform, and the service platform performs the image recognition. The recognition performed by the service platform may be generalized recognition, meaning that the service platform recognizes every identifiable graphical object in the image to be identified, so that the image is recognized as a whole and as much AR information as possible is provided.
Having the service platform perform the image recognition reduces the load on the display terminal, and thus its resource consumption and power consumption. If the display terminal is a mobile terminal, its standby time can thereby be extended.
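As an illustration of this step, the following sketch uploads one captured frame to a recognition service over HTTP. The endpoint URL, the request format and the response field name are purely hypothetical; the embodiments only state that the display terminal sends the image to be identified to a service platform and receives AR information based on the recognition result.

```python
import cv2
import requests  # assumed to be available on the display terminal

def request_ar_info(frame, url="https://service-platform.example.com/recognize"):
    """Send one sufficiently sharp frame to the service platform and return AR info.

    The URL and the JSON field name "ar_info" are hypothetical; only the overall
    behaviour (upload an image, receive AR information) follows the description.
    """
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        return None
    files = {"image": ("frame.jpg", jpeg.tobytes(), "image/jpeg")}
    resp = requests.post(url, files=files, timeout=5)
    resp.raise_for_status()
    return resp.json().get("ar_info")  # e.g. category or identification text
```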
In some embodiments, step S120 may include:
extracting image features of at least some of the images in the video stream;
and determining, according to the image features, which images satisfy the preset sharpness condition and are therefore images to be identified.
The image features in this embodiment may include contour features, texture features and/or gray-scale features of the individual graphical objects in the image to be identified, and the like. These image features may be the same features that are used for image recognition.
The contour features may include the outer contour of an object's image, inner contours within that outer contour, and the like. A contour describes the shape, size and other properties of a graphical object, which facilitates matching against reference images to obtain the AR information during recognition.
The texture features may describe gray-scale gradients between adjacent contours; they can likewise be used for image recognition and can reflect information such as the material of the target object.
The gray-scale features may directly include gray values and gray gradients, which can be used to extract the contour features, the texture features and the like.
In summary, the image features are varied and not limited to any of the above; for example, when the image to be identified is a color image, the image features may further include color features indicating the color of the whole or part of each target object.
In some embodiments, the extracting of image features of at least some of the images in the video stream includes:
extracting feature points of at least some of the images in the video stream, where a feature point is a first pixel with a first gray value such that the difference between the first gray value and the second gray value of a second pixel adjacent to the first pixel satisfies a preset difference condition;
and determining, according to the number of feature points, whether an image satisfies the preset sharpness condition and is thus an image to be identified.
For example, suppose the first gray value of the first pixel is A1 and the second gray value of the second pixel is B1; satisfying the preset difference condition may mean that the absolute value of the difference between A1 and B1 is not less than a difference threshold. If the difference between the first and second gray values is large enough, the pixel may lie on the contour of a graphical object or be a highlight point of a bright region, and can serve as an important pixel for recognizing the corresponding graphical object.
The second pixel may be a pixel within a neighborhood of the first pixel in some embodiments.
In some embodiments, the neighborhood is the region formed by extending N pixels in a first direction and a second direction with the first pixel at the center; the pixels within that region belong to the neighborhood. N may be a positive integer, and the first direction may be perpendicular to the second direction.
In some embodiments, the neighborhood may be a rectangular area, or it may be a circular area centered on the first pixel, in which case the pixels located within the circular area are the second pixels.
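The following is a small sketch, under the assumptions stated in the comments, of the preset difference condition described above: a pixel is treated as a feature point if its gray value differs from that of at least one second pixel in its neighborhood by at least a threshold. The neighborhood size and threshold are illustrative values only.

```python
import numpy as np

def is_feature_point(gray, x, y, n=3, diff_threshold=25):
    """Check the preset difference condition for the first pixel at (x, y).

    `gray` is a 2-D gray-scale image. The pixel is treated as a feature point
    if any second pixel inside an N-pixel rectangular neighbourhood differs
    from it by at least `diff_threshold`. Both parameters are assumptions.
    """
    h, w = gray.shape
    y0, y1 = max(0, y - n), min(h, y + n + 1)
    x0, x1 = max(0, x - n), min(w, x + n + 1)
    neighbourhood = gray[y0:y1, x0:x1].astype(int)
    first_gray = int(gray[y, x])                  # first gray value A1
    diffs = np.abs(neighbourhood - first_gray)    # |A1 - B1| for each neighbour
    return bool((diffs >= diff_threshold).any())
```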
If obvious blurring occurs in an image frame, the gray-level differences between pixels in the blurred area are small, and there are few feature points.
In this embodiment, whether an image frame satisfies the preset sharpness condition may therefore be determined from the number of feature points; for example, when the number of feature points is greater than a number threshold, the image frame may be considered to satisfy the preset sharpness condition.
In still other embodiments, the method further includes: calculating the distribution density of feature points in each sub-region based on the number and distribution of the feature points; when the distribution density of M sub-regions in an image is greater than a density threshold, the image frame is regarded as satisfying the preset sharpness condition.
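A minimal sketch of this sharpness check is given below, using OpenCV's FAST feature detector as one possible way of extracting feature points and a simple grid for the sub-regions; the concrete thresholds (feature count, grid size, per-cell density, M) are illustrative assumptions.

```python
import cv2
import numpy as np

def satisfies_sharpness_condition(frame, min_points=80, grid=(4, 4),
                                  density_threshold=5, min_dense_cells=4):
    """Decide whether a frame satisfies the preset sharpness condition.

    FAST feature points are detected, their total number is checked, and the
    frame is divided into grid sub-regions whose feature-point counts (the
    cells have equal area, so counts stand in for densities) are compared with
    a density threshold; at least `min_dense_cells` (the M above) must exceed
    it. All thresholds are illustrative assumptions.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    detector = cv2.FastFeatureDetector_create(threshold=20)
    keypoints = detector.detect(gray)
    if len(keypoints) < min_points:
        return False

    h, w = gray.shape
    rows, cols = grid
    counts = np.zeros(grid, dtype=int)
    for kp in keypoints:
        x, y = kp.pt
        counts[min(int(y * rows / h), rows - 1),
               min(int(x * cols / w), cols - 1)] += 1
    return int((counts >= density_threshold).sum()) >= min_dense_cells
```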
In some embodiments, step S140 may include:
locating a first position parameter of the target object in a previous image frame of the video stream;
and searching for a second position parameter of the target object in the current image frame based on the first position parameter.
In this embodiment, tracking is used to reduce the effort of locating the display position of each target object: because the movement of the target object between two adjacent image frames of the video stream is gradual, the second position parameter in the current image frame is located with the help of the first position parameter of the previous image frame. This avoids searching the whole picture of the current image frame for the target object each time and reduces the amount of computation.
For example, a search range for the target object may be determined based on the first position parameter, rather than searching the entire current image frame, which reduces the amount of computation during the search. Specifically, the region indicated by the first position parameter is expanded outwards by a preset number of pixels at its edges in the current image frame to serve as the search area; the second position parameter of the target object in the current image frame is then located by matching the search area against the image of the target object in the previous image frame. If the target object is not found in the current search area, the search area is enlarged or changed until the whole current image frame has been searched. In this way, a second position parameter in the current image frame can obviously be located quickly for target objects whose position changes slowly.
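The following sketch illustrates this search-area approach, assuming the first position parameter is a bounding box (x, y, w, h) in the previous image frame and using template matching as one possible way of matching the search area against the target object's image; the margin and the matching score threshold are illustrative assumptions.

```python
import cv2

def track_in_search_area(prev_frame, curr_frame, prev_box, margin=40,
                         score_threshold=0.5):
    """Locate the target in the current frame by searching only near its previous position.

    `prev_box` is (x, y, w, h) of the target in the previous image frame; the
    search area is that box expanded outwards by `margin` pixels. Returns the
    new box, or None so that the caller can enlarge the search area (up to the
    whole frame) as described above.
    """
    x, y, w, h = prev_box
    template = prev_frame[y:y + h, x:x + w]

    H, W = curr_frame.shape[:2]
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(W, x + w + margin), min(H, y + h + margin)
    search_area = curr_frame[y0:y1, x0:x1]

    result = cv2.matchTemplate(search_area, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < score_threshold:          # target not found in this area
        return None
    return (x0 + max_loc[0], y0 + max_loc[1], w, h)
```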
Of course, the above is only one way of determining the second position parameter based on the first position parameter, and the method is not limited to it. For example, step S142 may include:
determining a reference point of the target object in the current frame based on the tracking of the target object in the previous image frame, where the reference point is a pixel representing the display position of the target object in the previous image frame;
determining an offset vector of each feature point in the current image frame relative to the reference point, where a feature point is a first pixel with a first gray value such that the difference between the first gray value and the second gray value of an adjacent second pixel satisfies the preset difference condition;
determining a mean shift vector of the feature points relative to the reference point based on the offset vectors, where the mean shift vector includes a mean shift direction and a mean shift amount;
and locating a target point corresponding to the target object based on the reference point and the mean shift vector, where the target point serves as the reference point of the next image frame and corresponds to the second position parameter.
The reference point in this embodiment may be the center position of the target object in the previous image frame, but is not limited to the center.
In this embodiment, the feature points in the current image frame are first extracted; as before, a feature point is a pixel whose gray-level difference from its surrounding pixels satisfies the preset condition. Offset vectors pointing from the reference point, taken as the vector start, to each feature point are constructed. The offsets of these vectors are averaged to give the mean shift amount of the mean shift vector, and a vector operation over the directions of the offset vectors gives the mean shift direction. Typically the mean shift vector points towards a location where the feature-point density is high. The target point is the end point of the mean shift vector when its start point is placed at the reference point, and may form part of the second position parameter. When tracking the target object in the next image frame, the target point of the current image frame is used as the reference point of the next image frame; tracking thus proceeds by repeated iteration. This way of locating the display position of the target object in the current image frame involves little computation and is simple to implement.
As shown in fig. 2, each solid black dot represents a feature point and the hollow circle represents the reference point; the single-line arrows represent offset vectors and the outlined arrow represents the mean shift vector. The mean shift vector clearly starts at the current reference point and points towards the region where the feature points are densest, and its mean shift amount equals the mean of the offsets of all offset vectors.
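A minimal numerical sketch of one mean-shift step as described above is given below; it uses plain, unweighted averaging of the offset vectors, which is an assumption, since the embodiments do not prescribe a particular weighting.

```python
import numpy as np

def mean_shift_step(reference_point, feature_points):
    """One mean-shift iteration over the feature points of the current frame.

    `reference_point` is the (x, y) point carried over from the previous frame;
    `feature_points` is an (N, 2) array of feature-point coordinates in the
    current frame. The returned target point moves towards the region of
    highest feature-point density and serves as the reference point of the
    next image frame.
    """
    ref = np.asarray(reference_point, dtype=float)
    pts = np.asarray(feature_points, dtype=float)
    offsets = pts - ref                  # offset vector of each feature point
    mean_shift = offsets.mean(axis=0)    # mean shift vector (direction and amount)
    return ref + mean_shift              # target point of the current frame
```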
In some embodiments, the target object is a graphical object displayed within a focal region of the image to be identified.
Each image frame may be acquired by a camera focused at a specific focus position, so each image frame has a corresponding focus area. The object located in the focus area is usually the sharpest graphical object and is also the one the user is paying attention to; in this embodiment, to reduce the recognition workload, the target object is a graphical object located at least partially within the focus area.
In some embodiments, the method further comprises:
and displaying an acquisition prompt on a picture of the video in the process of acquiring the AR information, wherein the acquisition prompt is used for prompting that the AR information is currently acquired.
In some cases, recognizing the image to be identified takes some time; to avoid the user assuming that recognition has not started or has failed, the acquisition prompt is displayed to indicate that the AR information is currently being acquired. The acquisition prompt may be text information or image information, for example a semi-transparent cover layer displayed over the current image frame, which further improves the user experience.
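For illustration, the following sketch draws such an acquisition prompt as a semi-transparent cover layer with a text message over the current image frame; the opacity, colours and wording are assumptions made for this sketch.

```python
import cv2

def draw_acquiring_overlay(frame, text="Acquiring AR information..."):
    """Overlay a semi-transparent cover layer and a prompt on the current frame."""
    overlay = frame.copy()
    h, w = frame.shape[:2]
    cv2.rectangle(overlay, (0, 0), (w, h), (0, 0, 0), -1)       # dark cover
    blended = cv2.addWeighted(overlay, 0.35, frame, 0.65, 0)    # semi-transparent mix
    cv2.putText(blended, text, (20, h // 2), cv2.FONT_HERSHEY_SIMPLEX,
                0.8, (255, 255, 255), 2)
    return blended
```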
An example application is provided below in connection with any of the above embodiments:
as shown in fig. 3, this example provides an AR processing method, which may be applied to various intelligent terminals such as mobile phones and intelligent glasses, where the intelligent terminals need to perform AR information superposition when collecting video, so as to improve display effects, and may specifically include:
step S110: displaying a video based on the video stream;
step S121: extracting, from the video stream, one or more frames of an image to be identified that satisfy the preset sharpness condition, where the image to be identified contains a target object;
step S122: sending the image to be identified to a service platform;
step S123: receiving AR information returned by the service platform based on the recognition result of the image to be identified;
step S131: tracking the display position of the target object in each image frame, so as to obtain the position parameter of the target object in the current image frame;
step S141: superimposing the AR information in the current image frame according to the position parameter.
In fig. 4, the acquisition prompt is shown as a plurality of nested dashed circles.
As shown in fig. 6, this embodiment provides an augmented reality (AR) processing apparatus applied to a display terminal, including:
a display unit 110, configured to display a video based on a video stream;
an acquisition unit 120, configured to acquire AR information of a target object in the video;
a tracking unit 130, configured to track a display position of the target object in the currently displayed image frame of the video;
wherein the display unit 110 is further configured to superimpose the AR information onto the current image frame according to the display position.
The display terminal provided in this embodiment may be various terminals including a display screen, where the display screen may be a liquid crystal display screen, an electronic ink display screen, or a projection display screen.
The acquisition unit 120 and the tracking unit 130 correspond to a processor or processing circuit in the terminal. The processor may be a Central Processing Unit (CPU), a Microcontroller Unit (MCU), a Digital Signal Processor (DSP), an Application Processor (AP), a programmable array (PLC), or the like. The processing circuit may be an Application Specific Integrated Circuit (ASIC). The processor or processing circuit performs the above operations by executing executable code.
In summary, when the display terminal performs AR display, the apparatus provided in this embodiment tracks the display position of the target object in each frame of the video, ensuring that the AR information is superimposed adjacent to the corresponding target object. This reduces the phenomenon that the AR information of a target object A is superimposed around a target object B, reduces the deviation of AR information from its target object, and improves the user experience.
Optionally, the acquisition unit 120 is specifically configured to extract, from the video stream, one or more frames of an image to be identified that satisfy the preset sharpness condition, and to acquire AR information corresponding to the recognition result of the target object in the one or more frames of the image to be identified.
The acquisition unit 120 is specifically configured to send the image to be identified to a service platform, where the service platform performs image recognition on the image to obtain a recognition result, and to receive AR information returned by the service platform based on the recognition result.
In this embodiment, the AR information comes from the service platform, which can provide as much information as possible to the client through information search. This alleviates the problem that the AR information would otherwise be sparse or limited because the terminal's own information storage is insufficient.
Optionally, the acquisition unit 120 is specifically configured to extract image features of at least some of the images in the video stream, and to determine, according to the image features, which images satisfy the preset sharpness condition and are therefore images to be identified.
The acquisition unit 120 in this embodiment is mainly configured to select, by extracting image features, one or more sufficiently sharp frames and send them to the service platform (or recognize them itself), so as to improve the recognition accuracy and the probability of successful recognition.
Optionally, the acquisition unit 120 is specifically configured to extract feature points of at least some of the images in the video stream, where a feature point is a first pixel with a first gray value such that the difference between the first gray value and the second gray value of an adjacent second pixel satisfies the preset difference condition, and to determine, according to the number of feature points, which images satisfy the preset sharpness condition and are therefore images to be identified.
In this embodiment, the image to be identified that satisfies the preset sharpness condition is determined by extracting feature points, for example by FAST feature point detection, where FAST is an abbreviation of Features from Accelerated Segment Test.
In some embodiments, the tracking unit 130 is specifically configured to locate a first position parameter of the target object in a previous image frame of the video stream, and to search for a second position parameter of the target object in the current image frame based on the first position parameter.
In this embodiment, the second position parameter of the target object in the current image frame is acquired by exploiting the correlation between the position parameters of two adjacent image frames, which reduces the amount of computation needed to locate the second position parameter.
Optionally, the tracking unit 130 is specifically configured to: determine, based on the tracking of the target object in the previous image frame, a reference point of the target object in the current frame, where the reference point is a pixel representing the display position of the target object in the previous image frame; determine an offset vector of each feature point in the current image frame relative to the reference point, where a feature point is a first pixel with a first gray value such that the difference between the first gray value and the second gray value of an adjacent second pixel satisfies the preset difference condition; determine a mean shift vector of the feature points relative to the reference point based on the offset vectors, where the mean shift vector includes a mean shift direction and a mean shift amount; and locate a target point corresponding to the target object based on the reference point and the mean shift vector, where the target point serves as the reference point of the next image frame and corresponds to the second position parameter.
In this embodiment, the first position parameter may include the coordinates of the target point of the previous image frame, and the second position parameter may be the coordinates of the target point of the current image frame. By determining the mean shift vector, the target point of the current image frame is located quickly based on the target point of the previous image frame.
Optionally, the target object is a graphical object displayed in the focus area of the image to be identified. This reduces the recognition of unnecessary graphical objects and the return of their AR information, reduces the display of unnecessary graphical information, and reduces information interference for the user.
Optionally, the display unit 110 is further configured to display an acquisition prompt on a screen of the video in a process of acquiring the AR information, where the acquisition prompt is used to prompt that the AR information is currently being acquired.
In this embodiment, displaying the acquisition prompt tells the user that AR information is currently being acquired, which reduces the user's anxiety while waiting and further improves the user experience.
As shown in fig. 7, the present embodiment provides a display terminal, including:
a display 210 for information display;
a memory 220 for storing a computer program;
and a processor 230, connected to the display and the memory, configured to control the display terminal, by executing the computer program, to perform the AR processing method provided in any of the foregoing embodiments, for example the AR processing method shown in fig. 1.
The display 210 in this embodiment may be any of various types of displays, such as a liquid crystal display, a projection display, or an electronic ink display.
The memory 220 may be any of various types of storage media, such as a random access memory, a read-only storage medium, a flash memory or an optical disk. The memory 220 in this embodiment comprises at least in part a non-transitory storage medium, which may be used to store the computer program.
The processor 230 may be a CPU, MCU, DSP or AP, or a processing circuit such as a PLC or ASIC; by executing the computer program, it causes the display 210 to superimpose and display the AR information on the current image frame of the video.
As shown in FIG. 7, the display 210, the memory 220 and the processor 230 are all connected by a bus 250, which may be, for example, an Inter-Integrated Circuit (IIC) bus or a Peripheral Component Interconnect (PCI) bus.
In some embodiments, the display terminal may also include a network interface 240, which is operable to connect to the network side, for example to connect with the service platform.
The present embodiment also provides a computer storage medium in which a computer program is stored; the computer program is configured to be executed by a processor to implement the AR processing method provided in any one of the foregoing embodiments, for example, the AR processing method shown in fig. 1.
The computer storage medium may be any of various types of storage media, optionally a non-transitory storage medium. The computer storage medium may be a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or another medium capable of storing program code.
A specific example is provided below in connection with any of the embodiments described above:
as shown in fig. 8, the present example provides an AR processing system including:
a client and a server side;
the client may be the terminal that displays the AR information;
the server side provides a network-side service platform that supports the client's AR processing.
The client comprises:
a kernel module, corresponding to the kernel of the operating system, which can be used for background information processing, for example obtaining AR information through interaction with the service platform;
an AR engine (ARSDK), used for acquiring the position information of the target object and transmitting the corresponding position parameters to the display screen based on that position information;
a display screen, corresponding to the display user interface (UI), which can be used for video display and which, based on the position parameters provided by the ARSDK, correctly superimposes the AR information forwarded from the kernel module near the corresponding target object.
The server side may include: a proxy server, a recognition server and a search server;
the proxy server is used for information interaction with the client, for example receiving the image to be identified sent by the client;
the recognition server is connected to the proxy server and is used for receiving the image to be identified forwarded by the proxy server and then sending the recognition result to the search server;
the search server is connected to the recognition server and is used for querying AR information based on the recognition result and sending the AR information to the client through the proxy server.
The following specifically provides a method of applying AR information in this system, including the following steps.
Real-scene object scanning and tracking is a process in which the terminal transmits real-time pictures to the cloud for recognition over a period of time and then displays the result. In this example, the terminal's background is connected to a recognition server, a search server and a proxy server for information integration. The terminal comprises the ARSDK, a data transmission unit for transmitting data between the terminal and the network platform, and a UI unit for display interaction with the user. The specific flow is as follows.
The terminal opens the camera through the application; the UI unit then feeds the video stream into the terminal's network transmission unit. The network transmission unit performs FAST feature point detection: the number of FAST feature points indicates whether the objects in the picture are sufficiently recognizable, and an image that meets the feature point requirement is one that satisfies the preset sharpness condition. Such a frame is then sent to the background proxy server.
The background proxy server receives the picture uploaded by the terminal and sends it to the cloud recognition server. The cloud recognition server performs generalized image recognition to identify the category, position and number information of the objects in the image, and then returns this information to the background proxy server.
After the background proxy server obtains the image category information, it sends the information to the information search center server to retrieve information related to the object category; if there is no related information, an empty result is returned directly. The proxy server then transmits the image information and the related information to the terminal. This related information is one form of the aforementioned AR information.
The terminal has a receiving module. If this module passed the information directly to the UI for drawing, a drawing offset would be likely, because network transmission and recognition take time and the object may have moved in the meantime, so drawing according to the position in the uploaded frame would no longer be accurate. Therefore, the data transmission module does not push the information to the UI at this point, but instead requests an updated position from the ARSDK.
The ARSDK passes the image frame data from the transmission module to a local following module. The local following module computes the mean offset of the feature points of the first frame using a mean shift algorithm, takes the resulting mean point as a new starting point, and then searches for the moved positions of the corresponding feature points in the next frame. In this way the ARSDK continuously follows the object in the picture and obtains its position information in real time, and, upon receiving a request from the kernel module, it returns the latest position of the object to the kernel module.
The kernel module updates the latest position information of the object and transmits the related information, the category information and the position information to the UI unit; after receiving them, the UI unit draws the label on the screen at an accurate position.
As shown in fig. 8, the AR information processing method may include:
Step 1: the display screen provides the video stream to the kernel module;
Step 2: the kernel module provides an image to be identified to the proxy server;
Step 3: the proxy server provides the image to be identified to the recognition server;
Step 4: the recognition server returns the recognition result to the proxy server (in some embodiments the recognition server may instead feed the recognition result directly to the search server);
Step 5: the proxy server sends the recognition result to the search server;
Step 6: the search server returns the AR information found on the basis of the recognition result to the proxy server;
Step 7: the proxy server forwards the AR information to the kernel module of the client;
Step 8: the kernel module requests the new position parameter from the ARSDK, i.e. the position parameter of the target object in the current image frame;
Step 9: the ARSDK sends the updated position parameter to the kernel module;
Step 10: the kernel module returns the AR information and the updated position parameter to the display screen; the display screen displays the video and simultaneously superimposes the AR information near the target object.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and there may be other divisions in practice, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communicative connection between the components shown or discussed may be indirect coupling or communicative connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated in one processing module, or each unit may serve as a separate unit, or two or more units may be integrated in one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The foregoing is merely a specific implementation of the present invention, and the protection scope of the present invention is not limited thereto. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An augmented reality (AR) processing method, applied to a display terminal, wherein the display terminal comprises an AR engine, a data transmission unit for transmitting data between the display terminal and a network platform, and a UI unit for display interaction, the method comprising:
opening a camera by an application, so that a video stream is imported into the data transmission unit by the UI unit;
displaying a video based on the video stream;
extracting, through the data transmission unit, feature points of at least some images in the video stream, calculating the distribution density of feature points in each sub-region according to the number and distribution of the feature points, and determining that an image is an image to be identified satisfying a preset sharpness condition when the distribution density of a preset number of sub-regions in the image is greater than a density threshold, wherein a feature point is a first pixel with a first gray value, and the difference between the first gray value and a second gray value of a second pixel adjacent to the first pixel satisfies a preset difference condition;
sending the image to be identified to a background proxy server through the data transmission unit;
wherein the background proxy server is configured to receive the image to be identified and send it to a cloud recognition server, the cloud recognition server performs generalized image recognition on the image to be identified to identify category, position and number information of a target object in the image to be identified and transmits a recognition result to the background proxy server; the background proxy server is further configured to send the recognition result to an information search center server, and the information search center server is configured to return AR information corresponding to the recognition result to the background proxy server; the background proxy server is further configured to send the AR information to the data transmission unit;
transmitting the AR information to the AR engine through the data transmission unit;
locating, by the AR engine, a first position parameter of the target object in a previous image frame of the video stream; expanding the region given by the first position parameter outwards by preset pixels at its edges in the current image frame of the video to serve as a search area; and locating a second position parameter of the target object in the current image frame by matching the search area against the image of the target object in the previous image frame;
transmitting the second position parameter to the data transmission unit when the AR information is received by the AR engine;
updating the position information of the target object through the data transmission unit, and transmitting the AR information and the second position parameter to the UI unit;
and superimposing and displaying, by the UI unit, the AR information in the current image frame according to the second position parameter, wherein the AR information is displayed adjacent to the corresponding target object.
2. The method according to claim 1, wherein the method further comprises:
determining a reference point of the target object in the current frame based on the tracking of the target object in the previous image frame, wherein the reference point is a pixel representing the display position of the target object in the previous image frame;
determining an offset vector of each feature point in the current image frame relative to the reference point;
determining a mean shift vector of the feature points relative to the reference point based on the offset vectors, wherein the mean shift vector comprises a mean shift direction and a mean shift amount;
and locating a target point corresponding to the target object based on the reference point and the mean shift vector, wherein the target point serves as the reference point of the next image frame and corresponds to the second position parameter.
3. The method according to claim 1, wherein
the target object is a graphical object displayed in a focus area of the image to be identified.
4. The method according to claim 1, wherein
the method further comprises:
and displaying an acquisition prompt on a picture of the video in the process of acquiring the AR information, wherein the acquisition prompt is used for prompting that the AR information is currently acquired.
5. An augmented reality AR processing device, which is applied to a display terminal, wherein the display terminal comprises an AR engine, a data transmission unit for transmitting data between the display terminal and a network platform, and a UI unit for performing display interaction, and the device comprises:
A module for performing the steps of: opening a camera by application, so that a video stream is imported into the data transmission unit by the UI unit;
a display unit configured to display video based on the video stream;
the acquisition unit is used for extracting characteristic points of at least part of images in the video stream in the data transmission unit, calculating the distribution density of the characteristic points of each subarea according to the number of the characteristic points and the distribution of the characteristic points, and determining that the image is an image to be identified meeting a preset clear condition when the distribution density of a preset number of subareas in one image is greater than a density threshold value, wherein the characteristic points are first pixel points of a first gray value, and the difference between the first gray value and a second gray value of a second pixel point adjacent to the first pixel point meets a preset difference condition;
the acquisition unit is further configured to transmit, in the data transmission unit, the image to be recognized to a background proxy server;
the background proxy server is configured to receive the image to be recognized and send it to a cloud recognition server; the cloud recognition server is configured to perform generalized image recognition on the image to be recognized, to identify category, position and quantity information of the target object in the image to be recognized, and to transmit a recognition result to the background proxy server; the background proxy server is further configured to send the recognition result to a consultation search center server, and the consultation search center server is configured to return AR information corresponding to the recognition result to the background proxy server; the background proxy server is further configured to send the AR information to the data transmission unit;
a module configured to perform the step of: transmitting, in the data transmission unit, the AR information to the AR engine;
a tracking unit, configured to: locate, in the AR engine, a first position parameter of the target object in a previous image frame of the video stream; expand the region indicated by the first position parameter outwards by a preset number of pixels in the current image frame of the video to obtain a search area; and locate a second position parameter of the target object in the current image frame by matching the search area against the image of the target object in the previous image frame;
a module for performing the steps of: transmitting the second location parameter to the data transmission unit when the AR information is received in the AR engine; in the data transmission unit, updating the position information of the target object, and transmitting the AR information and the second position parameter to the UI unit;
the display unit is further configured to superimpose and display, in the UI unit, the AR information in the current image frame according to the second position parameter, wherein the AR information is displayed adjacent to the corresponding target object.
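The feature-point definition and density-based clarity screening recited in claim 5 can be sketched as follows. This is a minimal NumPy illustration under assumed values: the neighbour-difference threshold, the 4x4 sub-region grid, the density threshold and the required number of passing sub-regions are all illustrative, not taken from the patent.

    import numpy as np

    def is_image_to_be_recognized(gray, grid=(4, 4), diff_thresh=25,
                                  density_thresh=0.01, min_regions=8):
        gray = gray.astype(np.int16)
        # A pixel counts as a feature point when its gray value differs from an
        # adjacent pixel (right or bottom neighbour here) by more than diff_thresh.
        dx = np.abs(np.diff(gray, axis=1))[:-1, :]
        dy = np.abs(np.diff(gray, axis=0))[:, :-1]
        feature_mask = (dx > diff_thresh) | (dy > diff_thresh)

        h, w = feature_mask.shape
        rows, cols = grid
        passing = 0
        for r in range(rows):
            for c in range(cols):
                block = feature_mask[r * h // rows:(r + 1) * h // rows,
                                     c * w // cols:(c + 1) * w // cols]
                if block.mean() > density_thresh:  # feature-point density of the sub-region
                    passing += 1
        # The frame is treated as an image to be recognized when enough
        # sub-regions exceed the density threshold.
        return passing >= min_regions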
6. The apparatus according to claim 5, wherein
the tracking unit is specifically configured to: determine a reference point of the target object in the current frame based on tracking of the target object in the previous image frame, wherein the reference point is a pixel point representing a display position of the target object in the previous image frame; determine an offset vector of each feature point in the current image frame relative to the reference point; determine a mean shift vector of the feature points relative to the reference point based on the offset vectors, wherein the mean shift vector comprises a mean shift direction and a mean shift magnitude; and position a target point corresponding to the target object based on the reference point and the mean shift vector, wherein the target point serves as the reference point for the next image frame and corresponds to the second position parameter.
7. The apparatus according to claim 5, wherein
the target object is a graphic object displayed in a focus area of the image to be recognized.
8. The apparatus according to claim 5, wherein
the display unit is further configured to display an acquisition prompt on a picture of the video while the AR information is being acquired, wherein the acquisition prompt indicates that the AR information is currently being acquired.
9. A display terminal, characterized by comprising:
a display for displaying information;
a memory for storing a computer program;
a processor, connected to the display and the memory, for controlling the display terminal to execute the AR processing method according to any one of claims 1 to 4 by executing the computer program.
10. A computer storage medium having a computer program stored therein; the computer program, when executed by a processor, implements the AR processing method according to any one of claims 1 to 4.
CN201710340898.0A 2017-05-15 2017-05-15 Augmented reality processing method and device, display terminal and computer storage medium Active CN108875460B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201710340898.0A CN108875460B (en) 2017-05-15 2017-05-15 Augmented reality processing method and device, display terminal and computer storage medium
PCT/CN2018/080094 WO2018210055A1 (en) 2017-05-15 2018-03-22 Augmented reality processing method and device, display terminal, and computer storage medium
TW107111026A TWI669956B (en) 2017-05-15 2018-03-29 Method, device, display terminal, and storage medium for processing augmented reality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710340898.0A CN108875460B (en) 2017-05-15 2017-05-15 Augmented reality processing method and device, display terminal and computer storage medium

Publications (2)

Publication Number Publication Date
CN108875460A (en) 2018-11-23
CN108875460B (en) 2023-06-20

Family

ID=64273268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710340898.0A Active CN108875460B (en) 2017-05-15 2017-05-15 Augmented reality processing method and device, display terminal and computer storage medium

Country Status (3)

Country Link
CN (1) CN108875460B (en)
TW (1) TWI669956B (en)
WO (1) WO2018210055A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113386785B (en) * 2019-07-03 2022-11-22 北京百度网讯科技有限公司 Method and device for displaying augmented reality warning information
CN110619266B (en) * 2019-08-02 2024-01-19 青岛海尔智能技术研发有限公司 Target object identification method and device and refrigerator
CN112445318A (en) * 2019-08-30 2021-03-05 龙芯中科技术股份有限公司 Object display method and device, electronic equipment and storage medium
CN110856005B (en) * 2019-11-07 2021-09-21 广州虎牙科技有限公司 Live stream display method and device, electronic equipment and readable storage medium
CN110784733B (en) * 2019-11-07 2021-06-25 广州虎牙科技有限公司 Live broadcast data processing method and device, electronic equipment and readable storage medium
CN111583329B (en) * 2020-04-09 2023-08-04 深圳奇迹智慧网络有限公司 Augmented reality glasses display method and device, electronic equipment and storage medium
CN112017300A (en) * 2020-07-22 2020-12-01 青岛小鸟看看科技有限公司 Processing method, device and equipment for mixed reality image
CN112328628A (en) * 2020-11-10 2021-02-05 山东爱城市网信息技术有限公司 Bus real-time query method and system based on AR technology
CN112583976B (en) * 2020-12-29 2022-02-18 咪咕文化科技有限公司 Graphic code display method, equipment and readable storage medium
CN112734938A (en) * 2021-01-12 2021-04-30 北京爱笔科技有限公司 Pedestrian position prediction method, device, computer equipment and storage medium
CN113038001A (en) * 2021-02-26 2021-06-25 维沃移动通信有限公司 Display method and device and electronic equipment
CN113596350B (en) * 2021-07-27 2023-11-17 深圳传音控股股份有限公司 Image processing method, mobile terminal and readable storage medium
CN114415839A (en) * 2022-01-27 2022-04-29 歌尔科技有限公司 Information display method, device, equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105635712A (en) * 2015-12-30 2016-06-01 视辰信息科技(上海)有限公司 Augmented-reality-based real-time video recording method and recording equipment

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101750017A (en) * 2010-01-18 2010-06-23 战强 Visual detection method of multi-movement target positions in large view field
CN101807300B (en) * 2010-03-05 2012-07-25 北京智安邦科技有限公司 Target fragment region merging method and device
KR101548834B1 (en) * 2010-09-20 2015-08-31 퀄컴 인코포레이티드 An adaptable framework for cloud assisted augmented reality
KR101338818B1 (en) * 2010-11-29 2013-12-06 주식회사 팬택 Mobile terminal and information display method using the same
JP2015529911A (en) * 2012-09-28 2015-10-08 インテル コーポレイション Determination of augmented reality information
US9996150B2 (en) * 2012-12-19 2018-06-12 Qualcomm Incorporated Enabling augmented reality using eye gaze tracking
WO2014162825A1 (en) * 2013-04-04 2014-10-09 ソニー株式会社 Display control device, display control method and program
CN103426184B (en) * 2013-08-01 2016-08-10 华为技术有限公司 A kind of optical flow tracking method and apparatus
KR102114618B1 (en) * 2014-01-16 2020-05-25 엘지전자 주식회사 Portable and method for controlling the same
CN104936034B (en) * 2015-06-11 2019-07-05 三星电子(中国)研发中心 Information input method and device based on video
CN105654512B (en) * 2015-12-29 2018-12-07 深圳微服机器人科技有限公司 A kind of method for tracking target and device
CN105760826B (en) * 2016-02-03 2020-11-13 歌尔股份有限公司 Face tracking method and device and intelligent terminal
CN105760849B (en) * 2016-03-09 2019-01-29 北京工业大学 Target object behavioral data acquisition methods and device based on video
CN106056046B (en) * 2016-05-20 2019-01-18 北京集创北方科技股份有限公司 The method and apparatus of feature are extracted from image
CN106454350A (en) * 2016-06-28 2017-02-22 中国人民解放军陆军军官学院 Non-reference evaluation method for infrared image
CN106250938B (en) * 2016-07-19 2021-09-10 易视腾科技股份有限公司 Target tracking method, augmented reality method and device thereof
CN106371585A (en) * 2016-08-23 2017-02-01 塔普翊海(上海)智能科技有限公司 Augmented reality system and method
CN106650965B (en) * 2016-12-30 2020-11-06 触景无限科技(北京)有限公司 Remote video processing method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105635712A (en) * 2015-12-30 2016-06-01 视辰信息科技(上海)有限公司 Augmented-reality-based real-time video recording method and recording equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Mobile augmented reality tracking technology based on the FAST-SURF algorithm; 陈智翔; 吴黎明; 高世平; Computer and Modernization (Issue 09); 109-112 *

Also Published As

Publication number Publication date
CN108875460A (en) 2018-11-23
TWI669956B (en) 2019-08-21
TW201902225A (en) 2019-01-01
WO2018210055A1 (en) 2018-11-22

Similar Documents

Publication Publication Date Title
CN108875460B (en) Augmented reality processing method and device, display terminal and computer storage medium
US20210264200A1 (en) Terminal device, information processing device, object identifying method, program, and object identifying system
US20210312214A1 (en) Image recognition method, apparatus and non-transitory computer readable storage medium
KR102166861B1 (en) Enabling augmented reality using eye gaze tracking
US9582937B2 (en) Method, apparatus and computer program product for displaying an indication of an object within a current field of view
US9424255B2 (en) Server-assisted object recognition and tracking for mobile devices
US9754183B2 (en) System and method for providing additional information using image matching
CN109683699B (en) Method and device for realizing augmented reality based on deep learning and mobile terminal
CN108932051B (en) Augmented reality image processing method, apparatus and storage medium
US20140267770A1 (en) Image-based application launcher
JP5279875B2 (en) Object display device, object display method, and object display program
CN104285244A (en) Image-driven view management for annotations
CN105593877A (en) Object tracking based on dynamically built environment map data
JP2012103789A (en) Object display device and object display method
US8996577B2 (en) Object information provision device, object information provision system, terminal, and object information provision method
CN107516099B (en) Method and device for detecting marked picture and computer readable storage medium
CN110689014B (en) Method and device for detecting region of interest, electronic equipment and readable storage medium
CN109034214B (en) Method and apparatus for generating a mark
CN115170400A (en) Video repair method, related device, equipment and storage medium
JP2019185487A (en) Learning data generation device, change region detection method, and computer program
CN115187510A (en) Loop detection method, device, electronic equipment and medium
JP6909022B2 (en) Programs, information terminals, information display methods and information display systems
JP5396971B2 (en) Position search system and position search method
CN113744172A (en) Document image processing method and device and training sample generation method and device
CN105830095A (en) Level and advice for selecting augmented reality marks

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1256620

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant