CN110222235A - 3D stereoscopic content display method, device, equipment and storage medium - Google Patents

Publication number
CN110222235A
Authority
CN
China
Prior art keywords
video
prediction model
module
video frame
content
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910504980.1A
Other languages
Chinese (zh)
Inventor
杨茗名
王群
张苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910504980.1A
Publication of CN110222235A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74 Browsing; Visualisation therefor
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present invention provide a 3D stereoscopic content display method, device, equipment and storage medium. The method includes: during playback of a video file, obtaining video frame data of a first video frame, where the first video frame is the video frame currently being displayed or a video frame to be displayed after a predetermined length of time; predicting on the video frame data using an object prediction model; identifying an object in the first video frame according to the prediction result and determining identification information of the object; using the identification information to determine whether the object in the first video frame has associated 3D stereoscopic content; and, if the associated 3D stereoscopic content exists, displaying a first prompt mark at a preset position of the object in the first video frame. Embodiments of the present invention can provide users with a richer search and recommendation experience.

Description

3D stereoscopic content display method, device, equipment and storage medium
Technical Field
The present invention relates to the field of video display technology, and in particular to a 3D stereoscopic content display method, device, equipment and storage medium.
Background
With the development of Internet technology, users increasingly acquire knowledge from video content while watching video files, so supplementing that content with knowledge and presenting information along multiple dimensions is becoming very important. At present, however, there is no technology that actively pushes information about the objects in a video frame to the user while a video file is playing, and in particular no technology that actively obtains and pushes 3D stereoscopic content. Existing video display technology therefore cannot provide users with a richer search and recommendation experience.
Summary of the Invention
Embodiments of the present invention provide a 3D stereoscopic content display method and device to solve at least the above technical problem in the prior art.
In a first aspect, an embodiment of the present invention provides a 3D stereoscopic content display method, including:
During playback of a video file, obtaining video frame data of a first video frame, where the first video frame is the video frame currently being displayed or a video frame to be displayed after a predetermined length of time;
Predicting on the video frame data using an object prediction model; identifying an object in the first video frame according to the prediction result, and determining identification information of the object in the first video frame;
Using the identification information of the object, determining whether the object in the first video frame has associated 3D stereoscopic content;
If the associated 3D stereoscopic content exists, displaying a first prompt mark at a preset position of the object in the first video frame.
In one embodiment, after predicting on the video frame data using the object prediction model, the method further includes: identifying coordinate information of the object in the first video frame according to the prediction result;
Displaying the first prompt mark at the preset position of the object in the first video frame includes: displaying the first prompt mark at a position determined by the coordinate information of the object in the first video frame.
In one embodiment, the method further includes:
Receiving a first instruction to display the associated 3D stereoscopic content;
According to the first instruction, obtaining and displaying the associated 3D stereoscopic content.
In one embodiment, predicting on the video frame data using the object prediction model includes:
Obtaining the subject category of the video file, and determining the object prediction model corresponding to the subject category;
Predicting on the video frame data using the object prediction model corresponding to the subject category.
In one embodiment, the first instruction is a click event on the first prompt mark;
Obtaining and displaying the associated 3D stereoscopic content according to the first instruction includes: reading the associated 3D stereoscopic content from the associated-content storage address of the object corresponding to the first prompt mark; pausing playback of the video file, generating a floating layer over the paused video picture, and displaying the associated 3D stereoscopic content in the floating layer.
In one embodiment, after the associated 3D stereoscopic content is displayed in the floating layer, the method further includes:
Receiving a second instruction to transform the display of the associated 3D stereoscopic content, and transforming the display of the associated 3D stereoscopic content in the floating layer according to the second instruction, where the transformation includes at least one of rotation, enlargement and reduction;
And/or receiving a third instruction to stop displaying the associated 3D stereoscopic content, closing the associated 3D stereoscopic content and the floating layer, and resuming playback of the video file.
In a second aspect, an embodiment of the present invention provides a training method for an object prediction model, where the object prediction model is used to identify objects in a picture, and the method includes:
Obtaining picture data, and obtaining actual identification information and actual coordinate information of an object in the picture;
Inputting the picture data into the object prediction model;
Comparing the predicted identification information output by the object prediction model with the actual identification information, comparing the predicted coordinate information output by the object prediction model with the actual coordinate information, and adjusting the parameters of the object prediction model according to the comparison results.
In a third aspect, an embodiment of the present invention provides a 3D stereoscopic content display device, including:
A video playback module, configured to send video frame data of a first video frame to a prediction module during playback of a video file, where the first video frame is the video frame currently being displayed or a video frame to be displayed after a predetermined length of time;
The prediction module, configured to obtain an object prediction model from a model service module and predict on the video frame data using the object prediction model; identify an object in the first video frame according to the prediction result and determine identification information of the object in the first video frame; and send the identification information to an associated content retrieval module;
The model service module, configured to provide the object prediction model to the prediction module;
The associated content retrieval module, configured to use the identification information to retrieve the 3D stereoscopic content associated with the object in the first video frame and, if the object in the first video frame has associated 3D stereoscopic content, send the retrieval result to a mark display module;
The mark display module, configured to display a first prompt mark at a preset position of the object in the first video frame if the associated 3D stereoscopic content exists.
In one embodiment, the device further includes an associated content display module;
The mark display module is further configured to receive a first instruction to display the associated 3D stereoscopic content and to send the first instruction to the associated content display module;
The associated content display module is configured to obtain, according to the first instruction, the 3D stereoscopic content associated with the object in the first video frame, and to display that content.
In one embodiment, the prediction module is configured to obtain the subject category of the video file, send the subject category to the model service module, and receive the object prediction model corresponding to the subject category returned by the model service module;
The model service module includes:
A model training submodule, configured to train the object prediction models corresponding to different subject categories and send them to a model providing submodule;
The model providing submodule, configured to receive and save the object prediction models corresponding to the different subject categories; receive the subject category of the video file from the prediction module, identify the object prediction model corresponding to that subject category, and return the identified object prediction model to the prediction module.
In one embodiment, the prediction module is further configured to determine coordinate information of the object in the first video frame and send the coordinate information to the associated content retrieval module;
The associated content retrieval module is further configured to send the coordinate information and the associated-content storage address to the mark display module if the object in the first video frame has associated 3D stereoscopic content;
The mark display module is configured to display the first prompt mark at the position determined by the coordinate information.
In one embodiment, the first instruction is a click event on the first prompt mark;
The mark display module is further configured to send the associated-content storage address of the object corresponding to the first prompt mark to the associated content display module;
The associated content display module is configured to read the 3D stereoscopic content associated with the object in the first video frame from the associated-content storage address; pause playback of the video file, generate a floating layer over the paused video picture, and display the associated 3D stereoscopic content in the floating layer.
In one embodiment, the associated content display module is further configured to:
Receive a second instruction to transform the display of the associated 3D stereoscopic content, and transform the display of the associated 3D stereoscopic content in the floating layer according to the second instruction, where the transformation includes at least one of rotation, enlargement and reduction;
Alternatively, receive a third instruction to stop displaying the associated 3D stereoscopic content, close the associated 3D stereoscopic content and the floating layer, and resume playback of the video file.
In a fourth aspect, an embodiment of the present invention provides a training device for an object prediction model, where the object prediction model is used to identify objects in a picture, and the device includes:
A data acquisition module, configured to obtain picture data and to obtain actual identification information and actual coordinate information of an object in the picture;
An input module, configured to input the picture data into the object prediction model;
A parameter adjustment module, configured to compare the predicted identification information output by the object prediction model with the actual identification information, compare the predicted coordinate information output by the object prediction model with the actual coordinate information, and adjust the parameters of the object prediction model according to the comparison results.
In a fifth aspect, an embodiment of the present invention provides 3D stereoscopic content display equipment. The functions of the equipment may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the structure of the 3D stereoscopic content display equipment includes a processor and a memory. The memory is used to store a program that supports the equipment in executing the above 3D stereoscopic content display method, and the processor is configured to execute the program stored in the memory. The equipment may also include a communication interface for communicating with other equipment or a communication network.
In a sixth aspect, an embodiment of the present invention provides training equipment for an object prediction model. The functions of the equipment may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the structure of the training equipment for the object prediction model includes a processor and a memory. The memory is used to store a program that supports the equipment in executing the above training method for the object prediction model, and the processor is configured to execute the program stored in the memory. The equipment may also include a communication interface for communicating with other equipment or a communication network.
In a seventh aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions used by the 3D stereoscopic content display equipment or the training equipment for the object prediction model, including a program for executing the above 3D stereoscopic content display method or the above training method for the object prediction model.
One of the above technical solutions has the following advantage or beneficial effect:
Embodiments of the present invention can, during playback of a video file, actively identify the 3D stereoscopic content associated with objects in a video frame and prompt the user, thereby providing the user with a richer search and recommendation experience.
The above summary is provided for the purposes of the specification only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments and features described above, further aspects, embodiments and features of the present invention will be readily apparent by reference to the drawings and the following detailed description.
Brief Description of the Drawings
In the drawings, unless otherwise specified, the same reference numerals denote the same or similar parts or elements throughout the several figures. The drawings are not necessarily to scale. It should be understood that the drawings depict only some embodiments disclosed according to the present invention and should not be regarded as limiting the scope of the present invention.
Fig. 1 is a first implementation flowchart of a 3D stereoscopic content display method according to an embodiment of the present invention;
Fig. 2 is a second implementation flowchart of a 3D stereoscopic content display method according to an embodiment of the present invention;
Fig. 3A is a schematic diagram of the display of the first prompt mark in a 3D stereoscopic content display method according to an embodiment of the present invention;
Fig. 3B is a schematic diagram of the display of 3D stereoscopic content in a 3D stereoscopic content display method according to an embodiment of the present invention;
Fig. 3C is a schematic diagram of the rotated display of 3D stereoscopic content in a 3D stereoscopic content display method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an implementation of predicting on the video frame data in step S12 of a 3D stereoscopic content display method according to an embodiment of the present invention;
Fig. 5 is an implementation flowchart of a training method for an object prediction model according to an embodiment of the present invention;
Fig. 6 is a first structural schematic diagram of a 3D stereoscopic content display device according to an embodiment of the present invention;
Fig. 7 is a second structural schematic diagram of a 3D stereoscopic content display device according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the structure of a 3D stereoscopic content display device according to an embodiment of the present invention and of the information transmission between its modules;
Fig. 9 is a structural schematic diagram of a training device for an object prediction model according to an embodiment of the present invention;
Fig. 10 is a structural schematic diagram of the 3D stereoscopic content display equipment or the training equipment for an object prediction model according to an embodiment of the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not restrictive.
Embodiments of the present invention mainly provide a 3D stereoscopic content display method and device. The technical solutions are described in detail through the following embodiments.
Fig. 1 is a first implementation flowchart of a 3D stereoscopic content display method according to an embodiment of the present invention, including:
S11: during playback of a video file, obtaining video frame data of a first video frame, where the first video frame is the video frame currently being displayed or a video frame to be displayed after a predetermined length of time;
S12: predicting on the video frame data using an object prediction model; identifying an object in the first video frame according to the prediction result, and determining identification information of the object in the first video frame;
S13: using the identification information of the object, determining whether the object in the first video frame has associated 3D stereoscopic content;
S14: if the associated 3D stereoscopic content exists, displaying a first prompt mark at a preset position of the object in the first video frame.
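The flow of steps S12 to S14 can be sketched roughly as follows. This is a minimal illustration rather than the patented implementation: the names `Detection`, `PromptMark`, `find_prompt_marks`, `fake_predict` and `CONTENT_INDEX`, the `store://` address and the use of a plain dictionary as the content lookup are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Detection:
    object_id: str  # identification information of a recognized object


@dataclass
class PromptMark:
    object_id: str
    content_address: str  # where the associated 3D content is stored


def find_prompt_marks(
    frame_data: bytes,
    predict: Callable[[bytes], List[Detection]],
    content_index: Dict[str, str],
) -> List[PromptMark]:
    """Run S12-S14 on one frame and return the marks that should be shown."""
    marks = []
    for det in predict(frame_data):                 # S12: predict and identify
        address = content_index.get(det.object_id)  # S13: look up 3D content
        if address is not None:                     # S14: mark only if it exists
            marks.append(PromptMark(det.object_id, address))
    return marks


# Hypothetical stand-ins for a trained model and a content store.
def fake_predict(frame_data: bytes) -> List[Detection]:
    return [Detection("building-001"), Detection("tree-042")]


CONTENT_INDEX = {"building-001": "store://models/building-001.glb"}

marks = find_prompt_marks(b"raw-frame-bytes", fake_predict, CONTENT_INDEX)
print(marks)
```

Only the building yields a mark here, because the hypothetical index holds no 3D content for the second object.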
If the first video frame in step S11 is a video frame to be displayed after the predetermined length of time, then, when associated 3D stereoscopic content exists, step S14 displays the first prompt mark at the moment the first video frame is displayed. For example, suppose the current time is T0 and the predetermined length of time is t, and steps S11 to S13 determine that an object in the video frame F to be displayed at time (T0+t) has associated 3D stereoscopic content. The first prompt mark is then displayed at time (T0+t), that is, at the moment video frame F is displayed. This ensures that the object and the first prompt mark are displayed in sync.
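One way to keep the mark in sync with a frame analyzed ahead of time is to queue the mark under the frame's display time (T0+t) and release it only when playback reaches that time. The sketch below assumes playback time is a float in seconds; the `MarkScheduler` class and its methods are hypothetical names introduced for illustration.

```python
import heapq
from typing import List, Tuple


class MarkScheduler:
    """Queues prompt marks by the display time of the frame they belong to."""

    def __init__(self) -> None:
        self._pending: List[Tuple[float, str]] = []

    def schedule(self, now: float, lookahead: float, object_id: str) -> float:
        """Record a mark found at time `now` for the frame shown at now + lookahead."""
        show_at = now + lookahead
        heapq.heappush(self._pending, (show_at, object_id))
        return show_at

    def due(self, playback_time: float) -> List[str]:
        """Pop and return the marks whose frame is on screen at `playback_time`."""
        ready = []
        while self._pending and self._pending[0][0] <= playback_time:
            ready.append(heapq.heappop(self._pending)[1])
        return ready


scheduler = MarkScheduler()
scheduler.schedule(now=10.0, lookahead=2.0, object_id="building-001")  # T0=10, t=2
early = scheduler.due(11.0)    # frame F is not on screen yet
on_time = scheduler.due(12.0)  # frame F is being displayed
```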
For example, for a building in a video frame, the 3D stereoscopic content associated with the building may be a 3D model of the building.
In one possible implementation, the first prompt mark may be a mark indicating that 3D content exists. The first prompt mark may take the form of an icon. Embodiments of the present invention may display the first prompt mark floating over the object in the first video frame to indicate that 3D stereoscopic content associated with the object exists.
Fig. 2 is a second implementation flowchart of a 3D stereoscopic content display method according to an embodiment of the present invention, including:
S11: during playback of a video file, obtaining video frame data of a first video frame, where the first video frame is the video frame currently being displayed or a video frame to be displayed after a predetermined length of time;
S12: predicting on the video frame data using an object prediction model; identifying an object in the first video frame according to the prediction result, and determining identification information of the object in the first video frame;
S13: using the identification information of the object, determining whether the object in the first video frame has associated 3D stereoscopic content;
S14: if the associated 3D stereoscopic content exists, displaying a first prompt mark at a preset position of the object in the first video frame;
S25: receiving a first instruction to display the associated 3D stereoscopic content;
S26: according to the first instruction, obtaining and displaying the associated 3D stereoscopic content.
Fig. 3A is a schematic diagram of the display of the first prompt mark in a 3D stereoscopic content display method according to an embodiment of the present invention. The picture shown in Fig. 3A is the first video frame. It has been determined that the building at the center of the picture has associated 3D stereoscopic content. A circular mark containing the characters "3D" floats at the upper right corner of the building. This icon is the first prompt mark, which prompts the user that the building has associated 3D stereoscopic content. When the user clicks the first prompt mark, playback of the video file is paused, and the 3D stereoscopic content associated with the building is obtained and displayed.
Fig. 3B is a schematic diagram of the display of 3D stereoscopic content in a 3D stereoscopic content display method according to an embodiment of the present invention. Embodiments of the present invention generate a floating layer over the video picture and render the 3D stereoscopic content in the floating layer. By clicking or touching the screen, the user can rotate, enlarge or reduce the 3D stereoscopic content to learn about the object from a 3D visual perspective.
Fig. 3C is a schematic diagram of the rotated display of 3D stereoscopic content in a 3D stereoscopic content display method according to an embodiment of the present invention. As shown in Fig. 3C, the user has rotated the 3D stereoscopic content by a certain angle in order to view the corresponding building more clearly from a different angle. In Fig. 3B and Fig. 3C, a mark labeled "Close" is shown at the upper right corner of the picture. The user can click this mark at any time to close the 3D stereoscopic content, after which playback of the video file resumes.
Fig. 4 is a schematic diagram of an implementation of predicting on the video frame data in step S12 of a 3D stereoscopic content display method according to an embodiment of the present invention, including:
S121: obtaining the subject category of the video file, and determining the object prediction model corresponding to the subject category;
S122: predicting on the video frame data using the object prediction model corresponding to the subject category.
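Steps S121 and S122 amount to a lookup keyed by subject category followed by a call to the selected model. The following sketch assumes a model can be represented as a callable from frame bytes to a list of object identifiers; `ModelProvider`, `predict_frame` and the category strings are hypothetical names, and the lambdas stand in for trained models.

```python
from typing import Callable, Dict, List

Model = Callable[[bytes], List[str]]  # frame data -> identified object ids


class ModelProvider:
    """Keeps one object prediction model per subject category."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def register(self, category: str, model: Model) -> None:
        self._models[category] = model

    def model_for(self, category: str) -> Model:
        if category not in self._models:
            raise KeyError(f"no object prediction model for category {category!r}")
        return self._models[category]


def predict_frame(provider: ModelProvider, category: str, frame_data: bytes) -> List[str]:
    model = provider.model_for(category)  # S121: pick the model for the category
    return model(frame_data)              # S122: predict with that model


provider = ModelProvider()
provider.register("university", lambda frame: ["library-007"])
provider.register("tourist-attraction", lambda frame: ["pagoda-003"])
result = predict_frame(provider, "university", b"raw-frame-bytes")
```

Raising an error for an unknown category is a design choice of this sketch; the source does not say what happens when no model matches.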
In one possible implementation, after the video frame data is predicted using the object prediction model in step S12, coordinate information of the object in the first video frame may further be identified according to the prediction result. The coordinate information refers to the position of the object within the first video frame, not the coordinates of the object's real physical location.
Correspondingly, step S14 may display the first prompt mark at the position determined by the coordinate information.
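As a rough illustration of deriving a mark position from in-frame coordinates, the sketch below assumes the coordinate information is a pixel bounding box `(x, y, width, height)` with the origin at the top-left of the frame, and centers the mark on the box's upper right corner, as in the Fig. 3A example. The box format, the 48-pixel mark size and the function name `mark_position` are assumptions, not details from the source.

```python
from typing import Tuple


def mark_position(
    box: Tuple[int, int, int, int],
    frame_size: Tuple[int, int],
    mark_size: int = 48,
) -> Tuple[int, int]:
    """Return the top-left pixel at which to draw a square prompt mark."""
    x, y, w, h = box
    frame_w, frame_h = frame_size
    # Center the mark on the upper right corner of the object's box.
    left = x + w - mark_size // 2
    top = y - mark_size // 2
    # Clamp so the whole mark stays inside the frame.
    left = max(0, min(left, frame_w - mark_size))
    top = max(0, min(top, frame_h - mark_size))
    return left, top


centered = mark_position((800, 200, 400, 600), (1920, 1080))
clamped = mark_position((1700, 10, 220, 300), (1920, 1080))
```

The second call shows the clamping case: a box touching the top right edge of the frame would otherwise push the mark partly off screen.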
In one possible implementation, the first instruction is a click event on the first prompt mark;
Step S26 may specifically include: reading the associated 3D stereoscopic content from the associated-content storage address of the object corresponding to the first prompt mark; pausing playback of the video file, generating a floating layer over the paused video picture, and displaying the associated 3D stereoscopic content in the floating layer.
As shown in Fig. 2, step S26 may be followed by step S27 and step S28. Alternatively, step S26 may be followed by step S28 only.
S27: receiving a second instruction to transform the display of the associated 3D stereoscopic content, and transforming the display of the associated 3D stereoscopic content in the floating layer according to the second instruction, where the transformation includes at least one of rotation, enlargement and reduction. The second instruction may be an interactive operation that the user performs on the 3D stereoscopic content by clicking or touching the screen.
S28: receiving a third instruction to stop displaying the associated 3D stereoscopic content, closing the associated 3D stereoscopic content and the floating layer, and resuming playback of the video file. The third instruction may be the user's click event on the "Close" mark.
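The three instructions in steps S26 to S28 can be viewed as events acting on a small piece of viewer state. The sketch below models only that state (paused or playing, layer open or closed, current rotation and scale) and does no rendering; `Player`, `FloatingLayer`, `ContentViewer` and the handler names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Player:
    playing: bool = True


@dataclass
class FloatingLayer:
    content_address: str
    rotation_deg: float = 0.0
    scale: float = 1.0


class ContentViewer:
    def __init__(self, player: Player) -> None:
        self.player = player
        self.layer: Optional[FloatingLayer] = None

    def on_mark_clicked(self, content_address: str) -> None:
        """First instruction (S26): pause playback and open the floating layer."""
        self.player.playing = False
        self.layer = FloatingLayer(content_address)

    def on_transform(self, rotate_deg: float = 0.0, zoom: float = 1.0) -> None:
        """Second instruction (S27): rotate, enlarge or reduce the content."""
        if self.layer is None:
            return  # nothing is displayed, so there is nothing to transform
        self.layer.rotation_deg = (self.layer.rotation_deg + rotate_deg) % 360
        self.layer.scale *= zoom

    def on_close(self) -> None:
        """Third instruction (S28): close the layer and resume playback."""
        self.layer = None
        self.player.playing = True


viewer = ContentViewer(Player())
viewer.on_mark_clicked("store://models/building-001.glb")
viewer.on_transform(rotate_deg=90.0, zoom=2.0)
viewer.on_transform(rotate_deg=300.0)
```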
An embodiment of the present invention also provides a training method for an object prediction model, where the object prediction model is used to identify objects in a picture. Fig. 5 is an implementation flowchart of a training method for an object prediction model according to an embodiment of the present invention, including:
S51: obtaining picture data, and obtaining actual identification information and actual coordinate information of an object in the picture;
S52: inputting the picture data into the object prediction model;
S53: comparing the predicted identification information output by the object prediction model with the actual identification information, comparing the predicted coordinate information output by the object prediction model with the actual coordinate information, and adjusting the parameters of the object prediction model according to the comparison results.
The picture may be a video frame, and the picture data may be video frame data.
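The source does not specify the model architecture or the loss, so the following is only one possible reading of step S53: a toy linear model trained by gradient descent, with cross-entropy as the comparison for identification information and squared error as the comparison for coordinate information. The feature vectors stand in for picture data, the four-number boxes are assumed to be normalized coordinates, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights: List[Vector], x: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def train_step(w_cls, w_box, x, true_class, true_box, lr=0.1) -> float:
    """One S52-S53 update; returns the loss measured before the update."""
    probs = softmax(matvec(w_cls, x))  # predicted identification (class scores)
    box = matvec(w_box, x)             # predicted coordinate information
    loss = -math.log(probs[true_class]) + sum(
        (p - t) ** 2 for p, t in zip(box, true_box)
    )
    # Adjust parameters along the negative gradient of each comparison.
    for k, row in enumerate(w_cls):
        err = probs[k] - (1.0 if k == true_class else 0.0)
        for j in range(len(row)):
            row[j] -= lr * err * x[j]
    for k, row in enumerate(w_box):
        err = 2.0 * (box[k] - true_box[k])
        for j in range(len(row)):
            row[j] -= lr * err * x[j]
    return loss


# Two made-up samples: (features, actual class index, actual box).
samples = [
    ([1.0, 0.0], 0, [0.1, 0.2, 0.5, 0.6]),
    ([0.0, 1.0], 1, [0.4, 0.3, 0.2, 0.2]),
]
w_cls = [[0.0, 0.0] for _ in range(2)]
w_box = [[0.0, 0.0] for _ in range(4)]
epoch_losses = []
for _ in range(50):
    epoch_losses.append(sum(train_step(w_cls, w_box, x, c, b) for x, c, b in samples))
```

A production detector would use a convolutional network and a detection-specific loss; the point here is only the compare-then-adjust loop over both outputs.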
An embodiment of the present invention also provides a 3D stereoscopic content display device. Referring to Fig. 6, Fig. 6 is a first structural schematic diagram of a 3D stereoscopic content display device according to an embodiment of the present invention, including:
A video playback module 610, configured to send video frame data of a first video frame to a prediction module 620 during playback of a video file, where the first video frame is the video frame currently being displayed or a video frame to be displayed after a predetermined length of time;
The prediction module 620, configured to obtain an object prediction model from a model service module 630 and predict on the video frame data using the object prediction model; identify an object in the first video frame according to the prediction result and determine identification information of the object in the first video frame; and send the identification information to an associated content retrieval module 640;
The model service module 630, configured to provide the object prediction model to the prediction module;
The associated content retrieval module 640, configured to use the identification information to retrieve the 3D stereoscopic content associated with the object in the first video frame and, if the object in the first video frame has associated 3D stereoscopic content, send the retrieval result to a mark display module 650;
The mark display module 650, configured to display a first prompt mark at a preset position of the object in the first video frame if the associated 3D stereoscopic content exists.
In one possible implementation, the device further includes an associated content display module. Fig. 7 is a second structural schematic diagram of a 3D stereoscopic content display device according to an embodiment of the present invention. As shown in Fig. 7, the 3D stereoscopic content display device of the embodiment of the present invention includes:
A video playback module 610, a prediction module 620, a model service module 630, an associated content retrieval module 640, a mark display module 650 and an associated content display module 760.
In one possible implementation, the mark display module 650 is further configured to receive a first instruction to display the associated 3D stereoscopic content and to send the first instruction to the associated content display module 760;
The associated content display module 760 is configured to obtain, according to the first instruction, the 3D stereoscopic content associated with the object in the first video frame, and to display that content.
In a possible implementation, prediction module 620 is specifically configured to obtain the subject category of the video file, send the subject category to model service module 630, and receive the object prediction model corresponding to the subject category fed back by model service module 630. The subject category of the video file may be set manually, or obtained from information such as the title and content description of the video file.
Model service module 630 includes:
Model training submodule 631, configured to train object prediction models corresponding to different subject categories and send the object prediction models corresponding to the different subject categories to model providing submodule 632;
Model providing submodule 632, configured to receive and save the object prediction models corresponding to the different subject categories; receive the subject category of the video file from prediction module 620, identify the object prediction model corresponding to the subject category, and return the identified object prediction model to prediction module 620.
For example, the different subject categories may include types such as a university category and a tourist attraction category. Model training submodule 631 may train the object prediction models for the different subject categories offline. For example, pictures of multiple university buildings taken from different angles are obtained, and the identification information of each building and its coordinate information in the picture are labelled manually. The identification information may be a serial number. These pictures are used as training samples to train the object prediction model corresponding to the university category. The pictures may be video frames extracted from video files.
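The manually labelled training samples described above can be pictured as simple records. The following is a minimal sketch, assuming hypothetical field names (`image_path`, `serial_number`, `x`, `y`) and file names that do not appear in the original; the helper only groups samples so that one can check each object is covered from several angles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedSample:
    """One manually labelled training picture (field names are assumptions)."""
    image_path: str      # picture, or a frame extracted from a video file
    serial_number: int   # identification information of the building
    x: float             # labelled coordinate of the building in the picture
    y: float


def group_by_serial(samples):
    """Group samples by serial number, e.g. to count viewing angles per object."""
    groups = {}
    for sample in samples:
        groups.setdefault(sample.serial_number, []).append(sample)
    return groups


# Hypothetical samples: two angles of one building, one angle of another.
samples = [
    AnnotatedSample("gate_front.jpg", 1, 0.52, 0.40),
    AnnotatedSample("gate_side.jpg", 1, 0.31, 0.45),
    AnnotatedSample("library_front.jpg", 2, 0.48, 0.55),
]
groups = group_by_serial(samples)
```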
The input of the trained object prediction model is a video frame, and the output is the serial number and coordinate information of the object in the video frame. The embodiment of the present invention may also provide a prediction classification dictionary, which contains the correspondence between the serial numbers of objects and the names of objects, for example in the following form:
1 --- Tsinghua University's main entrance;
2 --- Library of Tsinghua University;
……
The trained object prediction model may be stored in the database of model providing submodule 632. The data of one object prediction model may include:
1) A description JSON file, which describes model architecture information such as the number of layers of the object prediction model and the function of each layer.
2) Parameter data binary files (bin files) for multiple neural network layers, which describe the specific parameters of each layer of the object prediction model.
3) A prediction classification dictionary, which contains the correspondence between the serial numbers and the names of objects.
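As an illustration of how the three parts of a model's data and the lookup by subject category might fit together, the following is a minimal in-memory sketch. The class names, field names, and the category key "university" are assumptions made for illustration; the original only states that the models are kept in a database of model providing submodule 632.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectPredictionModel:
    """Hypothetical container for the data of one trained model."""
    subject_category: str
    description: dict        # content of the description JSON file
    layer_params: list       # one bytes object per neural network layer (bin files)
    class_dictionary: dict   # serial number -> object name


@dataclass
class ModelProvider:
    """Sketch of a model providing submodule: saves models, returns them by category."""
    _models: dict = field(default_factory=dict)

    def save(self, model: ObjectPredictionModel) -> None:
        # Receive and store a model sent by the training submodule.
        self._models[model.subject_category] = model

    def lookup(self, subject_category: str):
        # Return the model for the category, or None if none has been saved.
        return self._models.get(subject_category)


provider = ModelProvider()
provider.save(ObjectPredictionModel(
    subject_category="university",
    description={"layers": []},
    layer_params=[],
    class_dictionary={1: "Tsinghua University's main entrance",
                      2: "Library of Tsinghua University"},
))
model = provider.lookup("university")
```

A dictionary keyed by subject category is used here only for brevity; a persistent store would serve the same purpose.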
In a possible implementation, prediction module 620 is further configured to determine the coordinate information of the object in the first video frame and send the coordinate information to associated content retrieval module 640;
Associated content retrieval module 640 is further configured to, if the object in the first video frame has associated 3D stereoscopic content, send the coordinate information and the associated content storage address to mark display module 650;
Mark display module 650 is specifically configured to display the first prompt mark at the position determined by the coordinate information.
In a possible implementation, the first instruction is a click event on the first prompt mark;
Mark display module 650 is further configured to send the associated content storage address of the object corresponding to the first prompt mark to associated content display module 760;
Associated content display module 760 is configured to read, from the associated content storage address, the 3D stereoscopic content associated with the object in the first video frame; pause playback of the video file, generate a floating layer over the paused video picture, and display the associated 3D stereoscopic content on the floating layer.
In a possible implementation, associated content display module 760 is further configured to:
Receive a second instruction to transform the display of the associated 3D stereoscopic content, and transform the display of the associated 3D stereoscopic content on the floating layer according to the second instruction, where the transformation includes at least one of rotation, zooming in, and zooming out;
Or, receive a third instruction to stop displaying the associated 3D stereoscopic content, close the associated 3D stereoscopic content and the floating layer, and resume playback of the video file.
With reference to the structure of the 3D stereoscopic content display device and the information transmitted between its modules as shown in Fig. 8, a specific implementation in which the device displays associated 3D stereoscopic content is described below. It includes:
S81: While playing the video file, video playback module 610 sends the video frame data of a video frame to prediction module 620 and mark display module 650.
S82: Prediction module 620 obtains the subject category of the video file and sends the subject category to model providing submodule 632.
S83: Model providing submodule 632 identifies the corresponding object prediction model according to the subject category of the video file and returns the identified object prediction model to prediction module 620. The data of the object prediction model may include a description JSON file, parameter data bin files for multiple neural network layers, and a prediction classification dictionary.
S84: Prediction module 620 may perform online prediction on the Web side. Specifically, the embodiment of the present invention may run a Web-side deep learning framework developed on the basis of the deep learning development framework fluid, and obtain the required object prediction model. The object prediction model is run to determine the name of the object in the video frame and the coordinate information of the object in the video frame. The result is sent to associated content retrieval module 640. The data format of the sent content may be { name: '', location: { x: '', y: '' } }.
The specific implementation is as follows: the Web-side deep learning framework calls the graphics processing unit (GPU) through the Web Graphics Library (WebGL) interface to enable parallel computing, and requests the required object prediction model. The obtained prediction model is evaluated in the fragment shader stage of the GPU. The input data of the GPU are the video frame data and the parameter data bin files of the multiple neural network layers of the prediction model; the output is the serial number and the coordinates of the object produced by the prediction model. The prediction classification dictionary file is then looked up by the serial number to obtain the object name.
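The last part of S84, turning the model output into the message sent to the retrieval module, can be sketched as follows. This is an illustration only: the function name is hypothetical, the coordinates are shown as numbers although the original format shows empty string placeholders, and an unknown serial number is mapped to an empty name as an assumption.

```python
import json


def build_prediction_message(serial_number, x, y, class_dictionary):
    """Map a model output (serial number, coordinates) to the message format."""
    # Look up the object name in the prediction classification dictionary.
    name = class_dictionary.get(serial_number, "")
    return {"name": name, "location": {"x": x, "y": y}}


class_dictionary = {1: "Tsinghua University's main entrance",
                    2: "Library of Tsinghua University"}
message = build_prediction_message(2, 120, 80, class_dictionary)
serialized = json.dumps(message)  # form in which it could be sent onward
```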
S85: Associated content retrieval module 640 looks up the associated content storage address corresponding to the received object name; the associated content storage address may specifically be the online address of a 3D resource. If the online address of a 3D resource is found, the online address is sent to mark display module 650 together with the previously received object name and the coordinate information of the object in the video frame. If nothing is found, the received object name and coordinate information may be forwarded directly to mark display module 650, or no information may be sent.
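The retrieval in S85 amounts to a lookup from object name to storage address. The sketch below assumes an in-memory index, a hypothetical `resource_address` field, and a placeholder URL on example.com; of the two options the text allows when nothing is found, it takes the "send nothing" option by returning None.

```python
def retrieve_associated_content(message, content_index):
    """Return the message to forward to the mark display module, or None."""
    address = content_index.get(message["name"])
    if address is None:
        # No associated 3D content was found; this sketch sends nothing.
        return None
    # Attach the online address of the 3D resource to the received data.
    return {**message, "resource_address": address}


# Hypothetical index: object name -> online address of its 3D resource.
content_index = {
    "Library of Tsinghua University": "https://example.com/3d/library.glb",
}
found = retrieve_associated_content(
    {"name": "Library of Tsinghua University", "location": {"x": 120, "y": 80}},
    content_index,
)
missing = retrieve_associated_content(
    {"name": "Unknown building", "location": {"x": 10, "y": 20}},
    content_index,
)
```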
S86: Mark display module 650 may specifically be a 3D mark generator. The 3D mark generator creates, updates, or destroys 3D marks in real time according to the received content. If the information received by the 3D mark generator includes the online address of a 3D resource, the object has associated 3D content. In this case, the 3D mark generator creates a 3D mark and displays it at the specified position in the video frame according to the received coordinate information.
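One way to picture the create, update, and destroy behaviour of the 3D mark generator is to rebuild the set of marks for every frame. The sketch below assumes marks are keyed by object name and that each result carries a hypothetical `resource_address` field when 3D content exists; the actual drawing of the mark is left out.

```python
class MarkGenerator:
    """Sketch of a 3D mark generator that keeps one mark per object name."""

    def __init__(self):
        self.marks = {}  # object name -> position and resource address

    def on_frame(self, results):
        """Create, update, or destroy marks from the results for one frame."""
        visible = {}
        for result in results:
            if "resource_address" not in result:
                continue  # no associated 3D content, so no mark is shown
            visible[result["name"]] = {
                "x": result["location"]["x"],
                "y": result["location"]["y"],
                "resource_address": result["resource_address"],
            }
        # Replacing the dictionary creates new marks, moves existing ones,
        # and destroys marks whose object is no longer reported.
        self.marks = visible
        return self.marks


generator = MarkGenerator()
first = dict(generator.on_frame([
    {"name": "Library of Tsinghua University",
     "location": {"x": 120, "y": 80},
     "resource_address": "https://example.com/3d/library.glb"},
    {"name": "Unknown building", "location": {"x": 10, "y": 20}},
]))
second = dict(generator.on_frame([]))
```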
S87: When the user clicks the 3D mark, mark display module 650 sends the click event and the online address of the 3D resource to associated content display module 760.
S88: Associated content display module 760 may specifically be a 3D content renderer. On receiving the click event, the 3D content renderer instructs video playback module 610 to pause playback of the video file, obtains the 3D content from the online address of the 3D resource, generates a floating layer over the paused video picture, and renders the 3D content in the floating layer. Afterwards, associated content display module 760 may receive a transformation display instruction for the 3D content and rotate, zoom in on, or zoom out of the 3D content accordingly. Associated content display module 760 may also receive an instruction to stop displaying the 3D content, close the 3D content according to that instruction, and instruct video playback module 610 to resume playback of the video file.
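The pause, display, transform, and resume sequence of S87 and S88 can be summarised as a small state machine. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, rendering is reduced to tracking a rotation angle and a scale factor, and fetching the 3D content from the address is omitted.

```python
class VideoPlayer:
    """Stand-in for the video playback module."""

    def __init__(self):
        self.playing = True

    def pause(self):
        self.playing = False

    def resume(self):
        self.playing = True


class ContentRenderer:
    """Sketch of a 3D content renderer driving a floating layer."""

    def __init__(self, player):
        self.player = player
        self.layer_open = False
        self.resource_address = None
        self.rotation_deg = 0.0
        self.scale = 1.0

    def on_mark_clicked(self, resource_address):
        # First instruction: pause the video and open the floating layer.
        self.player.pause()
        self.resource_address = resource_address
        self.layer_open = True
        self.rotation_deg = 0.0
        self.scale = 1.0

    def transform(self, rotate_deg=0.0, zoom=1.0):
        # Second instruction: rotate and/or zoom the displayed content.
        if not self.layer_open:
            raise RuntimeError("no 3D content is being displayed")
        self.rotation_deg = (self.rotation_deg + rotate_deg) % 360
        self.scale *= zoom

    def close(self):
        # Third instruction: close the layer and resume playback.
        self.layer_open = False
        self.player.resume()


player = VideoPlayer()
renderer = ContentRenderer(player)
renderer.on_mark_clicked("https://example.com/3d/library.glb")
renderer.transform(rotate_deg=90.0, zoom=2.0)
```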
At this point, the 3D stereoscopic content display device proposed by the embodiment of the present invention has completed one display process for the associated 3D content of the video file.
The above 3D stereoscopic content display method may be implemented using the 3D stereoscopic content display device. For the detailed implementation of the method, reference may be made to the functions of the modules in the device and the form of information exchange between the modules, which are not repeated here.
In summary, the 3D stereoscopic content display method and device proposed by the embodiments of the present invention can, while a video file is being played, proactively identify whether the object in the first video frame has associated 3D stereoscopic content and, if so, prompt the user. Further, when an instruction from the user to display the associated 3D stereoscopic content is received, the 3D stereoscopic content is displayed.
An embodiment of the present invention further proposes a training device for an object prediction model, where the object prediction model is used to identify objects in pictures. Fig. 9 is a schematic structural diagram of the device, which includes:
Data acquisition module 910, configured to obtain picture data and obtain the actual identification information and actual coordinate information of the object in the picture;
Input module 920, configured to input the picture data into the object prediction model;
Parameter adjustment module 930, configured to compare the predicted identification information output by the object prediction model with the actual identification information, compare the predicted coordinate information output by the object prediction model with the actual coordinate information, and adjust the parameters of the object prediction model according to the comparison results.
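The original does not specify how the two comparisons are turned into a parameter update. One common choice, shown here purely as an assumption, is to combine a cross-entropy term for the identification information with a squared-error term for the coordinates into a single loss that an optimizer then minimises. The function name and the `coord_weight` parameter are hypothetical.

```python
import math


def detection_loss(class_probs, true_serial, pred_xy, true_xy, coord_weight=1.0):
    """Cross-entropy on the serial number plus squared error on the coordinates."""
    eps = 1e-12  # avoids log(0) when the true class gets zero probability
    id_loss = -math.log(class_probs.get(true_serial, 0.0) + eps)
    coord_loss = sum((p - t) ** 2 for p, t in zip(pred_xy, true_xy))
    return id_loss + coord_weight * coord_loss


# A confident, well-located prediction versus a wrong, badly located one.
good = detection_loss({1: 0.9, 2: 0.1}, 1, (0.50, 0.40), (0.50, 0.40))
bad = detection_loss({1: 0.9, 2: 0.1}, 2, (0.10, 0.90), (0.50, 0.40))
```

A smaller loss means the predicted identification and coordinates agree more closely with the labelled values; the parameter adjustment itself (for example by gradient descent) is outside this sketch.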
An embodiment of the present invention further proposes 3D stereoscopic content display equipment and training equipment for an object prediction model. Fig. 10 is a schematic structural diagram of the 3D stereoscopic content display equipment or the training equipment for the object prediction model according to an embodiment of the present invention, which includes:
Memory 11 and processor 12, where memory 11 stores a computer program that can run on processor 12. When executing the computer program, processor 12 implements the 3D stereoscopic content display method or the training method of the object prediction model in the above embodiments. There may be one or more memories 11 and one or more processors 12.
The equipment may further include:
Communication interface 13, configured to communicate with external devices and exchange data.
Memory 11 may include high-speed RAM, and may also include non-volatile memory, for example at least one magnetic disk storage.
If memory 11, processor 12, and communication interface 13 are implemented independently, they may be connected to one another by a bus and communicate with one another through it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in Fig. 10, but this does not mean that there is only one bus or one type of bus.
Optionally, in a specific implementation, if memory 11, processor 12, and communication interface 13 are integrated on a single chip, they may communicate with one another through an internal interface.
In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present invention. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of the different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means two or more, unless otherwise specifically and clearly defined.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, device, or equipment (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, device, or equipment and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in combination with, an instruction execution system, device, or equipment. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following technologies known in the art: a discrete logic circuit having logic gates for implementing logical functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various changes or replacements within the technical scope disclosed by the present invention, and these shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (17)

1. A 3D stereoscopic content display method, characterized by comprising:
During playback of a video file, obtaining the video frame data of a first video frame, where the first video frame is a video frame being displayed or a video frame to be displayed after a predetermined length of time;
Predicting the video frame data using an object prediction model; identifying the object in the first video frame according to the prediction result, and determining the identification information of the object in the first video frame;
Using the identification information of the object, determining whether the object in the first video frame has associated 3D stereoscopic content;
If the associated 3D stereoscopic content exists, displaying a first prompt mark at a preset position of the object in the first video frame.
2. The method according to claim 1, characterized in that, after the video frame data is predicted using the object prediction model, the method further comprises: identifying the coordinate information of the object in the first video frame according to the prediction result;
The displaying of the first prompt mark at the preset position of the object in the first video frame comprises: displaying the first prompt mark at the position determined by the coordinate information of the object in the first video frame.
3. The method according to claim 1 or 2, characterized by further comprising:
Receiving a first instruction to display the associated 3D stereoscopic content;
According to the first instruction, obtaining and displaying the associated 3D stereoscopic content.
4. The method according to claim 1 or 2, characterized in that the predicting of the video frame data using the object prediction model comprises:
Obtaining the subject category of the video file, and determining the object prediction model corresponding to the subject category;
Predicting the video frame data using the object prediction model corresponding to the subject category.
5. The method according to claim 3, characterized in that the first instruction is a click event on the first prompt mark;
The obtaining and displaying of the associated 3D stereoscopic content according to the first instruction comprises: reading the associated 3D stereoscopic content from the associated content storage address of the object corresponding to the first prompt mark; pausing playback of the video file, generating a floating layer over the paused video picture, and displaying the associated 3D stereoscopic content on the floating layer.
6. The method according to claim 5, characterized in that, after the associated 3D stereoscopic content is displayed on the floating layer, the method further comprises:
Receiving a second instruction to transform the display of the associated 3D stereoscopic content, and transforming the display of the associated 3D stereoscopic content on the floating layer according to the second instruction, where the transformation includes at least one of rotation, zooming in, and zooming out;
And/or receiving a third instruction to stop displaying the associated 3D stereoscopic content, closing the associated 3D stereoscopic content and the floating layer, and resuming playback of the video file.
7. A training method for an object prediction model, characterized in that the object prediction model is used to identify objects in pictures, the method comprising:
Obtaining picture data, and obtaining the actual identification information and actual coordinate information of the object in the picture;
Inputting the picture data into the object prediction model;
Comparing the predicted identification information output by the object prediction model with the actual identification information, comparing the predicted coordinate information output by the object prediction model with the actual coordinate information, and adjusting the parameters of the object prediction model according to the comparison results.
8. A 3D stereoscopic content display device, characterized by comprising:
A video playback module, configured to send the video frame data of a first video frame to a prediction module during playback of a video file, where the first video frame is a video frame being displayed or a video frame to be displayed after a predetermined length of time;
The prediction module, configured to obtain an object prediction model from a model service module and predict the video frame data using the object prediction model; identify the object in the first video frame according to the prediction result and determine the identification information of the object in the first video frame; and send the identification information to an associated content retrieval module;
The model service module, configured to provide the object prediction model to the prediction module;
The associated content retrieval module, configured to use the identification information to retrieve the 3D stereoscopic content associated with the object in the first video frame and, if the object in the first video frame has associated 3D stereoscopic content, send the retrieval result to a mark display module;
The mark display module, configured to display a first prompt mark at a preset position of the object in the first video frame if associated 3D stereoscopic content exists.
9. The device according to claim 8, characterized in that the device further comprises an associated content display module;
The mark display module is further configured to receive a first instruction to display the associated 3D stereoscopic content and send the first instruction to the associated content display module;
The associated content display module is configured to obtain, according to the first instruction, the 3D stereoscopic content associated with the object in the first video frame, and to display the 3D stereoscopic content associated with the object in the first video frame.
10. The device according to claim 9, characterized in that the prediction module is configured to obtain the subject category of the video file, send the subject category to the model service module, and receive the object prediction model corresponding to the subject category fed back by the model service module;
The model service module comprises:
A model training submodule, configured to train object prediction models corresponding to different subject categories and send the object prediction models corresponding to the different subject categories to a model providing submodule;
The model providing submodule, configured to receive and save the object prediction models corresponding to the different subject categories; receive the subject category of the video file from the prediction module, identify the object prediction model corresponding to the subject category, and return the identified object prediction model to the prediction module.
11. The device according to claim 10, characterized in that the prediction module is further configured to determine the coordinate information of the object in the first video frame and send the coordinate information to the associated content retrieval module;
The associated content retrieval module is further configured to, if the object in the first video frame has associated 3D stereoscopic content, send the coordinate information and the associated content storage address to the mark display module;
The mark display module is configured to display the first prompt mark at the position determined by the coordinate information.
12. The device according to claim 11, characterized in that the first instruction is a click event on the first prompt mark;
The mark display module is further configured to send the associated content storage address of the object corresponding to the first prompt mark to the associated content display module;
The associated content display module is configured to read, from the associated content storage address, the 3D stereoscopic content associated with the object in the first video frame; pause playback of the video file, generate a floating layer over the paused video picture, and display the associated 3D stereoscopic content on the floating layer.
13. The device according to claim 12, characterized in that the associated content display module is further configured to:
Receive a second instruction to transform the display of the associated 3D stereoscopic content, and transform the display of the associated 3D stereoscopic content on the floating layer according to the second instruction, where the transformation includes at least one of rotation, zooming in, and zooming out;
Or, receive a third instruction to stop displaying the associated 3D stereoscopic content, close the associated 3D stereoscopic content and the floating layer, and resume playback of the video file.
14. A training device for an object prediction model, characterized in that the object prediction model is used to identify objects in pictures, the device comprising:
A data acquisition module, configured to obtain picture data and obtain the actual identification information and actual coordinate information of the object in the picture;
An input module, configured to input the picture data into the object prediction model;
A parameter adjustment module, configured to compare the predicted identification information output by the object prediction model with the actual identification information, compare the predicted coordinate information output by the object prediction model with the actual coordinate information, and adjust the parameters of the object prediction model according to the comparison results.
15. 3D stereoscopic content display equipment, characterized in that the equipment comprises:
One or more processors;
A storage device, configured to store one or more programs;
When the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 6.
16. Training equipment for an object prediction model, characterized in that the equipment comprises:
One or more processors;
A storage device, configured to store one or more programs;
When the one or more programs are executed by the one or more processors, the one or more processors implement the method according to claim 7.
17. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN201910504980.1A 2019-06-11 2019-06-11 3 D stereo content display method, device, equipment and storage medium Pending CN110222235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910504980.1A CN110222235A (en) 2019-06-11 2019-06-11 3 D stereo content display method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110222235A true CN110222235A (en) 2019-09-10

Family

ID=67816522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910504980.1A Pending CN110222235A (en) 2019-06-11 2019-06-11 3 D stereo content display method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110222235A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118200663A (en) * 2024-03-15 2024-06-14 宁波艾腾湃数字技术有限公司 Interactive video playing method combined with digital three-dimensional model display

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102377875A (en) * 2010-08-18 2012-03-14 Lg电子株式会社 Mobile terminal and image display method thereof
CN107358490A (en) * 2017-06-19 2017-11-17 北京奇艺世纪科技有限公司 A kind of image matching method, device and electronic equipment
CN108462889A (en) * 2017-02-17 2018-08-28 阿里巴巴集团控股有限公司 Information recommendation method during live streaming and device
CN108475341A (en) * 2017-04-11 2018-08-31 深圳市柔宇科技有限公司 The recognition methods of 3-D view and terminal
US10095696B1 (en) * 2016-01-04 2018-10-09 Gopro, Inc. Systems and methods for generating recommendations of post-capture users to edit digital media content field
CN109313710A (en) * 2018-02-02 2019-02-05 深圳蓝胖子机器人有限公司 Model of Target Recognition training method, target identification method, equipment and robot

Similar Documents

Publication Publication Date Title
CN113767434B (en) Tagging video by correlating visual features with sound tags
Achlioptas et al. Referit3d: Neural listeners for fine-grained 3d object identification in real-world scenes
CN1821956B (en) Using existing content to generate active content wizard executables for execution of tasks
CN111444357B (en) Content information determination method, device, computer equipment and storage medium
KR101780034B1 (en) Generating augmented reality exemplars
CN108460396A (en) Negative sampling method and device
US8046691B2 (en) Generalized interactive narratives
CN106980868A (en) Embedding space for images with multiple text labels
US20050132305A1 (en) Electronic information access systems, methods for creation and related commercial models
CN106980867A (en) Modeling semantic concepts in an embedding space as distributions
US11062081B2 (en) Creating accessible, translatable multimedia presentations
CN106469552A (en) Speech recognition apparatus and method
CN101783886A (en) Information processing apparatus, information processing method, and program
CN106255951B (en) Content display using dynamic zoom focus
CN107103316A (en) Method and system based on smartphone
CN110232340A (en) Method and apparatus for establishing a video classification model and for video classification
CN102157006A (en) Method and apparatus for producing dynamic effect of character capable of interacting with image
CN109716275A (en) Method for displaying images in a multi-dimensional mode based on personalized topics
Shin et al. The reality of the situation: A survey of situated analytics
CN115757692A (en) Data processing method and device
CN108847066A (en) Teaching content prompting method, device, server and storage medium
CN110222235A (en) 3 D stereo content display method, device, equipment and storage medium
US20190172260A1 (en) System for composing or modifying virtual reality sequences, method of composing and system for reading said sequences
CN113822382B (en) Course classification method, device, equipment and medium based on multi-mode feature representation
CN113486260B (en) Method and device for generating interactive information, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination