WO2014205658A1 - Data processing method and data processing system - Google Patents

Data processing method and data processing system

Info

Publication number
WO2014205658A1
WO2014205658A1 PCT/CN2013/077929 CN2013077929W
Authority
WO
WIPO (PCT)
Prior art keywords
terminal
video
information
data
photographic subject
Prior art date
Application number
PCT/CN2013/077929
Other languages
English (en)
French (fr)
Inventor
黄伟
王奎
Original Assignee
东莞宇龙通信科技有限公司
宇龙计算机通信科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东莞宇龙通信科技有限公司 and 宇龙计算机通信科技(深圳)有限公司
Priority to CN201380069016.1A (CN104885113A)
Priority to EP13888374.9A (EP3016052A4)
Priority to PCT/CN2013/077929 (WO2014205658A1)
Priority to US14/888,004 (US10255243B2)
Publication of WO2014205658A1

Classifications

    • G06F 16/50 — Information retrieval; database structures therefor; file system structures therefor, of still image data
    • G06F 16/532 — Query formulation, e.g. graphical querying
    • G06F 16/583 — Retrieval characterised by using metadata automatically derived from the content
    • G06F 3/011 — Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/0482 — Interaction with lists of selectable items, e.g. menus
    • G06F 3/0488 — Interaction techniques based on GUIs using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06Q 30/02 — Marketing; price estimation or determination; fundraising
    • G06Q 30/06 — Buying, selling or leasing transactions
    • G06Q 30/0623 — Electronic shopping [e-shopping]; item investigation
    • H04W 84/18 — Self-organising networks, e.g. ad-hoc networks or sensor networks

Definitions

  • The present invention relates to the field of data processing technologies, and in particular to a data processing method and a data processing system. Background Art
  • A user cannot fully understand an object through a picture alone; if the same object is presented by video, especially video captured of the object in an offline store, the user's perception of the object is undoubtedly enhanced. Moreover, allowing the user to operate on an object of interest while watching the video greatly improves the purchasing experience.
  • In view of the above problems, the invention proposes a new data processing scheme that identifies the photographic subjects in a video, so that the user can operate directly on a subject while watching the video without performing a separate network search, thereby facilitating user operation and improving the user experience.
  • To this end, the present invention provides a data processing method, including: the first terminal performs image collection on at least one photographic subject entity, encodes the collected images together with the identification information corresponding to at least one photographic subject entity to form video data, and sends the video data to the second terminal through a network; the second terminal receives the video data, performs data separation on it, and obtains a video file together with the identification information associated with at least one photographic subject in the video file; the second terminal identifies at least one photographic subject in the video file according to the identification information, and forms in the video file an operation area corresponding to at least one photographic subject; and, when playing the video file, the second terminal performs, according to a detected operation action on a designated operation area, the operation function associated with the photographic subject corresponding to that area.
  • The operation area generated by recognizing a photographic subject in the video may be an area bounded by the displayed edge of the subject, or a rectangular area enclosing the subject. The operation area may be transparent, or it may be displayed only under certain conditions (for example, after setting and entering a video playback mode that displays operation areas). Since video is dynamic, when a subject in the video moves (either actively, or because its position on the terminal screen changes with the movement of the lens), the corresponding operation area should change accordingly, so that the user can operate the subject directly without paying special attention to the position of the operation area.
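The idea of an operation area that follows the subject can be sketched as follows. This is a minimal illustration, not the patent's implementation: all names are assumed, each recognized subject is modeled as a per-frame bounding box, and a touch is dispatched to whichever subject's current box contains it.

```python
# Hypothetical sketch: the operation area is simply the subject's
# bounding box for the current frame, refreshed as the subject moves,
# so hit-testing a touch always uses the subject's latest position.

def hit_test(areas, x, y):
    """areas: {subject_id: (left, top, right, bottom)} for the current frame.

    Returns the id of the first subject whose operation area contains
    the touch point (x, y), or None if the touch misses all areas.
    """
    for subject_id, (left, top, right, bottom) in areas.items():
        if left <= x <= right and top <= y <= bottom:
            return subject_id
    return None
```

Because `areas` is rebuilt every frame from the recognized positions, the user taps the subject itself rather than a fixed screen location.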
  • The video data is acquired by the first terminal and transmitted to the second terminal; in particular, it may be acquired by the first terminal in real time and transmitted to the second terminal through the network.
  • When the first terminal acquires the identification information of the photographic subject during capture and encodes it together with the captured video file into video data, the second terminal does not need to analyze the video itself to obtain the photographic subjects; this reduces the requirements placed on the second terminal and makes it convenient to recognize the subjects in the video.
  • Preferably, the method further includes: the first terminal receiving, from at least one photographic subject entity, the identification information corresponding to that entity, for encoding into the video data.
  • The identification information may thus be obtained by the first terminal directly from the photographic subject entity, which helps establish a physical association between the identification information and the specific entity, and facilitates management of the photographic subject entities and their corresponding identification information.
  • Preferably, the method further includes: the second terminal matching the content in the image frames of the video file against pre-stored identification features to identify at least one photographic subject in the video file.
  • The identification features of one or more objects are pre-stored on the second terminal, or in a cloud storage space corresponding to it, so that at any time after the second terminal acquires a video file, the content of its image frames can be matched against the pre-stored identification features to identify the subjects in the video as it plays. Because pre-stored identification features are used, there is no special requirement on the video file itself: the technical solution applies to any video file, whether downloaded from the network, obtained from other terminals, or captured by the second terminal itself, and therefore has strong versatility.
  • In addition, the pixel information in an image frame may be compared with that of one or more subsequent frames to determine whether the subject has changed. If it has, recognition is performed; otherwise there is no need to recognize again, which improves recognition efficiency and reduces the demands on the terminal's processing capability.
  • In a simple case, a pre-stored identification feature may be an image of the object, to be compared with the images in the video file to identify the object. The identification feature may also be a set of characteristic parameters: for "clothing", for example, it may include parameters such as "an opening in the front, with symmetrical sleeves on the left and right", so that the second terminal can "recognize" clothing as such, while features of the specific garment, such as color, size, and style, enable intelligent recognition of the particular item of "clothing".
  • Having the second terminal pre-store identification features does not conflict with the first terminal sending identification information: object recognition may be performed using either one of them alone, or both together.
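The recognition strategy described above (match frames against pre-stored features, but re-run matching only when the pixel content changes noticeably) can be sketched as follows. This is a toy model under stated assumptions: `PRESTORED_FEATURES`, the signature-pixel "feature", and the change threshold are all invented for illustration; a real system would use an image matcher.

```python
# Assumed toy feature store: a subject is "recognized" if its signature
# pixel value appears anywhere in the frame. Frames are flat tuples of
# pixel tuples.
PRESTORED_FEATURES = {"clothing": (255, 0, 0)}

def match_features(frame):
    """Stand-in for a real image matcher against pre-stored features."""
    return [name for name, sig in PRESTORED_FEATURES.items()
            if any(px == sig for px in frame)]

def recognize_stream(frames, change_threshold=0.1):
    """Recognize subjects per frame, reusing the last result when the
    fraction of changed pixels stays below change_threshold."""
    results, last_frame, last_result = [], None, []
    for frame in frames:
        if last_frame is not None:
            changed = sum(a != b for a, b in zip(frame, last_frame)) / len(frame)
            if changed < change_threshold:
                results.append(last_result)  # no significant change: skip re-matching
                last_frame = frame
                continue
        last_result = match_features(frame)
        results.append(last_result)
        last_frame = frame
    return results
```

The skip-on-no-change branch is what reduces the processing load on the terminal, as the passage above notes.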
  • Preferably, the process in which the second terminal performs data separation on the video data includes: parsing the video data, extracting the identification frames from it, and obtaining the video file that remains after the identification frames are extracted; the identification information is then further extracted from the identification frames for use in the recognition operation on the video file.
  • Specifically, identification frames containing identification information may be inserted in the middle of, or at either end of, the data stream corresponding to the video file.
  • The header portion of an identification frame should include a type identifier, allowing the second terminal to determine the type of the frame within the video data. The identification frame header consists mainly of special characters that mark the frame as an identification frame; the second terminal then parses further information, such as the frame length, to completely delimit the frame. The identification frame should also include an information portion containing the identification information of the photographic subject, used to identify the subject in the video. In this way the identification information can be conveniently encoded into the video data, the identification frames conveniently parsed out of the video data, and the identification information extracted from the information portion, so that the subjects in the video file can be identified.
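The frame layout sketched above (special-character header, type identifier, length, then an information portion) is not fully specified by the patent, so the following separation routine assumes concrete field sizes: a 4-byte magic marker, a 1-byte type, and a 4-byte big-endian payload length. It is an illustrative sketch, not the patented format.

```python
import struct

# Assumed wire layout (the patent fixes no exact sizes):
#   MAGIC (4 bytes of "special characters") | type (1 byte) |
#   length (4-byte big-endian) | identification payload
MAGIC = b"\xfa\xceID"
TYPE_IDENTIFICATION = 0x01

def extract_identification_frames(video_data: bytes):
    """Split encoded video data into (video_file_bytes, [id payloads])."""
    video_parts, payloads = [], []
    pos = 0
    while pos < len(video_data):
        idx = video_data.find(MAGIC, pos)
        if idx == -1:
            video_parts.append(video_data[pos:])  # rest is plain video
            break
        video_parts.append(video_data[pos:idx])   # video before the frame
        frame_type = video_data[idx + 4]
        (length,) = struct.unpack_from(">I", video_data, idx + 5)
        payload = video_data[idx + 9: idx + 9 + length]
        if frame_type == TYPE_IDENTIFICATION:
            payloads.append(payload)
        pos = idx + 9 + length                    # skip past this frame
    return b"".join(video_parts), payloads
```

Removing each identification frame and concatenating the remainder yields the playable video file, matching the "video file remaining after extracting the identification frame" step above.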
  • Preferably, the method further includes: at least one first terminal serving as an upper node, and all of the photographic subject entities serving as lower nodes, forming an Ad Hoc hierarchical network structure.
  • The Ad Hoc hierarchical network structure does not rely on existing fixed communication network infrastructure, and allows the network system to be deployed quickly.
  • Each network node in the network cooperates with each other to communicate and exchange information through wireless links, thereby realizing the sharing of information and services.
  • Network nodes can enter and leave the network dynamically, randomly, and frequently, often without prior warning or notification, and without disrupting communication between other nodes in the network.
  • For example, the first terminal may be a camera serving as the upper Ad Hoc node, with the photographic subjects (such as items of clothing) as lower nodes; the upper node (i.e., the camera) collects the information sent by the lower nodes.
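The two-level structure described above, with nodes that join and leave dynamically, can be sketched as a minimal registry. All names here are assumed for illustration; real Ad Hoc membership and wireless transport are out of scope.

```python
# Illustrative sketch: the camera is the upper node; photographic-subject
# entities register as lower nodes and push their identification
# information up to it. Nodes may leave at any time without notice.

class UpperNode:
    def __init__(self):
        self.lower_nodes = {}                    # node_id -> identification info

    def join(self, node_id, ident_info):
        self.lower_nodes[node_id] = ident_info   # node enters dynamically

    def leave(self, node_id):
        self.lower_nodes.pop(node_id, None)      # tolerate unannounced departure

    def collect_identification(self):
        """Snapshot of all identification info currently reachable,
        ready to be encoded into the video data."""
        return dict(self.lower_nodes)
```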
  • Preferably, the method further includes: the first terminal further receiving, from at least one photographic subject entity, controllable information corresponding to that entity. The first terminal encodes the controllable information into the video data in association with the identification information, and the second terminal further acquires from the video data the controllable information associated with at least one photographic subject; upon detecting an operation action on a designated operation area, the second terminal performs the operation function on the designated photographic subject according to the controllable information. Alternatively, when the second terminal detects an operation action on a designated operation area, it reports the detection result to the first terminal, which sends the controllable information corresponding to that operation area to the second terminal, so that the second terminal performs the operation function on the designated photographic subject according to the controllable information.
  • After separating the controllable information, the second terminal may save it in a matching database together with the associated identification information; when the user operates on a recognized photographic subject, the controllable information associated with the specified subject's identification information is retrieved from the matching database to perform the operation function on the subject.
  • When the first terminal encodes the controllable information into the video data, the identification information and the controllable information associated with the photographic subjects are usually sent to the second terminal together. However, to save network resources and improve the transmission speed of the video data, the first terminal may instead send the corresponding controllable information only when the detection result reported by the second terminal indicates an operation action in the operation area of a particular subject; this also saves storage space on the second terminal.
  • Preferably, the controllable information includes menu data, link information, and control commands; the corresponding operation functions include generating and displaying an interaction menu from the menu data, opening the link information, and executing the control command.
  • For example, operating on a recognized garment may pop up an interactive menu containing "purchase, price, consultation", link directly to the "purchase" page, or enlarge the image of the garment, so as to facilitate further operation by the user.
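The three kinds of controllable information named above map naturally onto a small dispatcher. The payload shape is assumed, and the real handlers (menu UI, browser, device control) are stubbed out as returned descriptions; this is a hedged sketch of the dispatch logic only.

```python
# Hypothetical controllable-information dispatch. A payload is a dict
# with an assumed "kind" key naming one of the three kinds from the
# patent text: menu data, link information, or a control command.

def perform_operation(controllable):
    kind = controllable["kind"]
    if kind == "menu":
        return ("show_menu", controllable["items"])   # build and display an interaction menu
    if kind == "link":
        return ("open_link", controllable["url"])     # open in a browser or bubble box
    if kind == "command":
        return ("execute", controllable["command"])   # execute the control command
    raise ValueError(f"unknown controllable kind: {kind}")
```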
  • Accordingly, the present invention also provides a data processing system, including a first terminal and a second terminal. The first terminal includes: an image acquisition unit, configured to perform image collection on at least one photographic subject entity; an encoding unit, configured to encode the collected images and the identification information corresponding to at least one photographic subject entity to form video data; and a video data sending unit, configured to send the video data formed by the encoding unit to the second terminal through a network.
  • The second terminal includes: a video data receiving unit, configured to receive the video data; a data separation unit, configured to perform data separation on the video data and obtain a video file together with the identification information associated with at least one photographic subject in the video file; an identification unit, configured to identify at least one photographic subject in the video file according to the identification information; an operation area generating unit, configured to form in the video file an operation area corresponding to at least one identified photographic subject; a video playback unit, configured to play the video file; an operation action detecting unit, configured to detect an operation action on a designated operation area while the video file is playing; and a processing unit, configured to perform the operation function associated with the photographic subject corresponding to the designated operation area.
  • The video file here may be a video captured by the camera and transmitted in real time, by wire or wirelessly, or a non-real-time video taken at any other time.
  • As before, the operation area generated by recognizing a photographic subject may be an area bounded by the subject's displayed edge, or a rectangle enclosing it; it may be transparent, or displayed only under certain conditions (such as setting and entering a video playback mode that displays operation areas). When the subject in the video moves, the corresponding operation area changes accordingly, so that the user can operate the subject directly without paying special attention to the position of the operation area.
  • The video data is acquired by the first terminal and transmitted to the second terminal; in particular, it may be acquired by the first terminal in real time and transmitted to the second terminal through the network.
  • When the first terminal acquires the identification information of the photographic subject during capture and encodes it together with the captured video file into video data, the second terminal does not need to analyze the video itself to obtain the photographic subjects; this reduces the requirements placed on the second terminal and makes it convenient to recognize the subjects in the video.
  • Preferably, the first terminal further includes: an information receiving unit, configured to receive, from at least one photographic subject entity, the identification information corresponding to that entity, for encoding into the video data.
  • The identification information may thus be obtained by the first terminal directly from the photographic subject entity, which helps establish a physical association between the identification information and the specific entity, and facilitates management of the photographic subject entities and their corresponding identification information.
  • Preferably, the second terminal further includes: a pre-storage unit, configured to pre-store identification features; the identification unit matches the content in the image frames of the video file against the identification features pre-stored by the pre-storage unit to identify at least one photographic subject in the video file.
  • The identification features of one or more objects are pre-stored on the second terminal, or in a cloud storage space corresponding to it, so that at any time after the second terminal acquires a video file, the content of its image frames can be matched against the pre-stored identification features to identify the subjects in the video as it plays. Because pre-stored identification features are used, there is no special requirement on the video file itself: the technical solution applies to any video file, whether downloaded from the network, obtained from other terminals, or captured by the second terminal itself, and therefore has strong versatility.
  • The identification feature may also be a set of characteristic parameters: for "clothing", for example, it may include parameters such as "an opening in the front, with symmetrical sleeves on the left and right", so that the second terminal can "recognize" clothing as such, while features of the specific garment, such as color, size, and style, enable intelligent recognition of the particular item of "clothing".
  • Having the second terminal pre-store identification features does not conflict with the first terminal sending identification information: object recognition may be performed using either one of them alone, or both together.
  • Preferably, the data separation unit includes: a frame extraction subunit, configured to extract the identification frames from the video data and obtain the video file remaining after the identification frames are extracted; and a frame parsing subunit, configured to further extract the identification information from the identification frames for the recognition operation of the identification unit on the video file.
  • Specifically, identification frames containing identification information may be inserted in the middle of, or at either end of, the data stream corresponding to the video file.
  • The header portion of an identification frame should include a type identifier, allowing the second terminal to determine the type of the frame within the video data. The identification frame header consists mainly of special characters that mark the frame as an identification frame; the second terminal then parses further information, such as the frame length, to completely delimit the frame. The identification frame should also include an information portion containing the identification information of the photographic subject, used to identify the subject in the video. In this way the identification information can be conveniently encoded into the video data, the identification frames conveniently parsed out of the video data, and the identification information extracted from the information portion, so that the subjects in the video file can be identified.
  • Preferably, at least one first terminal serves as an upper node, and all of the photographic subject entities serve as lower nodes, forming an Ad Hoc hierarchical network structure.
  • The Ad Hoc hierarchical network structure does not rely on existing fixed communication network infrastructure, and allows the network system to be deployed quickly. Each network node cooperates with the others, communicating and exchanging information over wireless links, thereby realizing the sharing of information and services.
  • Network nodes are able to enter and leave the network dynamically, randomly, and frequently, often without prior warning or notification, and without disrupting communication between other nodes in the network.
  • For example, the first terminal may be a camera serving as the upper Ad Hoc node, with the photographic subjects (such as items of clothing) as lower nodes; the upper node (i.e., the camera) collects the information sent by the lower nodes.
  • Preferably, the first terminal further receives, from at least one photographic subject entity, controllable information corresponding to that entity. The encoding unit is further configured to encode the controllable information into the video data in association with the identification information, and the data separation unit is further configured to acquire from the video data the controllable information associated with at least one photographic subject; the processing unit is further configured to perform, upon detection of an operation action on a designated operation area, the operation function on the designated photographic subject according to the controllable information. Alternatively, when the second terminal detects an operation action on a designated operation area, it reports the detection result to the first terminal, which sends the controllable information corresponding to that operation area to the second terminal, whereupon the processing unit performs the operation function on the designated photographic subject according to the controllable information.
  • In the absence of controllable information, the second terminal may perform a default processing operation on any clicked subject, such as zooming in on it, storing it, or directly invoking the browser to "search by image" for the subject that was clicked.
  • In this way the controllable information can be associated with the identification information and encoded into the video data, and when the user operates on a recognized photographic subject, the second terminal performs the appropriate function according to the controllable information.
  • The controllable information may be encoded into the identification frame described above, or encoded separately as a control information frame; the identification frame (and, if present, the control information frame) is integrated with the captured video file to form the video data, and the second terminal performs the corresponding function according to the parsed controllable information.
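The encoding side just described (appending an identification frame and, optionally, a separate control information frame to the captured video file) can be sketched as follows. The field sizes and magic marker are assumptions for illustration; the patent fixes no concrete wire format.

```python
import struct

# Assumed wire layout per frame:
#   MAGIC (4 bytes) | type (1 byte) | length (4-byte big-endian) | payload
MAGIC = b"\xfa\xceID"
TYPE_IDENTIFICATION, TYPE_CONTROL = 0x01, 0x02

def make_frame(frame_type: int, payload: bytes) -> bytes:
    """Build one frame: header (magic + type + length) then payload."""
    return MAGIC + bytes([frame_type]) + struct.pack(">I", len(payload)) + payload

def encode_video_data(video_file: bytes, ident: bytes, control: bytes = b"") -> bytes:
    """Integrate the captured video file with an identification frame and
    an optional control information frame to form the video data."""
    data = video_file + make_frame(TYPE_IDENTIFICATION, ident)
    if control:
        data += make_frame(TYPE_CONTROL, control)
    return data
```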
  • After separating the controllable information, the second terminal may save it in a matching database together with the associated identification information; when the user operates on a recognized photographic subject, the controllable information associated with the specified subject's identification information is retrieved from the matching database to perform the operation function on the subject.
  • When the first terminal encodes the controllable information into the video data, the identification information and the controllable information associated with the photographic subjects are sent to the second terminal together. However, to save network resources and improve the transmission speed of the video data, the first terminal may instead send the corresponding controllable information only when the detection result reported by the second terminal indicates an operation action in the operation area of a particular subject; this also saves storage space on the second terminal.
  • Preferably, the controllable information separated by the data separation unit includes menu data, link information, and control commands; the operation functions performed by the processing unit correspondingly include generating and displaying an interaction menu from the menu data, opening the link information, and executing the control command.
  • For example, when a user watches a shopping video on a mobile phone and the phone recognizes a garment in the video, the user can touch the screen to click the garment's operation area; the phone then pops up an interactive menu containing, for example, "purchase, price, consultation", links directly to the "purchase" page, or enlarges the image of the garment, so as to facilitate further operation by the user.
  • Through the above technical solution, the photographic subjects in a video can be identified, so that the user can operate directly on them while watching the video without performing a separate network search or similar operations, thereby facilitating user operation and improving the user experience.
  • FIG. 1 shows a flow chart of a data processing method according to an embodiment of the present invention
  • FIG. 2 shows a block diagram of a data processing system in accordance with an embodiment of the present invention
  • FIG. 3 illustrates an intelligent video interaction system based on an Ad Hoc network structure in accordance with an embodiment of the present invention
  • FIGS. 5A to 5C are diagrams showing an intelligent video interaction system according to an embodiment of the present invention.

Detailed Description
  • FIG. 1 shows a flow chart of a data processing method in accordance with an embodiment of the present invention.
  • The data processing method includes: Step 102, the first terminal performs image collection on at least one photographic subject entity, encodes the collected images and the identification information of at least one photographic subject entity to form video data, and sends the video data to the second terminal through the network; Step 104, the second terminal receives the video data, performs data separation on it, and obtains a video file together with the identification information associated with at least one photographic subject in the video file; Step 106, the second terminal identifies at least one photographic subject in the video file according to the identification information, and forms in the video file at least one operation area corresponding to the photographic subjects; Step 108, the second terminal, when playing the video file, performs, according to a detected operation action on a designated operation area, the operation function associated with the photographic subject corresponding to that area.
  • The video file here can be a video captured by the camera and transmitted in real time, by wire or wirelessly, or a non-real-time video taken at any other time.
  • the corresponding operation area generated by the recognition of the photographic subject in the video may be an area corresponding to the display edge of the photographic subject, or a rectangular area in which the photographic subject is included, etc., specifically, the operation area may be Transparent, can also be displayed under certain conditions (such as setting a video playback mode that can display the operating area and entering this mode).
  • the corresponding operating area should also correspond.
  • the ground changes so that the user can directly operate the subject without paying special attention to the position of the operating area.
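The behavior just described — a rectangular region that tracks the subject's position and answers hit-tests on the user's taps — can be sketched as follows. The field names and the axis-aligned rectangle representation are illustrative assumptions, not a data layout the patent specifies:

```python
from dataclasses import dataclass

@dataclass
class OperationArea:
    """A rectangular, normally transparent region overlaid on one photographic subject."""
    subject_id: str
    x: int       # top-left corner, in frame pixels
    y: int
    width: int
    height: int

    def contains(self, tap_x: int, tap_y: int) -> bool:
        """Hit-test a tap/click against this area."""
        return (self.x <= tap_x < self.x + self.width
                and self.y <= tap_y < self.y + self.height)

    def follow(self, new_x: int, new_y: int) -> None:
        """Move the area so it keeps tracking the subject between frames."""
        self.x, self.y = new_x, new_y

def hit_test(areas, tap_x, tap_y):
    """Return the subject id of the first area containing the tap, or None."""
    return next((a.subject_id for a in areas if a.contains(tap_x, tap_y)), None)
```

One area is kept per recognized subject; the player checks each tap against all of them.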
• In the above technical solution, the video data may be acquired by the first terminal and transmitted to the second terminal; in particular, it may be acquired by the first terminal in real time and transmitted to the second terminal through the network.
• When the first terminal acquires the identification information of the captured subjects during shooting and encodes it into the video data together with the captured video file, the first terminal does not need to analyze the photographed objects to extract their features, which reduces the requirements on the first terminal and also makes it convenient for the second terminal to recognize the photographic subjects in the video.
• After the operation area is generated, the corresponding operation can be implemented when the user acts on it. For example, clicking the area of a piece of clothing may link to a webpage (calling the browser and switching to the browser interface, or displaying it over the video playing interface in the form of a bubble box) containing the brand information and/or purchase information of that piece of clothing; or a menu containing, for example, "purchase, price, consultation" (and possibly other information) may pop up on the video playback interface, and the user can perform further control operations by selecting from the menu.
• The photographic subject entity corresponds to a storage device and an information transceiving device, where the storage device pre-stores the identification information of the photographic subject entity and the information transceiving device is configured to send that identification information to the first terminal. When the first terminal needs to acquire the identification information of a subject being captured, it can send an identification information acquisition command, and the information transceiving device that receives the command sends the corresponding identification information to the first terminal.
• The storage device and the information transceiving device may be located inside the photographic subject entity (for example, when the entity is a smart phone); they may also merely be associated with the entity, for example by being connected to it or placed in its vicinity, or simply because the storage device contains the identification information of that entity and the information transceiving device is configured to transmit it, in which case the photographic subject entity is considered to be associated with the storage device and the information transceiving device.
• One storage device may correspond to one or more photographic subject entities, and one information transceiving device may correspond to one or more storage devices.
• The information transceiving device may send all the identification information in its associated storage device to the first terminal; alternatively, another image acquisition device may be provided to monitor the real-time status of the first terminal and determine which photographic subject entities it is capturing, so that the information transceiving device transmits to the first terminal only the identification information of the entities that can be captured, reducing the amount of data the first terminal needs to process.
• In the above technical solution, the method further includes: the first terminal receiving, from at least one of the photographic subject entities, the identification information corresponding to that entity, for encoding into the video data.
• In this way, the identification information is obtained by the first terminal directly from the photographic subject entity, which facilitates establishing a physical association between the identification information and the specific entity and facilitates management of the photographic subject entities and their corresponding identification information.
• In the above technical solution, the method further includes: the second terminal matching the content in the image frames of the video file with pre-stored identification features, to identify at least one photographic subject in the video file.
• The identification features of one or more objects are pre-stored in the second terminal, or in a cloud storage space corresponding to the second terminal, so that at any time after the second terminal acquires the video file, or while the video is playing, the content in the image frames of the video can be matched against the pre-stored identification features to identify the subjects in the video.
• Because pre-stored identification features are used, there is no special requirement on the video file itself: the technical solution applies to all video files, whether downloaded from the network, obtained from other terminals, or captured by the second terminal itself, and therefore has strong versatility.
• When identifying the photographic subject, the pixel information in an image frame may be compared with that in one or more subsequent image frames to determine whether the subject has changed. If there is a change, recognition is performed again; otherwise there is no need to identify again, which improves recognition efficiency and reduces the demands on the terminal's processing capability.
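The frame-comparison shortcut above can be sketched as a simple gate in front of the recognizer. The flat pixel-sequence representation and the 2% change threshold are illustrative assumptions:

```python
def frames_differ(prev_frame, next_frame, threshold=0.02):
    """Return True if enough pixels changed to justify re-running recognition.

    Frames are equal-length sequences of pixel values; `threshold` is the
    fraction of pixels allowed to change before the subjects are considered
    to have moved. Both the representation and the 2% default are assumptions.
    """
    changed = sum(1 for a, b in zip(prev_frame, next_frame) if a != b)
    return changed / len(prev_frame) > threshold

def maybe_recognize(prev_frame, next_frame, cached_subjects, recognize):
    """Re-identify subjects only when the frame content actually changed."""
    if frames_differ(prev_frame, next_frame):
        return recognize(next_frame)
    return cached_subjects  # reuse the previous recognition result
```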
• In a simple case, the pre-stored identification feature may be an image of the object, which is compared with the images in the video file to identify the object. Further, the identification feature may also be a set of characteristic parameters; for example, for "clothing" it may include parameters such as "an opening in the front and symmetrical sleeves on the left and right", so that the second terminal can "recognize" clothing in general, while features of the specific garment, such as color, size, and style, enable intelligent recognition of a particular item of clothing. Pre-storing identification features in the second terminal does not contradict the identification information sent by the first terminal: subject recognition may use either one of them alone, or both together.
• In the above technical solution, the process of performing data separation on the video data by the second terminal includes: parsing the video data and extracting the identification frames from it, thereby obtaining the extracted identification frames and the remaining video file; and further extracting the identification information from the identification frames for the identification operation on the video file.
• Specifically, an identification frame including identification information may be added in the middle of, or at either end of, the data stream corresponding to the video file.
• The header portion of the identification frame includes a type identifier, which allows the second terminal to determine which data frames in the video data are identification frames. The identification frame header is mainly composed of special characters that mark the frame as an identification frame; the second terminal then continues to parse other information, such as the length of the identification frame, to completely delimit the corresponding frame. The identification frame also includes an information portion containing the identification information of the photographic subject, used to identify the subject in the video.
• In this way, the identification information can be conveniently encoded into the video data, the identification frames can be conveniently parsed out of the video data, the identification information of the objects can be extracted from the information portion of each identification frame, and the photographic subjects in the video file can be identified from that information.
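A minimal sketch of such an identification frame — a special-character header, a length field, then the information portion — might look like this. The marker bytes and the 2-byte big-endian length field are assumptions, since the patent does not fix a concrete byte layout:

```python
# Sketch of an identification frame: header + length + information part.
# The marker bytes and the 2-byte big-endian length are assumptions; the
# patent only says the header consists of special characters and that a
# length field delimits the information part.
HEADER = b"\xaa\x55IDF"

def encode_identification_frame(info: bytes) -> bytes:
    """Build an identification frame around the subject's identification info."""
    return HEADER + len(info).to_bytes(2, "big") + info

def parse_identification_frame(frame: bytes) -> bytes:
    """Recover the information part from a frame built by encode_identification_frame."""
    if not frame.startswith(HEADER):
        raise ValueError("not an identification frame")
    length = int.from_bytes(frame[len(HEADER):len(HEADER) + 2], "big")
    return frame[len(HEADER) + 2:len(HEADER) + 2 + length]
```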
• In the above technical solution, the method further includes: using at least one first terminal as an upper node and all of the photographic subject entities as lower nodes to form an Ad Hoc hierarchical network structure.
• The Ad Hoc hierarchical network structure does not need to rely on existing fixed communication network infrastructure and can be deployed quickly.
  • Each network node in the network cooperates with each other to communicate and exchange information through a wireless link, thereby realizing the sharing of information and services.
  • Network nodes are able to enter and leave the network dynamically, randomly, and frequently, often without prior warning or notification, and without disrupting communication between other nodes in the network.
• For example, the first terminal may be a camera: the camera serves as an upper node of the Ad Hoc network and the photographic subjects (such as clothes) serve as lower nodes. An upper node (i.e., a camera) may correspond to a plurality of lower nodes.
• In the above technical solution, the method further includes: the first terminal also receiving, from at least one of the photographic subject entities, the controllable information corresponding to that entity. The first terminal encodes the controllable information into the video data in association with the identification information, and the second terminal further acquires from the video data the controllable information associated with at least one photographic subject; when an operation action on the specified operation area is detected, the second terminal determines the operation function for the specified photographic subject according to the controllable information. Alternatively, when the second terminal detects an operation action on the specified operation area and reports the detection result to the first terminal, the first terminal sends the controllable information corresponding to the specified operation area to the second terminal, so that the second terminal performs the operation function on the specified photographic subject according to that controllable information.
• Alternatively, the second terminal can perform default processing operations on all photographic subjects, such as zooming in on any clicked object, storing the clicked object, or directly calling the browser to run an image search ("search by picture") on the clicked subject.
• The controllable information can be associated with the identification information and encoded into the video data, so that when the user operates a recognized photographic subject, the second terminal performs the appropriate function according to the controllable information.
• The controllable information may be encoded into the identification frame described above, or encoded separately as a control information frame; the identification frame (and, if present, the control information frame) is integrated with the captured video file to form the video data.
• After separating the video data, the second terminal performs the corresponding function according to the parsed controllable information.
• After the second terminal parses the controllable information, it may be saved in a matching database together with the associated identification information; when the user operates a recognized photographic subject, the controllable information associated with the identification information of the specified object is retrieved from the matching database in order to perform the operation function on the subject.
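The matching database described above amounts to a mapping from a subject's identification information to its controllable information. A minimal sketch, with an in-memory dict standing in for whatever store a real terminal would use:

```python
class MatchingDatabase:
    """Maps a subject's identification information to its controllable information.

    Populated when the second terminal separates the video data; queried when
    the user operates a recognized subject's operation area. The in-memory
    dict is an illustrative stand-in for a real terminal's storage.
    """

    def __init__(self):
        self._entries = {}

    def store(self, identification_info, controllable_info):
        """Save a parsed (identification, controllable) pair."""
        self._entries[identification_info] = controllable_info

    def lookup(self, identification_info):
        """Retrieve the controllable info for a subject, or None if absent."""
        return self._entries.get(identification_info)
```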
• When the first terminal encodes the controllable information into the video data, the identification information and the controllable information associated with the photographic subjects are usually sent to the second terminal together. However, to save network resources and increase the transmission speed of the video data, the first terminal may instead send the corresponding controllable information only when the detection result reported by the second terminal indicates an operation action in the operation area of a particular subject; this also helps save storage space on the second terminal.
• In the above technical solution, the controllable information includes: menu data, link information, and control commands; the corresponding operation functions include: generating and synchronously displaying a corresponding interactive menu according to the menu data, opening the link information, and executing the control command.
• For example, when the user watches a shopping video on a mobile phone and the phone recognizes a certain piece of clothing in the video, the user can touch the screen to click the operation area of the clothes; an interactive menu containing, for example, "purchase, price, consult" then pops up, or the phone links directly to the purchase page, or the image of the clothes is enlarged, facilitating further operation by the user.
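The three kinds of controllable information and their operation functions can be sketched as a small dispatcher. It returns an (action, payload) pair rather than driving a real player UI, and the "type"/"items"/"url"/"command" key names are illustrative assumptions:

```python
def perform_operation(controllable_info):
    """Map each kind of controllable information to its operation function."""
    kind = controllable_info["type"]
    if kind == "menu":
        # Menu data: generate and synchronously display an interactive menu.
        return ("show_menu", controllable_info["items"])
    if kind == "link":
        # Link information: open the linked page (browser or bubble box).
        return ("open_link", controllable_info["url"])
    if kind == "command":
        # Control command: execute it directly (e.g. zoom in on the subject).
        return ("run_command", controllable_info["command"])
    raise ValueError(f"unknown controllable information type: {kind}")
```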
  • FIG. 2 shows a block diagram of a data processing system in accordance with an embodiment of the present invention.
• As shown in FIG. 2, the data processing system includes a first terminal 202 and a second terminal 204. The first terminal 202 performs image collection on at least one photographic subject entity, encodes the collected images together with the identification information of the at least one photographic subject entity to form video data, and sends the video data to the second terminal 204 through the network.
• The second terminal 204 includes: a video data receiving unit 204A, configured to receive the video data; a data separating unit 204B, configured to perform data separation on the video data to obtain a video file and the identification information associated with at least one photographic subject in the video file; an identifying unit 204C, configured to identify at least one photographic subject in the video file according to the identification information; a generating unit 204D, configured to generate, in the video file, an operation area corresponding to the identified at least one photographic subject; a video playback unit 204E, configured to play the video file; an operation action detecting unit 204F, configured to detect an operation action on a specified operation area while the video playback unit 204E plays the video file; and a processing unit 204G, configured to, when the operation action detecting unit 204F detects an operation action on the specified operation area, execute the operation function associated with the specified photographic subject corresponding to that operation area.
• The video file here may be a video captured by a camera and transmitted in real time, by wire or wirelessly, or a non-real-time video taken at any other time.
• The operation area generated for a recognized photographic subject may be an area bounded by the display edge of the subject, or a rectangular area enclosing the subject; the operation area may be transparent, or may be displayed only under certain conditions (such as setting a video playback mode in which operation areas are shown, and entering this mode).
• As the position of the photographic subject changes during playback, the corresponding operation area changes accordingly, so that the user can operate the subject directly without paying special attention to the position of the operation area.
• In the above technical solution, the video data may be acquired by the first terminal 202 and transmitted to the second terminal 204; in particular, it may be acquired by the first terminal 202 in real time and transmitted to the second terminal 204 through the network.
• When the first terminal 202 acquires the identification information of the captured subjects during shooting and encodes it into the video data together with the captured video file, the first terminal 202 does not need to analyze the photographed objects to extract their features, which reduces the requirements on the first terminal 202 and also makes it convenient for the second terminal 204 to recognize the photographic subjects in the video.
• After the operation area is generated, the corresponding operation can be implemented when the user acts on it. For example, clicking the area of a piece of clothing may link to a webpage (calling the browser and switching to the browser interface, or displaying it over the video playing interface in the form of a bubble box) containing the brand information and/or purchase information of that piece of clothing; or a menu containing "purchase, price, consultation" (and possibly other information) may be displayed on the video playing interface, and the user can perform further control operations by selecting from the menu.
• The photographic subject entity corresponds to a storage device and an information transceiving device (not shown), where the storage device pre-stores the identification information of the photographic subject entity and the information transceiving device is configured to send that identification information to the first terminal. When the first terminal 202 sends an identification information acquisition command, the information transceiving device that receives the command sends the corresponding identification information to the first terminal 202.
• The storage device and the information transceiving device may be located inside the photographic subject entity (for example, when the entity is a smart phone); they may also merely be associated with the entity, for example by being connected to it or placed in its vicinity, or simply because the storage device contains the identification information of that entity and the information transceiving device is configured to transmit it, in which case the photographic subject entity is considered to be associated with the storage device and the information transceiving device.
• One storage device may correspond to one or more photographic subject entities, and one information transceiving device may correspond to one or more storage devices.
• The information transceiving device may send all the identification information in its associated storage device to the first terminal 202; alternatively, another image acquisition device may be provided to monitor the real-time status of the first terminal 202 and determine which photographic subject entities it is capturing, so that the information transceiving device transmits to the first terminal 202 only the identification information of the entities that can be captured, reducing the amount of data the first terminal 202 needs to process.
• In the above technical solution, the first terminal 202 further includes: an information receiving unit 202D, configured to receive, from at least one of the photographic subject entities, the identification information corresponding to that entity, for encoding into the video data.
• In this way, the identification information is obtained by the first terminal 202 directly from the photographic subject entity, which facilitates establishing a physical association between the identification information and the specific entity and facilitates management of the photographic subject entities and their corresponding identification information.
• In the above technical solution, the second terminal 204 further includes: a pre-storage unit (not shown) for pre-storing identification features, wherein the identifying unit 204C is configured to match the content in the image frames of the video file with the identification features pre-stored by the pre-storage unit, to identify at least one photographic subject in the video file.
• The identification features of one or more objects are pre-stored in the second terminal 204, or in a cloud storage space corresponding to the second terminal 204, so that at any time after the second terminal 204 acquires the video file, or while the video is playing, the content in the image frames of the video can be matched against the pre-stored identification features to identify the subjects in the video.
• Because pre-stored identification features are used, there is no special requirement on the video file itself: the technical solution applies to all video files, whether downloaded from the network, obtained from other terminals, or captured by the second terminal 204 itself, and therefore has strong versatility.
• When identifying the photographic subject, the pixel information in an image frame may be compared with that in one or more subsequent image frames to determine whether the subject has changed. If there is a change, recognition is performed again; otherwise there is no need to identify again, which improves recognition efficiency and reduces the demands on the terminal's processing capability.
• In a simple case, the pre-stored identification feature may be an image of the object, which is compared with the images in the video file to identify the object. Further, the identification feature may also be a set of characteristic parameters; for example, for "clothing" it may include parameters such as "an opening in the front and symmetrical sleeves on the left and right", so that the second terminal 204 can "recognize" clothing in general, while features of the specific garment, such as color, size, and style, enable intelligent recognition of a particular item. Pre-storing identification features in the second terminal 204 does not contradict the identification information sent by the first terminal 202: subject recognition may use either one of them alone, or both together.
• In the above technical solution, the data separating unit 204B includes: a frame extracting subunit (not shown), configured to extract the identification frames from the video data, thereby obtaining the extracted identification frames and the remaining video file; and a frame parsing subunit (not shown), configured to further extract the identification information from the identification frames for the identifying unit to perform the identification operation on the video file.
• Specifically, an identification frame including identification information may be added in the middle of, or at either end of, the data stream corresponding to the video file.
• The header portion of the identification frame includes a type identifier, which allows the second terminal 204 to determine which data frames in the video data are identification frames. The identification frame header is mainly composed of special characters that mark the frame as an identification frame; the second terminal 204 then continues to parse other information, such as the length of the identification frame, to completely delimit the corresponding frame. The identification frame also includes an information portion containing the identification information of the photographic subject, used to identify the subject in the video.
• In this way, the identification information can be conveniently encoded into the video data, the identification frames can be conveniently parsed out of the video data, the identification information of the objects can be extracted from the information portion of each identification frame, and the photographic subjects in the video file can be identified from that information.
• In the above technical solution, the system further includes: at least one first terminal 202 serving as an upper node and all of the photographic subject entities serving as lower nodes, forming an Ad Hoc hierarchical network structure.
• The Ad Hoc hierarchical network structure does not need to rely on existing fixed communication network infrastructure and can be deployed quickly.
  • Each network node in the network cooperates with each other to communicate and exchange information through wireless links, thereby realizing the sharing of information and services.
  • Network nodes can enter and leave the network dynamically, randomly, and frequently, often without prior warning or notification, and without disrupting communication between other nodes in the network.
• For example, the first terminal 202 may be a camera: the camera serves as an upper node of the Ad Hoc network and the photographic subjects (such as clothes) serve as lower nodes. An upper node (i.e., a camera) may correspond to a plurality of lower nodes.
• In the above technical solution, the first terminal 202 further receives, from at least one of the photographic subject entities, the controllable information corresponding to that entity; the encoding unit 202B is further configured to encode the controllable information into the video data in association with the identification information, and the data separating unit 204B is further configured to acquire from the video data the controllable information associated with at least one photographic subject.
• The processing unit 204G is further configured to, when an operation action on the specified operation area is detected, perform the operation function on the specified photographic subject according to the controllable information. Alternatively, when the second terminal 204 detects an operation action on the specified operation area, it reports the detection result to the first terminal 202; the first terminal 202 then sends the controllable information corresponding to the specified operation area to the second terminal 204, so that the processing unit 204G performs the operation function on the specified photographic subject according to that controllable information.
• Alternatively, the second terminal 204 can perform default processing operations on all photographic subjects, such as zooming in on any clicked object, storing the clicked object, or directly calling the browser to run an image search ("search by picture") on the clicked subject.
• The controllable information can be associated with the identification information and encoded into the video data, so that when the user operates a recognized photographic subject, the second terminal 204 performs the corresponding function according to the controllable information.
• The controllable information may be encoded into the identification frame described above, or encoded separately as a control information frame; the identification frame (and, if present, the control information frame) is integrated with the captured video file to form the video data.
• After separating the video data, the second terminal 204 performs the corresponding function according to the parsed controllable information. After the second terminal 204 parses the controllable information, it may be saved in the matching database together with the associated identification information; when the user operates a recognized photographic subject, the controllable information associated with the identification information of the specified object is retrieved from the matching database in order to perform the operation function on the subject.
• When the first terminal 202 encodes the controllable information into the video data, the identification information and the controllable information associated with the photographic subjects are usually sent to the second terminal 204 together. However, to save network resources and increase the transmission speed of the video data, the first terminal 202 may instead send the corresponding controllable information only when the detection result reported by the second terminal 204 indicates an operation action in the operation area of a particular subject; this also helps save storage space on the second terminal 204.
• For example, when the user watches a shopping video on a mobile phone and the phone recognizes a certain piece of clothing in the video, the user can touch the screen to click the operation area of the clothes; an interactive menu containing, for example, "purchase, price, consult" then pops up, or the phone links directly to the purchase page, or the image of the clothes is enlarged, facilitating further operation by the user.
  • FIG. 3 is a block diagram showing an intelligent video interaction system based on an Ad Hoc network structure in accordance with an embodiment of the present invention.
  • an intelligent video interaction system based on an Ad Hoc network structure includes a client 302 and a server 304.
• The server 304 adopts an Ad Hoc hierarchical network structure to collect information and form video data for the client 302 to download, which the client 302 can play in real time or at any other time as needed.
• Each network node in the Ad Hoc network cooperates with the others to communicate and exchange information over wireless links, sharing information and services.
• Network nodes can enter and leave the network dynamically, randomly, and frequently, often without prior warning or notification, and without disrupting communication between other nodes in the network, giving the structure strong flexibility.
• Using the Ad Hoc network structure is only a preferred approach; implementations of the information collection process of the present invention over other network structures also fall within the scope of the present invention.
  • Server 304 includes:
• The server 304A is configured to provide video data for the client 302 to download, where the video data may be video data including identification frames or a plain video file without identification frames; the server 304A can transmit either kind of video data according to the client's choice.
• The upper node 304B and the upper node 304C are upper nodes in the Ad Hoc network (obviously, the number of upper nodes may be changed as needed; there may be only one upper node, or two or more). The description here takes two upper nodes as an example.
• The upper nodes do not affect each other and can enter and leave the network dynamically, randomly, and frequently, giving the information collection system strong flexibility.
• Here, an upper node may be a camera that dynamically acquires image information of the photographic subjects (i.e., the lower nodes) according to requests from the server 304A.
• An upper node acquires the identification information and/or controllable information by sending an acquisition command; a lower node that receives the command sends its corresponding identification information and/or controllable information to the upper node.
• One upper node may correspond to a plurality of lower nodes; for example, the upper node 304B corresponds to the lower nodes 304D and 304E, and the lower nodes 304D and 304E likewise do not affect each other.
• The lower nodes 304D, 304E, 304F, and 304G are lower nodes in the Ad Hoc network. Like the upper nodes, they can enter and leave the network dynamically, randomly, and frequently, without affecting the work of other network nodes.
• When a lower node receives the command sent by an upper node to acquire the identification information and/or controllable information, it transmits the identification information and controllable information to the upper node.
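The upper/lower node exchange described above can be sketched as a pair of classes. The command names and the in-memory node list are illustrative assumptions, and a real Ad Hoc network would use wireless links rather than direct method calls:

```python
class LowerNode:
    """A photographic subject entity's storage device plus transceiver, as a lower node."""

    def __init__(self, identification_info, controllable_info):
        self.identification_info = identification_info  # pre-stored in the storage device
        self.controllable_info = controllable_info

    def handle(self, request):
        """Answer an upper node's acquisition command."""
        if request == "GET_ID":
            return self.identification_info
        if request == "GET_CONTROL":
            return self.controllable_info
        return None

class UpperNode:
    """A camera collecting identification info from its lower nodes."""

    def __init__(self, lower_nodes):
        self.lower_nodes = list(lower_nodes)  # nodes may join or leave at any time

    def collect_ids(self):
        """Request the identification information from every current lower node."""
        return [n.handle("GET_ID") for n in self.lower_nodes]
```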
  • Client 302 includes:
  • the receiving module 302A is configured to receive video data acquired from the server, where the video data includes identification information identifying the photographic subject.
• The video data includes identification frames; an identification frame includes fields such as an identification frame header, an identification frame length, and identification frame information.
• The identification frame header is mainly composed of special characters used to identify the frame; the identification frame length marks the length of the identification frame information; and the identification frame information portion, encoded in a special character format, contains the identification information and controllable information of the photographic subjects. The identification frames can therefore be separated from the video data and parsed: the identification information and controllable information are extracted from the information portion of each frame, and the photographic subjects in the video file are identified from the identification information.
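Separating the identification frames from the surrounding video data can be sketched as a scan for the header marker followed by a length-prefixed read. The marker bytes and the 2-byte big-endian length field are illustrative assumptions:

```python
# Sketch of separating identification frames from the rest of the video data.
# The marker bytes and the 2-byte big-endian length are assumptions; the
# patent only specifies a special-character header and a length field that
# delimits the information part.
MARKER = b"\xaa\x55IDF"

def separate_stream(video_data: bytes):
    """Split video data into (plain_video_bytes, list_of_information_parts)."""
    video = bytearray()
    infos = []
    i = 0
    while i < len(video_data):
        if video_data[i:i + len(MARKER)] == MARKER:
            # Found an identification frame: read its length, lift out the info part.
            length_at = i + len(MARKER)
            length = int.from_bytes(video_data[length_at:length_at + 2], "big")
            infos.append(bytes(video_data[length_at + 2:length_at + 2 + length]))
            i = length_at + 2 + length
        else:
            video.append(video_data[i])  # ordinary video byte, keep it
            i += 1
    return bytes(video), infos
```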
  • The video decoding module 302C is configured to decode the video file.
  • The audio/video output module 302D is configured to output and play the decoded audio and video.
  • The matching database 302E is used to store the identification information and the controllable information separated from the video data.
  • The smart identification module 302F is configured to identify the photographic subject in the video file according to the separated identification information, and to generate a corresponding operation area for each recognized photographic subject.
  • The intelligent interactive display module 302G is configured, while the video file is playing, to perform the corresponding operation according to the separated controllable information when the photographic subject is operated on within the operation area of the recognized subject.
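The interplay of modules 302E–302G can be sketched as a simple hit test: a click inside a recognized subject's operation area looks up that subject's controllable information in the matching database. Rectangular areas and all names here are illustrative assumptions, not the patent's required geometry:

```python
# Hypothetical operation-area hit test for the recognized subjects.

class OperationArea:
    """Operation area generated near a recognized photographic subject."""
    def __init__(self, subject_id, x, y, w, h):
        self.subject_id = subject_id
        self.rect = (x, y, w, h)  # assumed rectangular region

    def contains(self, px, py):
        x, y, w, h = self.rect
        return x <= px < x + w and y <= py < y + h

def on_click(areas, matching_db, px, py):
    """Return the controllable info of the clicked subject, if any."""
    for area in areas:
        if area.contains(px, py):
            return matching_db.get(area.subject_id)
    return None  # click fell outside every operation area

# Matching database 302E holds the separated controllable information.
matching_db = {"shirt-01": {"menu": ["buy", "price", "consult"]}}
areas = [OperationArea("shirt-01", 100, 50, 80, 120)]

hit = on_click(areas, matching_db, 120, 60)   # inside the shirt's area
miss = on_click(areas, matching_db, 10, 10)   # outside all areas
```

Because the subject moves during playback, a real implementation would update each area's rectangle per frame; only the lookup logic is shown here.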
  • FIG. 4 shows a flow chart of an intelligent video interaction system in accordance with an embodiment of the present invention.
  • Step 402: The user selects a video file to play, i.e., either video data containing data information or a plain video file.
  • Step 404: When the user wants to know the specific information of an object (a photographic subject), the user can click on that object. In this embodiment, the user first operates on a specified object in the video (by clicking, or of course by other operations such as touching the screen), and the terminal then determines whether the specified object is an identifiable photographic subject. Alternatively, the photographic subject may first be recognized and specially displayed, after which the user operates on the recognized subject.
  • Step 406: Determine which video mode the user has selected for playback. If the special mode is selected, proceed to step 408; otherwise, return to step 402.
  • In this embodiment, the user can select the video mode, where the special mode is the mode that can identify the photographic subject and that supports the user in operating on the recognized subject during video playback. If the user selects the special mode, then for video data containing data information, the video data can be separated to obtain the identification information and the controllable information of the photographic subject, so that the subject can be identified and operated on; if the played video is a video file that does not contain data information, the photographic subject can be identified through recognition features stored locally on the terminal or in the cloud. If the user does not select the special mode, only video playback is possible, and the photographic subjects in the video cannot be operated on.
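The mode decision in steps 402–406 can be summarized as a small sketch; the mode flag and the source names returned here are illustrative assumptions:

```python
# Hypothetical summary of the playback-mode decision (steps 402-406).

def choose_recognition_source(special_mode, has_data_frames):
    """Decide how subjects will be recognized for this playback session."""
    if not special_mode:
        return "playback-only"       # plain playback; subjects not operable
    if has_data_frames:
        return "embedded-id-frames"  # separate ID/controllable info from data
    return "stored-features"         # match against local/cloud features

source = choose_recognition_source(True, False)
```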
  • Step 408: According to the selected content, pop up an interactive menu for dynamic interaction. The popped-up interactive menu presents the corresponding actions based on the controllable information.
  • As shown in FIG. 5A, during video playback on a mobile phone terminal (which may also be another terminal such as a tablet computer or a PC), the identification information separated from the video data and the controllable information associated with it are saved to the matching database. The photographic subject 502 is recognized according to the identification information (or according to recognition features stored locally or in the cloud), and the recognized subject 502 can be specially displayed (for example, with a highlighted outline). An operation area (not shown) corresponding to the subject 502 is generated in its vicinity, and the user can click on this operation area to operate on the subject 502. The terminal retrieves the controllable information from the matching database according to the operation on the subject 502 and performs the corresponding operation: as shown in the figure, the interactive menu 504 pops up, through which the subject 502 can be further manipulated.
  • Alternatively, as shown in FIG. 5B, after the subject 502 is clicked, a bubble box 506 pops up, from which the information of the subject 502 can be learned. It is also possible, after clicking on the subject 502, to zoom in on it, or to call the browser and switch directly to the page of the corresponding web link (as shown in FIG. 5C).
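The interactions described above (interactive menu, bubble box, zoom, web link) amount to dispatching on the separated controllable information. The sketch below illustrates one way to do that; the `type` field, the action tuples, and the default-zoom fallback are assumptions, not the patent's scheme:

```python
# Hypothetical dispatch of controllable information to an interaction.

def perform_action(ctrl):
    """Map one piece of controllable info to a UI action."""
    kind = ctrl.get("type")
    if kind == "menu":
        return ("show_menu", ctrl["items"])    # e.g. buy / price / consult
    if kind == "bubble":
        return ("show_bubble", ctrl["text"])   # details in a bubble box 506
    if kind == "zoom":
        return ("zoom_subject", ctrl["factor"])
    if kind == "link":
        return ("open_browser", ctrl["url"])   # switch to the linked page
    return ("default_zoom", 2.0)               # assumed default behaviour

act = perform_action({"type": "menu", "items": ["buy", "price", "consult"]})
```

The default branch reflects the document's remark that a terminal may apply a default operation (such as zooming) to any clicked subject when no specific controllable information applies.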
  • Step 410: The user selects a menu item, such as "Details" in FIG. 5A.
  • Step 412: Send the operation information selected by the user to the designated server. By sending the selected operation information to the server, a response to the operation information can be made according to the stored operation function.
  • Step 414: The server returns the operation result; for example, a bubble box with the detailed information of the photographic subject 502 may pop up, as shown in FIG. 5B.
  • In summary, the present invention proposes a new data processing scheme that can identify a photographic subject in a video and enable the user to operate on that subject while watching the video, without performing a separate network search or the like, which simplifies user operation and enhances the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention provides a data processing method, comprising: a first terminal performs image acquisition on at least one photographic subject entity, encodes the images together with the corresponding identification information to form video data, and sends the video data to a second terminal; the second terminal performs data separation on the video data to obtain a video file and identification information associated with at least one photographic subject in the video file; the second terminal identifies the at least one photographic subject in the video file according to the identification information and forms a corresponding operation area in the video file; and when playing the video file, the second terminal executes the associated operation function according to a detected operation action on a specified operation area. The present invention further provides a data processing system. With this technical solution, photographic subjects in a video can be identified, so that the user can operate directly on them while watching the video, which simplifies user operation and improves the user experience.

Description

说明书

数据处理方法和数据处理系统

技术领域
本发明涉及数据处理技术领域, 具体而言, 涉及一种数据处理方法和一种数据处理系统。

背景技术
目前, 当用户进行网上购物时, 通过浏览网页图片的方式去购买产 品, 但是购买到的实物跟网上的照片偏差较大, 产生很多纠纷。
用户通过图片无法全面地了解一件物体, 但如果通过视频的形式去描述同一件物体, 尤其是通过对线下卖场中的实物进行视频采集, 无疑会增强用户对于同一件物体的感知; 而且, 如果允许用户在观看视频的同时对感兴趣的物体进行操作, 将极大提升用户的购买体验。
然而在现有技术中, 人们在观看视频的过程中, 比如查看到了感兴趣 的物品, 只能够另外通过在网络上进行搜索等方式才能够实现购买等操 作。 比如用户需要单独打开浏览器, 通过输入该物品的名称进行搜索, 或 是通过对视频画面进行截图后的以图搜图, 从而进入网上电商的网站后进 行购买。 而通过输入物品名称等方式, 当用户无法得知其准确名称时, 甚 至很难搜索到相关的物品, 更难以实现购买等操作。
因此, 需要一种新的技术方案, 可以对视频中的拍摄对象进行识别, 使用户在观看视频时能够直接对视频中的拍摄对象进行操作, 而无需通过单独的网络搜索等方式进行操作, 从而有利于简化用户操作, 提升了用户体验。

发明内容
本发明正是基于上述问题, 提出了一种新的数据处理方案, 可以对视频中的拍摄对象进行识别, 使用户在观看视频时能够直接对视频中的拍摄对象进行操作, 而无需通过单独的网络搜索等方式进行操作, 从而有利于简化用户操作, 提升了用户体验。
有鉴于此, 本发明提出了一种数据处理方法, 包括: 第一终端对至少 一个拍摄对象实体进行图像采集, 并将采集到的图像和对应于至少一个所 述拍摄对象实体的识别信息进行编码, 形成视频数据, 并通过网络发送至 第二终端; 所述第二终端接收所述视频数据, 对所述视频数据进行数据分 离, 得到视频文件和与所述视频文件中的至少一个拍摄对象相关联的识别 信息; 所述第二终端根据所述识别信息识别出所述视频文件中的至少一个 拍摄对象, 并在所述视频文件中形成对应于至少一个所述拍摄对象的操作 区域; 所述第二终端在播放所述视频文件时, 根据检测到的对指定操作区 域的操作动作, 执行与所述指定操作区域对应的指定拍摄对象相关联的操 作功能。
在该技术方案中, 通过对视频中拍摄对象的识别, 可以使用户在观看 视频时直接对识别出的物体进行操作, 提升了用户的体验。 通过对视频中 的拍摄对象的识别, 生成的对应的操作区域可以是该拍摄对象的显示边沿 对应的区域, 或是将该拍摄对象包含在其中的矩形区域等, 具体地, 该操 作区域可以是透明的, 也可以在一定条件下 (比如设置一个可显示出操作 区域的视频播放模式, 并进入该模式时) 进行显示。 由于视频是动态的, 因此, 当视频内的拍摄对象发生移动 (主动地发生移动, 或由于镜头的移 动而使得该拍摄对象在终端屏幕上形成相对位置变化) 时, 对应的操作区 域也应当相应地变化, 从而使得用户直接对拍摄对象进行操作即可, 而无 需特别关注该操作区域的位置。
优选地, 视频数据可以是由第一终端获取后传输至第二终端的, 尤其 是可以由第一终端实时获取并通过网络传输至第二终端。 当第一终端在进 行拍摄的过程中, 获取被拍摄的拍摄对象的识别信息, 由第一终端将其与 拍摄的视频文件编码成视频数据, 从而无需第一终端对拍摄对象进行分析 和特征获取, 降低了对第一终端的要求, 也方便了第二终端对视频中的拍 摄对象进行识别。
在上述技术方案中, 优选地, 还包括: 所述第一终端接收至少一个所 述拍摄对象实体发送的对应于其自身的识别信息, 以用于编码至所述视频 数据中。
在该技术方案中, 识别信息可以是第一终端从拍摄对象实体处获取 的, 则有助于在识别信息与具体的拍摄对象实体之间建立实际上的关联, 便于执行对拍摄对象实体和相应的识别信息的管理工作。
在上述技术方案中, 优选地, 还包括: 所述第二终端将所述视频文件 的图像帧中的内容与预存储的识别特征进行匹配, 以识别出所述视频文件 中的至少一个拍摄对象。
在该技术方案中, 在第二终端中或第二终端对应的云端存储空间内, 预存储一个或多个物体的识别特征, 从而在第二终端获取视频文件之后的 任意时刻、 或是播放视频文件 (预先获取或实时接收的) 的过程中, 将视 频的图像帧中的内容与预存储的识别特征进行匹配, 以识别出视频中的拍 摄对象。 由于采用了预存储的识别特征, 因而对于视频文件本身而言, 并 没有特殊的要求, 所有的视频文件都可以适用于该技术方案, 可以是第二 终端从网络上下载的、 从其他终端处获取的或是第二终端自己拍摄的, 具 有较强的通用性。 同时, 由于视频文件中的拍摄对象并不总是在变化, 因 此, 在对某个图像帧中的拍摄对象进行识别之后, 可以将该图像帧与其之 后的一个或多个图像帧中的像素信息进行比较, 以判断出是否发生拍摄对 象的变化, 若存在变化, 则可以进行识别, 否则无需再次识别, 有利于提 高识别效率, 降低对终端处理能力的要求。
其中, 预存储的识别特征, 简单而言, 可以是物体的图像, 则可以根据与视频文件中的画面进行比对, 以识别出该物体; 进一步地, 识别特征还可以是一些特征参数, 比如对于 "衣服" , 可以包括 "前方存在开口, 左右存在对称的袖子" 等参数, 使得第二终端能够 "认识" 到 "衣服" 为何物, 再加之需要识别的衣服自身的特征, 比如颜色、 大小、 款式等, 就可以由第二终端实现对 "衣服" 的智能识别。 同时, 第二终端自身预存储识别特征, 与其根据第一终端发送的识别信息, 两者并不矛盾, 可以仅用其中的某一个进行对象识别, 也可以同时利用两者进行识别。
在上述技术方案中, 优选地, 所述第二终端对所述视频数据进行数据 分离的过程包括: 解析所述视频数据, 从所述视频数据中提取识别帧, 并 得到经提取所述识别帧后剩余的所述视频文件; 从所述识别帧中进一步提 取出所述识别信息, 以用于对所述视频文件的识别操作。
在该技术方案中, 可以在视频文件对应的数据流中间或两端添加包含 识别信息的识别帧。 为了实现对视频数据的分离, 在识别帧的帧头部分应 该包含类型标识, 用于第二终端对视频数据中的识别帧的类型进行识别, 当识别到上述类型标识后, 即判断该数据帧为识别帧, 具体比如识别帧头 主要是由特殊字符组成, 以用来标识识别帧。 然后, 第二终端继续解析其 他的如识别帧长度等信息, 以完整地确定对应的识别帧。 识别帧还应该包 括信息部分, 该信息部分中包含了拍摄对象的识别信息等, 以用于对视频 中的拍摄对象进行识别。 通过采用识别帧的方式, 能够方便地将识别信息 编码在视频数据中, 并方便地从视频数据中解析出识别帧, 从识别帧的信 息部分提取出拍摄对象的识别信息, 通过识别信息对视频文件中的拍摄对 象进行识别。
在上述技术方案中, 优选地, 还包括: 至少一个所述第一终端作为上 层节点, 所有的所述拍摄对象实体作为下层节点, 以形成 Ad Hoc分层式 网络结构。
在该技术方案中, Ad Hoc 分层式网络结构不需要依靠现有固定通信 网络基础设施, 并且能够迅速展开使用的网络体系。 网络中的各个网络节 点相互协作, 通过无线链路进行通信、 交换信息, 实现信息和服务的共 享。 网络节点能够动态地、 随意地、 频繁地进入和离开网络, 而常常不需 要事先示警或通知, 并且不会破坏网络中其他节点之间的通信。 第一终端 可以是摄像头, 将摄像头作为 Ad Hoc 的上层节点, 拍摄对象 (比如衣 服)作为下层节点, 则根据 Ad Hoc 网络的结构特点, 一个上层节点 (即 摄像头) 可以对应于多个下层节点 (即多个上述的信息收发装置) , 并且 不同网络节点之间互不影响, 提高了视频采集系统的稳定性与灵活性。
在上述技术方案中, 优选地, 还包括: 所述第一终端还接收所述至少 一个所述拍摄对象实体发送的对应于其自身的可控信息; 其中, 所述第一 终端将所述可控信息与所述识别信息关联地编码至所述视频数据, 且所述 第二终端还从所述视频数据中获取与至少一个所述拍摄对象相关联的可控 信息, 并当检测到对所述指定操作区域的所述操作动作时, 根据所述可控 信息执行对所述指定拍摄对象的操作功能; 或当所述第二终端检测到对所 述指定操作区域的操作动作, 并将检测结果上报至所述第一终端时, 所述 第一终端将对应于所述指定操作区域的可控信息发送至所述第二终端, 以 由所述第二终端根据所述可控信息执行对所述指定拍摄对象的操作功能。
在该技术方案中, 第二终端可以对所有的拍摄对象进行默认的处理操 作, 比如对所有被点击到的拍摄对象进行放大处理, 或是存储被点击到的 拍摄对象, 或是直接调用浏览器对被点击到的拍摄对象进行 "以图搜 图" 。 当然, 为了能够实现更多的处理操作方式, 可以通过将可控信息与 识别信息进行关联并编码至视频数据中, 则用户在对识别出的拍摄对象进 行操作时, 第二终端根据可控信息执行相应的功能。 具体来说, 可以将可 控信息编码至上述识别帧中, 或是单独编码为控制信息帧, 将识别帧 (还 可能包括控制信息帧) 与拍摄得到的视频文件进行整合形成视频数据。 第 二终端根据解析出的可控信息, 以执行相应的功能。 第二终端解析出可控 信息之后, 可以同相关联的识别信息一起保存至匹配数据库中, 则在用户 对识别出的拍摄对象进行操作时, 从匹配数据库中检索出与该指定物体的 识别信息关联的可控信息, 以执行对该拍摄对象的操作功能。
当然, 第一终端将可控信息编码至视频数据中时, 往往是将与视频数 据中的拍摄对象相关联的识别信息和可控信息一并发送至第二终端; 但为 了节约网络资源、 提高视频数据的传输速度, 则第一终端可以根据第二终 端上报的检测结果, 仅当某个拍摄对象对应的操作区域存在操作动作时, 才将对应的可控信息发送至第二终端, 这也有利于节省第二终端的存储空 间。
在上述技术方案中, 优选地, 所述可控信息包括: 菜单数据、 链接信 息、 控制命令; 以及所述操作功能相应地包括: 根据所述菜单数据生成并 展示对应的交互菜单、 打开所述链接信息、 执行所述控制命令。
在该技术方案中, 具体来说, 比如用户在通过手机观看购物视频时, 手机识别出了视频中的某一件衣服, 用户触屏点击该衣服的操作区域, 弹 出比如包含 "购买、 价格、 咨询" 的交互菜单, 或者直接链接至 "购买" 的页面中, 也可以是对该衣服图像的放大等处理, 以方便用户的进一步操 作。
本发明还提出了一种数据处理系统, 包括第一终端和第二终端, 所述 第一终端包括: 图像采集单元, 用于对至少一个拍摄对象实体进行图像采 集; 编码单元, 用于将采集到的图像和对应于至少一个所述拍摄对象实体 的识别信息进行编码, 形成视频数据; 视频数据发送单元, 用于将所述编 码单元形成的所述视频数据通过网络发送至所述第二终端; 所述第二终端 包括: 视频数据接收单元, 用于接收所述视频数据; 数据分离单元, 用于 对所述视频数据进行数据分离, 得到视频文件和与所述视频文件中的至少 一个拍摄对象相关联的识别信息; 识别单元, 用于根据所述识别信息识别 出视频文件中的至少一个拍摄对象; 操作区域生成单元, 根据识别出的所 述至少一个拍摄对象在所述视频文件中形成对应于至少一个所述拍摄对象 的操作区域; 视频播放单元, 用于播放所述视频文件; 操作动作检测单 元, 用于在所述视频播放单元播放所述视频文件时, 检测对指定操作区域 的操作动作; 处理单元, 用于在所述操作动作检测单元检测到对所述指定 操作区域的操作动作时, 执行与所述指定操作区域对应的指定拍摄对象相 关联的操作功能。
在该技术方案中, 通过对视频中拍摄对象的识别, 可以使用户在观看 视频时直接对识别出的物体进行操作, 提升了用户的体验。 这里的视频文 件可以是摄像头实时拍摄后通过有线或无线方式传输过来的视频, 也可以 是其他任意时刻拍摄的非实时的视频。 通过对视频中的拍摄对象的识别, 生成的对应的操作区域可以是该拍摄对象的显示边沿对应的区域, 或是将 该拍摄对象包含在其中的矩形区域等, 具体地, 该操作区域可以是透明 的, 也可以在一定条件下 (比如设置一个可显示出操作区域的视频播放模 式, 并进入该模式时) 进行显示。 由于视频是动态的, 因此, 当视频内的 拍摄对象发生移动 (主动地发生移动, 或由于镜头的移动而使得该拍摄对 象在终端屏幕上形成相对位置变化) 时, 对应的操作区域也应当相应地变 化, 从而使得用户直接对拍摄对象进行操作即可, 而无需特别关注该操作 区域的位置。
优选地, 视频数据可以是由第一终端获取后传输至第二终端的, 尤其 是可以由第一终端实时获取并通过网络传输至第二终端。 当第一终端在进 行拍摄的过程中, 获取被拍摄的拍摄对象的识别信息, 由第一终端将其与 拍摄的视频文件编码成视频数据, 从而无需第一终端对拍摄对象进行分析 和特征获取, 降低了对第一终端的要求, 也方便了第二终端对视频中的拍 摄对象进行识别。
在上述技术方案中, 优选地, 所述第一终端, 还包括: 信息接收单 元, 用于接收至少一个所述拍摄对象实体发送的对应于其自身的识别信 息, 以用于编码至所述视频数据中。
在该技术方案中, 识别信息可以是第一终端从拍摄对象实体处获取 的, 则有助于在识别信息与具体的拍摄对象实体之间建立实际上的关联, 便于执行对拍摄对象实体和相应的识别信息的管理工作。
在上述技术方案中, 优选地, 所述第二终端, 还包括: 预存储单元, 用于预存储识别特征; 其中, 所述识别单元将所述视频文件的图像帧中的 内容与所述预存储单元预存储的识别特征进行匹配, 以识别出所述视频文 件中的至少一个拍摄对象。
在该技术方案中, 在第二终端中或第二终端对应的云端存储空间内, 预存储一个或多个物体的识别特征, 从而在第二终端获取视频文件之后的任意时刻、 或是播放视频文件 (预先获取或实时接收的) 的过程中, 将视频的图像帧中的内容与预存储的识别特征进行匹配, 以识别出视频中的拍摄对象。 由于采用了预存储的识别特征, 因而对于视频文件本身而言, 并没有特殊的要求, 所有的视频文件都可以适用于该技术方案, 可以是第二终端从网络上下载的、 从其他终端处获取的或是第二终端自己拍摄的, 具有较强的通用性。 同时, 由于视频文件中的拍摄对象并不总是在变化, 因此, 在对某个图像帧中的拍摄对象进行识别之后, 可以将该图像帧与其之后的一个或多个图像帧中的像素信息进行比较, 以判断出是否发生拍摄对象的变化, 若存在变化, 则可以进行识别, 否则无需再次识别, 有利于提高识别效率, 降低对终端处理能力的要求。 其中, 预存储的识别特征, 简单而言, 可以是物体的图像, 则可以根据与视频文件中的画面进行比对, 以识别出该物体; 进一步地, 识别特征还可以是一些特征参数, 比如对于 "衣服" , 可以包括 "前方存在开口, 左右存在对称的袖子" 等参数, 使得第二终端能够 "认识" 到 "衣服" 为何物, 再加之需要识别的衣服自身的特征, 比如颜色、 大小、 款式等, 就可以由第二终端实现对 "衣服" 的智能识别。 同时, 第二终端自身预存储识别特征, 与其根据第一终端发送的识别信息, 两者并不矛盾, 可以仅用其中的某一个进行对象识别, 也可以同时利用两者进行识别。
在上述技术方案中, 优选地, 所述数据分离单元, 包括: 帧提取子单 元, 用于从所述视频数据中提取识别帧, 并得到经提取所述识别帧后剩余 的所述视频文件; 帧解析子单元, 用于从所述识别帧中进一步提取出所述 识别信息, 以用于所述识别单元对所述视频文件的识别操作。
在该技术方案中, 可以在视频文件对应的数据流中间或两端添加包含 识别信息的识别帧。 为了实现对视频数据的分离, 在识别帧的帧头部分应 该包含类型标识, 用于第二终端对视频数据中的识别帧的类型进行识别, 当识别到上述类型标识后, 即判断该数据帧为识别帧, 具体比如识别帧头 主要是由特殊字符组成, 以用来标识识别帧。 然后, 第二终端继续解析其 他的如识别帧长度等信息, 以完整地确定对应的识别帧。 识别帧还应该包 括信息部分, 该信息部分中包含了拍摄对象的识别信息等, 以用于对视频 中的拍摄对象进行识别。 通过采用识别帧的方式, 能够方便地将识别信息 编码在视频数据中, 并方便地从视频数据中解析出识别帧, 从识别帧的信 息部分提取出拍摄对象的识别信息, 通过识别信息对视频文件中的拍摄对 象进行识别。
在上述技术方案中, 优选地, 还包括: 至少一个所述第一终端作为上 层节点, 所有的所述拍摄对象实体作为下层节点, 以形成 Ad Hoc分层式 网络结构。
在该技术方案中, Ad Hoc 分层式网络结构不需要依靠现有固定通信 网络基础设施, 并且能够迅速展开使用的网络体系。 网络中的各个网络节 点相互协作, 通过无线链路进行通信、 交换信息, 实现信息和服务的共 享。 网络节点能够动态地、 随意地、 频繁地进入和离开网络, 而常常不需 要事先示警或通知, 并且不会破坏网络中其他节点之间的通信。 第一终端 可以是摄像头, 将摄像头作为 Ad Hoc 的上层节点, 拍摄对象 (比如衣 服)作为下层节点, 则根据 Ad Hoc 网络的结构特点, 一个上层节点 (即 摄像头) 可以对应于多个下层节点 (即多个上述的信息收发装置) , 并且 不同网络节点之间互不影响, 提高了视频采集系统的稳定性与灵活性。
在上述技术方案中, 优选地, 所述第一终端还接收所述至少一个所述 拍摄对象实体发送的对应于其自身的可控信息; 其中, 所述编码单元还用 于将所述可控信息与所述识别信息关联地编码至所述视频数据, 且所述数 据分离单元还用于从所述视频数据中获取与至少一个所述拍摄对象相关联 的可控信息, 所述处理单元还用于在检测到对所述指定操作区域的所述操 作动作的情况下, 根据所述可控信息执行对所述指定拍摄对象的操作功 能; 或所述第二终端还在检测到对所述指定操作区域的操作动作时, 将检 测结果上报至所述第一终端, 且所述第一终端相应地将对应于所述指定操 作区域的可控信息发送至所述第二终端, 以由所述处理单元根据所述可控 信息执行对所述指定拍摄对象的操作功能。
在该技术方案中, 第二终端可以对所有的拍摄对象进行默认的处理操 作, 比如对所有被点击到的拍摄对象进行放大处理, 或是存储被点击到的 拍摄对象, 或是直接调用浏览器对被点击到的拍摄对象进行 "以图搜 图" 。 当然, 为了能够实现更多的处理操作方式, 可以通过将可控信息与 识别信息进行关联并编码至视频数据中, 则用户在对识别出的拍摄对象进 行操作时, 第二终端根据可控信息执行相应的功能。 具体来说, 可以将可 控信息编码至上述识别帧中, 或是单独编码为控制信息帧, 将识别帧 (还 可能包括控制信息帧) 与拍摄得到的视频文件进行整合形成视频数据。 第 二终端根据解析出的可控信息, 以执行相应的功能。 第二终端解析出可控 信息之后, 可以同相关联的识别信息一起保存至匹配数据库中, 则在用户 对识别出的拍摄对象进行操作时, 从匹配数据库中检索出与该指定物体的 识别信息关联的可控信息, 以执行对该拍摄对象的操作功能。
当然, 第一终端将可控信息编码至视频数据中时, 往往是将与视频数 据中的拍摄对象相关联的识别信息和可控信息一并发送至第二终端; 但为 了节约网络资源、 提高视频数据的传输速度, 则第一终端可以根据第二终 端上报的检测结果, 仅当某个拍摄对象对应的操作区域存在操作动作时, 才将对应的可控信息发送至第二终端, 这也有利于节省第二终端的存储空 间。
在上述技术方案中, 优选地, 所述数据分离单元分离出的所述可控信 息包括: 菜单数据、 链接信息、 控制命令; 以及所述处理单元执行的所述 操作功能相应地包括: 根据所述菜单数据生成并展示对应的交互菜单、 打 开所述链接信息、 执行所述控制命令。
在该技术方案中, 具体来说, 比如用户在通过手机观看购物视频时, 手机识别出了视频中的某一件衣服, 用户触屏点击该衣服的操作区域, 弹 出比如包含 "购买、 价格、 咨询" 的交互菜单, 或者直接链接至 "购买" 的页面中, 也可以是对该衣服图像的放大等处理, 以方便用户的进一步操 作。
通过以上技术方案, 可以对视频中的拍摄对象进行识别, 使用户在观看视频时能够直接对视频中的拍摄对象进行操作, 而无需通过单独的网络搜索等方式进行操作, 从而有利于简化用户操作, 提升了用户体验。

附图说明
图 1示出了根据本发明的实施例的数据处理方法的流程图;
图 2示出了根据本发明的实施例的数据处理系统的框图;
图 3示出了根据本发明的实施例基于 Ad Hoc网络结构的智能视频交 互系统;
图 4示出了根据本发明实施例的智能视频交互系统的流程图;

图 5A至图 5C示出了根据本发明的实施例的智能视频交互系统的示意图。

具体实施方式
为了能够更清楚地理解本发明的上述目的、 特征和优点, 下面结合附 图和具体实施方式对本发明进行进一步的详细描述。 需要说明的是, 在不 沖突的情况下, 本申请的实施例及实施例中的特征可以相互组合。
在下面的描述中阐述了很多具体细节以便于充分理解本发明, 但是, 本发明还可以采用其他不同于在此描述的其他方式来实施, 因此, 本发明 的保护范围并不受下面公开的具体实施例的限制。
图 1示出了根据本发明的实施例的数据处理方法的流程图。
如图 1 所示, 根据本发明的实施例的数据处理方法, 包括: 步骤 102 , 第一终端对至少一个拍摄对象实体进行图像采集, 并将采集到的图 像和对应于至少一个所述拍摄对象实体的识别信息进行编码, 形成视频数 据, 并通过网络发送至第二终端; 步骤 104 , 所述第二终端接收所述视频 数据, 对所述视频数据进行数据分离, 得到视频文件和与所述视频文件中 的至少一个拍摄对象相关联的识别信息; 步骤 106 , 所述第二终端根据所 述识别信息识别出所述视频文件中的至少一个拍摄对象, 并在所述视频文 件中形成对应于至少一个所述拍摄对象的操作区域; 步骤 108 , 所述第二 终端在播放所述视频文件时, 根据检测到的对指定操作区域的操作动作, 执行与所述指定操作区域对应的指定拍摄对象相关联的操作功能。
在该技术方案中, 通过对视频中拍摄对象的识别, 可以使用户在观看 视频时直接对识别出的物体进行操作, 提升了用户的体验。 这里的视频文 件可以是摄像头实时拍摄后通过有线或无线方式传输过来的视频, 也可以 是其他任意时刻拍摄的非实时的视频。 通过对视频中的拍摄对象的识别, 生成的对应的操作区域可以是该拍摄对象的显示边沿对应的区域, 或是将 该拍摄对象包含在其中的矩形区域等, 具体地, 该操作区域可以是透明 的, 也可以在一定条件下 (比如设置一个可显示出操作区域的视频播放模 式, 并进入该模式时) 进行显示。 由于视频是动态的, 因此, 当视频内的 拍摄对象发生移动 (主动地发生移动, 或由于镜头的移动而使得该拍摄对 象在终端屏幕上形成相对位置变化) 时, 对应的操作区域也应当相应地变 化, 从而使得用户直接对拍摄对象进行操作即可, 而无需特别关注该操作 区域的位置。
优选地, 视频数据可以是由第一终端获取后传输至第二终端的, 尤其 是可以由第一终端实时获取并通过网络传输至第二终端。 当第一终端在进 行拍摄的过程中, 获取被拍摄的拍摄对象的识别信息, 由第一终端将其与 拍摄的视频文件编码成视频数据, 从而无需第一终端对拍摄对象进行分析 和特征获取, 降低了对第一终端的要求, 也方便了第二终端对视频中的拍 摄对象进行识别。
具体来说, 比如用户在通过手机、 电脑等终端设备观看视频时, 点击 (或其他方式, 比如将鼠标放置在拍摄对象对应的操作区域中) 了视频中 的某一件衣服, 如果这件衣服是被识别了的拍摄对象, 则可以实现对应的 操作, 比如链接至一个网页 (调用浏览器并切换至浏览器界面, 或是以气 泡框的形式显示在视频播放界面) , 该网页为这件衣服的品牌信息和 /或 购买信息; 再比如在视频的播放界面上弹出包含 "购买、 价格、 咨询"
(用于举例, 也可以包含其他信息) 的菜单, 用户可以通过对菜单的选择 操作, 实现进一步控制操作。
此外, 拍摄对象实体对应于存储装置和信息收发装置, 其中, 存储装 置中存储了该拍摄对象实体的识别信息, 是预先存储在该存储装置中的, 而信息收发装置则用于将该识别信息发送至第一终端。 而第一终端对于拍 摄对象实体的识别信息进行获取时, 可以通过发送识别信息获取指令, 则 接收到该指令的信息收发装置就将对应的识别信息发送给第一终端。 存储 装置和信息收发装置可以位于拍摄对象实体中, 比如该拍摄对象实体为智 能手机; 存储装置和信息收发装置也可以是与拍摄对象实体相关联的, 比 如是连接至该拍摄对象实体的, 或是放置在拍摄对象实体附近, 或是由于 存储装置中包含了某个拍摄对象实体的识别信息、 且信息收发装置用于发 送该识别信息, 就认为该拍摄对象实体与存储装置、 信息收发装置是相关 联的。
进一步地, 一个存储装置可以对应于一个或多个拍摄对象实体, 而一 个信息收发装置也可以对应于一个或多个存储装置。 信息收发装置在接收 到第一终端发出的识别信息获取指令时, 可以将其关联的存储装置中的所 有识别信息都发送给第一终端; 也可以通过设置另一个图像采集设备, 其 通过监测第一终端的实时状态, 确定其拍摄到的拍摄对象实体, 从而信息 收发装置仅将这部分能够被拍摄到的拍摄对象实体的识别信息发送给第一 终端, 从而减少了第一终端需要处理的数据量。
在上述技术方案中, 优选地, 还包括: 所述第一终端接收至少一个所 述拍摄对象实体发送的对应于其自身的识别信息, 以用于编码至所述视频 数据中。
在该技术方案中, 识别信息可以是第一终端从拍摄对象实体处获取 的, 则有助于在识别信息与具体的拍摄对象实体之间建立实际上的关联, 便于执行对拍摄对象实体和相应的识别信息的管理工作。
在上述技术方案中, 优选地, 还包括: 所述第二终端将所述视频文件 的图像帧中的内容与预存储的识别特征进行匹配, 以识别出所述视频文件 中的至少一个拍摄对象。
在该技术方案中, 在第二终端中或第二终端对应的云端存储空间内, 预存储一个或多个物体的识别特征, 从而在第二终端获取视频文件之后的 任意时刻、 或是播放视频文件 (预先获取或实时接收的) 的过程中, 将视 频的图像帧中的内容与预存储的识别特征进行匹配, 以识别出视频中的拍 摄对象。 由于采用了预存储的识别特征, 因而对于视频文件本身而言, 并 没有特殊的要求, 所有的视频文件都可以适用于该技术方案, 可以是第二 终端从网络上下载的、 从其他终端处获取的或是第二终端自己拍摄的, 具 有较强的通用性。 同时, 由于视频文件中的拍摄对象并不总是在变化, 因 此, 在对某个图像帧中的拍摄对象进行识别之后, 可以将该图像帧与其之 后的一个或多个图像帧中的像素信息进行比较, 以判断出是否发生拍摄对 象的变化, 若存在变化, 则可以进行识别, 否则无需再次识别, 有利于提 高识别效率, 降低对终端处理能力的要求。
其中, 预存储的识别特征, 简单而言, 可以是物体的图像, 则可以根据与视频文件中的画面进行比对, 以识别出该物体; 进一步地, 识别特征还可以是一些特征参数, 比如对于 "衣服" , 可以包括 "前方存在开口, 左右存在对称的袖子" 等参数, 使得第二终端能够 "认识" 到 "衣服" 为何物, 再加之需要识别的衣服自身的特征, 比如颜色、 大小、 款式等, 就可以由第二终端实现对 "衣服" 的智能识别。 同时, 第二终端自身预存储识别特征, 与其根据第一终端发送的识别信息, 两者并不矛盾, 可以仅用其中的某一个进行对象识别, 也可以同时利用两者进行识别。
在上述技术方案中, 优选地, 所述第二终端对所述视频数据进行数据 分离的过程包括: 解析所述视频数据, 从所述视频数据中提取识别帧, 并 得到经提取所述识别帧后剩余的所述视频文件; 从所述识别帧中进一步提 取出所述识别信息, 以用于对所述视频文件的识别操作。
在该技术方案中, 可以在视频文件对应的数据流中间或两端添加包含 识别信息的识别帧。 为了实现对视频数据的分离, 在识别帧的帧头部分应 该包含类型标识, 用于第二终端对视频数据中的识别帧的类型进行识别, 当识别到上述类型标识后, 即判断该数据帧为识别帧, 具体比如识别帧头 主要是由特殊字符组成, 以用来标识识别帧。 然后, 第二终端继续解析其 他的如识别帧长度等信息, 以完整地确定对应的识别帧。 识别帧还应该包 括信息部分, 该信息部分中包含了拍摄对象的识别信息等, 以用于对视频 中的拍摄对象进行识别。 通过采用识别帧的方式, 能够方便地将识别信息 编码在视频数据中, 并方便地从视频数据中解析出识别帧, 从识别帧的信 息部分提取出拍摄对象的识别信息, 通过识别信息对视频文件中的拍摄对 象进行识别。
在上述技术方案中, 优选地, 还包括: 至少一个所述第一终端作为上 层节点, 所有的所述拍摄对象实体作为下层节点, 以形成 Ad Hoc分层式 网络结构。
在该技术方案中, Ad Hoc 分层式网络结构不需要依靠现有固定通信 网络基础设施, 并且能够迅速展开使用的网络体系。 网络中的各个网络节 点相互协作, 通过无线链路进行通信、 交换信息, 实现信息和服务的共 享。 网络节点能够动态地、 随意地、 频繁地进入和离开网络, 而常常不需 要事先示警或通知, 并且不会破坏网络中其他节点之间的通信。 第一终端 可以是摄像头, 将摄像头作为 Ad Hoc 的上层节点, 拍摄对象 (比如衣 服)作为下层节点, 则根据 Ad Hoc 网络的结构特点, 一个上层节点 (即 摄像头) 可以对应于多个下层节点 (即多个上述的信息收发装置) , 并且 不同网络节点之间互不影响, 提高了视频采集系统的稳定性与灵活性。 在上述技术方案中, 优选地, 还包括: 所述第一终端还接收所述至少 一个所述拍摄对象实体发送的对应于其自身的可控信息; 其中, 所述第一 终端将所述可控信息与所述识别信息关联地编码至所述视频数据, 且所述 第二终端还从所述视频数据中获取与至少一个所述拍摄对象相关联的可控 信息, 并当检测到对所述指定操作区域的所述操作动作时, 根据所述可控 信息执行对所述指定拍摄对象的操作功能; 或当所述第二终端检测到对所 述指定操作区域的操作动作, 并将检测结果上报至所述第一终端时, 所述 第一终端将对应于所述指定操作区域的可控信息发送至所述第二终端, 以 由所述第二终端根据所述可控信息执行对所述指定拍摄对象的操作功能。
在该技术方案中, 第二终端可以对所有的拍摄对象进行默认的处理操 作, 比如对所有被点击到的拍摄对象进行放大处理, 或是存储被点击到的 拍摄对象, 或是直接调用浏览器对被点击到的拍摄对象进行 "以图搜 图" 。 当然, 为了能够实现更多的处理操作方式, 可以通过将可控信息与 识别信息进行关联并编码至视频数据中, 则用户在对识别出的拍摄对象进 行操作时, 第二终端根据可控信息执行相应的功能。 具体来说, 可以将可 控信息编码至上述识别帧中, 或是单独编码为控制信息帧, 将识别帧 (还 可能包括控制信息帧) 与拍摄得到的视频文件进行整合形成视频数据。 第 二终端根据解析出的可控信息, 以执行相应的功能。 第二终端解析出可控 信息之后, 可以同相关联的识别信息一起保存至匹配数据库中, 则在用户 对识别出的拍摄对象进行操作时, 从匹配数据库中检索出与该指定物体的 识别信息关联的可控信息, 以执行对该拍摄对象的操作功能。
当然, 第一终端将可控信息编码至视频数据中时, 往往是将与视频数 据中的拍摄对象相关联的识别信息和可控信息一并发送至第二终端; 但为 了节约网络资源、 提高视频数据的传输速度, 则第一终端可以根据第二终 端上报的检测结果, 仅当某个拍摄对象对应的操作区域存在操作动作时, 才将对应的可控信息发送至第二终端, 这也有利于节省第二终端的存储空 间。
在上述技术方案中, 优选地, 所述可控信息包括: 菜单数据、 链接信 息、 控制命令; 以及所述操作功能相应地包括: 根据所述菜单数据生成并 展示对应的交互菜单、 打开所述链接信息、 执行所述控制命令。
在该技术方案中, 具体来说, 比如用户在通过手机观看购物视频时, 手机识别出了视频中的某一件衣服, 用户触屏点击该衣服的操作区域, 弹 出比如包含 "购买、 价格、 咨询" 的交互菜单, 或者直接链接至 "购买" 的页面中, 也可以是对该衣服图像的放大等处理, 以方便用户的进一步操 作。
图 2示出了根据本发明的实施例的数据处理系统的框图。
如图 2所示, 根据本发明的实施例的数据处理系统 200, 包括第一终 端 202和第二终端 204 , 所述第一终端 202包括: 图像采集单元 202A, 用 于对至少一个拍摄对象实体进行图像采集; 编码单元 202B , 用于将采集 到的图像和对应于至少一个所述拍摄对象实体的识别信息进行编码, 形成 视频数据; 视频数据发送单元 202C, 用于将所述编码单元 202B形成的所 述视频数据通过网络发送至所述第二终端 204; 所述第二终端 204 包括: 视频数据接收单元 204A, 用于接收所述视频数据; 数据分离单元 204B , 用于对所述视频数据进行数据分离, 得到视频文件和与所述视频文件中的 至少一个拍摄对象相关联的识别信息; 识别单元 204C , 用于根据所述识 别信息识别出视频文件中的至少一个拍摄对象; 操作区域生成单元 204D , 根据识别出的所述至少一个拍摄对象在所述视频文件中形成对应于 至少一个所述拍摄对象的操作区域; 视频播放单元 204E , 用于播放所述 视频文件; 操作动作检测单元 204F, 用于在所述视频播放单元 204E播放 所述视频文件时, 检测对指定操作区域的操作动作; 处理单元 204G, 用 于在所述操作动作检测单元 204F 检测到对所述指定操作区域的操作动作 时, 执行与所述指定操作区域对应的指定拍摄对象相关联的操作功能。
在该技术方案中, 通过对视频中拍摄对象的识别, 可以使用户在观看 视频时直接对识别出的物体进行操作, 提升了用户的体验。 这里的视频文 件可以是摄像头实时拍摄后通过有线或无线方式传输过来的视频, 也可以 是其他任意时刻拍摄的非实时的视频。 通过对视频中的拍摄对象的识别, 生成的对应的操作区域可以是该拍摄对象的显示边沿对应的区域, 或是将 该拍摄对象包含在其中的矩形区域等, 具体地, 该操作区域可以是透明 的, 也可以在一定条件下 (比如设置一个可显示出操作区域的视频播放模 式, 并进入该模式时) 进行显示。 由于视频是动态的, 因此, 当视频内的 拍摄对象发生移动 (主动地发生移动, 或由于镜头的移动而使得该拍摄对 象在终端屏幕上形成相对位置变化) 时, 对应的操作区域也应当相应地变 化, 从而使得用户直接对拍摄对象进行操作即可, 而无需特别关注该操作 区域的位置。
优选地, 视频数据可以是由第一终端 202获取后传输至第二终端 204 的, 尤其是可以由第一终端 202 实时获取并通过网络传输至第二终端 204。 当第一终端 202 在进行拍摄的过程中, 获取被拍摄的拍摄对象的识 别信息, 由第一终端 202将其与拍摄的视频文件编码成视频数据, 从而无 需第一终端 202对拍摄对象进行分析和特征获取, 降低了对第一终端 202 的要求, 也方便了第二终端 204对视频中的拍摄对象进行识别。
具体来说, 比如用户在通过手机、 电脑等终端设备观看视频时, 点击 (或其他方式, 比如将鼠标放置在拍摄对象对应的操作区域中) 了视频中 的某一件衣服, 如果这件衣服是被识别了的拍摄对象, 则可以实现对应的 操作, 比如链接至一个网页 (调用浏览器并切换至浏览器界面, 或是以气 泡框的形式显示在视频播放界面) , 该网页为这件衣服的品牌信息和 /或 购买信息; 再比如在视频的播放界面上弹出包含 "购买、 价格、 咨询" (用于举例, 也可以包含其他信息) 的菜单, 用户可以通过对菜单的选择 操作, 实现进一步控制操作。
此外, 拍摄对象实体对应于存储装置和信息收发装置 (图中未示 出) , 其中, 存储装置中存储了该拍摄对象实体的识别信息, 是预先存储 在该存储装置中的, 而信息收发装置则用于将该识别信息发送至第一终端
202。 而第一终端 202 对于拍摄对象实体的识别信息进行获取时, 可以通 过发送识别信息获取指令, 则接收到该指令的信息收发装置就将对应的识 别信息发送给第一终端 202。 存储装置和信息收发装置可以位于拍摄对象 实体中, 比如该拍摄对象实体为智能手机; 存储装置和信息收发装置也可 以是与拍摄对象实体相关联的, 比如是连接至该拍摄对象实体的, 或是放 置在拍摄对象实体附近, 或是由于存储装置中包含了某个拍摄对象实体的 识别信息、 且信息收发装置用于发送该识别信息, 就认为该拍摄对象实体 与存储装置、 信息收发装置是相关联的。
进一步地, 一个存储装置可以对应于一个或多个拍摄对象实体, 而一 个信息收发装置也可以对应于一个或多个存储装置。 信息收发装置在接收 到第一终端 202发出的识别信息获取指令时, 可以将其关联的存储装置中 的所有识别信息都发送给第一终端 202; 也可以通过设置另一个图像采集 设备, 其通过监测第一终端 202的实时状态, 确定其拍摄到的拍摄对象实 体, 从而信息收发装置仅将这部分能够被拍摄到的拍摄对象实体的识别信 息发送给第一终端 202 , 从而减少了第一终端 202需要处理的数据量。
在上述技术方案中, 优选地, 所述第一终端 202 , 还包括: 信息接收 单元 202D , 用于接收至少一个所述拍摄对象实体发送的对应于其自身的 识别信息, 以用于编码至所述视频数据中。
在该技术方案中, 识别信息可以是第一终端 202从拍摄对象实体处获 取的, 则有助于在识别信息与具体的拍摄对象实体之间建立实际上的关 联, 便于执行对拍摄对象实体和相应的识别信息的管理工作。
在上述技术方案中, 优选地, 所述第二终端 204 , 还包括: 预存储单 元 (图中未示出) , 用于预存储识别特征; 其中, 所述识别单元 204C 将 所述视频文件的图像帧中的内容与所述预存储单元预存储的识别特征进行 匹配, 以识别出所述视频文件中的至少一个拍摄对象。
在该技术方案中, 在第二终端 204中或第二终端 204对应的云端存储 空间内, 预存储一个或多个物体的识别特征, 从而在第二终端 204获取视 频文件之后的任意时刻、 或是播放视频文件 (预先获取或实时接收的) 的 过程中, 将视频的图像帧中的内容与预存储的识别特征进行匹配, 以识别 出视频中的拍摄对象。 由于采用了预存储的识别特征, 因而对于视频文件 本身而言, 并没有特殊的要求, 所有的视频文件都可以适用于该技术方 案, 可以是第二终端 204从网络上下载的、 从其他终端处获取的或是第二 终端 204 自己拍摄的, 具有较强的通用性。 同时, 由于视频文件中的拍摄 对象并不总是在变化, 因此, 在对某个图像帧中的拍摄对象进行识别之 后, 可以将该图像帧与其之后的一个或多个图像帧中的像素信息进行比 较, 以判断出是否发生拍摄对象的变化, 若存在变化, 则可以进行识别, 否则无需再次识别, 有利于提高识别效率, 降低对终端处理能力的要求。
其中, 预存储的识别特征, 简单而言, 可以是物体的图像, 则可以根据与视频文件中的画面进行比对, 以识别出该物体; 进一步地, 识别特征还可以是一些特征参数, 比如对于 "衣服" , 可以包括 "前方存在开口, 左右存在对称的袖子" 等参数, 使得第二终端 204能够 "认识" 到 "衣服" 为何物, 再加之需要识别的衣服自身的特征, 比如颜色、 大小、 款式等, 就可以由第二终端 204实现对 "衣服" 的智能识别。 同时, 第二终端 204自身预存储识别特征, 与其根据第一终端 202发送的识别信息, 两者并不矛盾, 可以仅用其中的某一个进行对象识别, 也可以同时利用两者进行识别。
在上述技术方案中, 优选地, 所述数据分离单元 204B , 包括: 帧提 取子单元 (图中未示出) , 用于从所述视频数据中提取识别帧, 并得到经 提取所述识别帧后剩余的所述视频文件; 帧解析子单元 (图中未示出) , 用于从所述识别帧中进一步提取出所述识别信息, 以用于所述识别单元对 所述视频文件的识别操作。
在该技术方案中, 可以在视频文件对应的数据流中间或两端添加包含 识别信息的识别帧。 为了实现对视频数据的分离, 在识别帧的帧头部分应 该包含类型标识, 用于第二终端 204对视频数据中的识别帧的类型进行识 别, 当识别到上述类型标识后, 即判断该数据帧为识别帧, 具体比如识别 帧头主要是由特殊字符组成, 以用来标识识别帧。 然后, 第二终端 204继 续解析其他的如识别帧长度等信息, 以完整地确定对应的识别帧。 识别帧 还应该包括信息部分, 该信息部分中包含了拍摄对象的识别信息等, 以用 于对视频中的拍摄对象进行识别。 通过采用识别帧的方式, 能够方便地将 识别信息编码在视频数据中, 并方便地从视频数据中解析出识别帧, 从识 别帧的信息部分提取出拍摄对象的识别信息, 通过识别信息对视频文件中 的拍摄对象进行识别。
在上述技术方案中, 优选地, 还包括: 至少一个所述第一终端 202作 为上层节点, 所有的所述拍摄对象实体作为下层节点, 以形成 Ad Hoc分 层式网络结构。
在该技术方案中, Ad Hoc 分层式网络结构不需要依靠现有固定通信 网络基础设施, 并且能够迅速展开使用的网络体系。 网络中的各个网络节 点相互协作, 通过无线链路进行通信、 交换信息, 实现信息和服务的共 享。 网络节点能够动态地、 随意地、 频繁地进入和离开网络, 而常常不需 要事先示警或通知, 并且不会破坏网络中其他节点之间的通信。 第一终端 202 可以是摄像头, 将摄像头作为 Ad Hoc 的上层节点, 拍摄对象(比如 衣服) 作为下层节点, 则根据 Ad Hoc 网络的结构特点, 一个上层节点 (即摄像头) 可以对应于多个下层节点 (即多个上述的信息收发装置) , 并且不同网络节点之间互不影响, 提高了视频采集系统的稳定性与灵活 性。
在上述技术方案中, 优选地, 所述第一终端 202还接收所述至少一个 所述拍摄对象实体发送的对应于其自身的可控信息; 其中, 所述编码单元 202B 还用于将所述可控信息与所述识别信息关联地编码至所述视频数 据, 且所述数据分离单元 204B 还用于从所述视频数据中获取与至少一个 所述拍摄对象相关联的可控信息, 所述处理单元 204G 还用于在检测到对 所述指定操作区域的所述操作动作的情况下, 根据所述可控信息执行对所 述指定拍摄对象的操作功能; 或所述第二终端 204还在检测到对所述指定 操作区域的操作动作时, 将检测结果上报至所述第一终端 202 , 且所述第 一终端 202相应地将对应于所述指定操作区域的可控信息发送至所述第二 终端 204, 以由所述处理单元 204G 根据所述可控信息执行对所述指定拍 摄对象的操作功能。
在该技术方案中, 第二终端 204可以对所有的拍摄对象进行默认的处理操作, 比如对所有被点击到的拍摄对象进行放大处理, 或是存储被点击到的拍摄对象, 或是直接调用浏览器对被点击到的拍摄对象进行 "以图搜图" 。 当然, 为了能够实现更多的处理操作方式, 可以通过将可控信息与识别信息进行关联并编码至视频数据中, 则用户在对识别出的拍摄对象进行操作时, 第二终端 204根据可控信息执行相应的功能。 具体来说, 可以将可控信息编码至上述识别帧中, 或是单独编码为控制信息帧, 将识别帧 (还可能包括控制信息帧) 与拍摄得到的视频文件进行整合形成视频数据。 第二终端 204根据解析出的可控信息, 以执行相应的功能。 第二终端 204解析出可控信息之后, 可以同相关联的识别信息一起保存至匹配数据库中, 则在用户对识别出的拍摄对象进行操作时, 从匹配数据库中检索出与该指定物体的识别信息关联的可控信息, 以执行对该拍摄对象的操作功能。
当然, 第一终端 202将可控信息编码至视频数据中时, 往往是将与视 频数据中的拍摄对象相关联的识别信息和可控信息一并发送至第二终端 204; 但为了节约网络资源、 提高视频数据的传输速度, 则第一终端 202 可以根据第二终端 204上报的检测结果, 仅当某个拍摄对象对应的操作区 域存在操作动作时, 才将对应的可控信息发送至第二终端 204 , 这也有利 于节省第二终端 204的存储空间。
在上述技术方案中, 优选地, 所述数据分离单元 204B 分离出的所述 可控信息包括: 菜单数据、 链接信息、 控制命令; 以及所述处理单元 204G 执行的所述操作功能相应地包括: 根据所述菜单数据生成并展示对 应的交互菜单、 打开所述链接信息、 执行所述控制命令。
在该技术方案中, 具体来说, 比如用户在通过手机观看购物视频时, 手机识别出了视频中的某一件衣服, 用户触屏点击该衣服的操作区域, 弹 出比如包含 "购买、 价格、 咨询" 的交互菜单, 或者直接链接至 "购买" 的页面中, 也可以是对该衣服图像的放大等处理, 以方便用户的进一步操 作。
图 3示出了根据本发明实施例的基于 Ad Hoc网络结构的智能视频交 互系统的模块图。
如图 3所示, 根据本发明实施例的基于 Ad Hoc网络结构的智能视频 交互系统, 包括客户端 302与服务端 304。
本实施例中服务端 304采用的是 Ad Hoc分层式网络结构来进行信息的采集, 以形成视频数据供客户端 302进行下载, 且客户端 302可以根据需要实时播放或在其他任意时刻播放。 Ad Hoc网络中的各个网络节点相互协作, 通过无线链路进行通信、 交换信息, 实现信息和服务的共享。 网络节点能够动态地、 随意地、 频繁地进入和离开网络, 而常常不需要事先示警或通知, 并且不会破坏网络中其他节点之间的通信, 具有很强的灵活性。 当然, 采用 Ad Hoc网络结构只是一种较为优选的方式, 若采用其他的网络结构以实现本发明中信息的采集过程, 也应包含在本发明的保护范围之内。
服务端 304包括:
服务器 304A, 用于提供客户端 302 下载视频数据, 其中的视频数据 可以是包含有识别帧的视频数据, 也可以是不含识别帧的视频文件。 服务 器 304A 可以根据客户端的不同选择, 传输上述两种视频数据中的任一 种。
上层节点 304B与上层节点 304C是 Ad Hoc网络中的上层节点 (显然 地, 上层节点的数量是可以根据需要而变化的, 即可以仅包含一个上层节 点, 也可以包含 2 个或更多上层节点, 此处以包含 2 个节点为例进行说 明) , 节点之间互不影响, 可以动态地、 随意地、 频繁地进入和离开网 络, 使信息采集系统具有很强的灵活性。 上层节点在此可以是摄像头, 用 于根据服务器 304A 的请求动态采集拍摄对象 (即下层节点) 的图像信 息。 上层节点对于下层节点的识别信息和 /或可控信息进行获取时, 可以 通过发送识别信息和 /或可控信息获取指令, 则下层节点接收到该指令就 将对应的识别信息和 /或可控信息发送至上层节点。 其中一个上层节点可 以对应于多个下层节点。 如上层节点 304B 对应于下层节点 304D 与 304E, 下层节点 304D与 304E之间也是互不影响的。
下层节点 304D、 304E、 304F、 304G是 Ad Hoc网络中的下层节点, 与上层节点一样, 可以动态地、 随意地、 频繁地进入和离开网络, 并不影 响其他网络节点的工作。 当下层节点接收到上层节点发送的获取识别信息 和 /或可控信息的命令时, 传输识别信息与可控信息至上层节点。
客户端 302包括:
接收模块 302A, 用于接收从服务端获取的视频数据, 所述视频数据 中包含了识别拍摄对象的识别信息。
数据分离模块 302B , 用于对所述视频数据进行数据分离, 得到所述 视频文件和与所述视频文件中的至少一个拍摄对象相关联的识别信息, 以 及与识别信息关联的可控信息。 具体来说, 视频数据中包含有识别帧, 识 别帧包含识别帧头、 识别帧长度、 识别帧信息等特征。 识别帧头主要是由 特殊字符组成, 以用来标识识别帧; 识别帧长度用来标记识别帧信息的长 度; 识别帧信息部分是由特殊的字符编码格式组成, 包含了拍摄对象的识 别信息和可控信息等。 因此可以将识别帧从视频数据中分离出来, 并解析 识别帧, 从识别帧的信息部分提取出拍摄对象的识别信息与可控信息, 通 过识别信息对视频文件中的拍摄对象进行识别。
视频解码模块 302C, 用于对视频文件进行解码。
音视频输出模块 302D, 用于将解码后的音视频输出进行播放。
匹配数据库 302E, 用于保存从视频数据中分离出的识别信息与可控信息。
智能识别模块 302F , 用于根据分离出的识别信息对视频文件中的拍 摄对象进行识别, 并根据识别出的拍摄对象生成对应的操作区域。
智能交互显示模块 302G, 用于在播放视频文件时, 在识别出的拍摄 对象的操作区域内对拍摄对象进行操作时, 根据分离出的可控信息, 执行 相应的操作。
下面结合图 4和图 5A-5C, 对本发明的技术方案进行详细说明。
图 4示出了根据本发明实施例的智能视频交互系统的流程图。
如图 4所示, 根据本发明实施例的智能视频交互系统的流程, 包括: 步骤 402 , 用户选择相应的视频文件进行播放, 即选择包含有数据信 息的视频数据, 或者是单纯的视频文件。
步骤 404, 当用户想了解某个物体 (拍摄对象) 的具体信息时, 可通 过点击某个物体。 本实施例中, 用户首先对视频中的指定物体进行操作 (即点击, 当然也可以通过其他操作, 比如触屏) , 然后再判断该指定物 体是否为可以识别的拍摄对象。 当然也可以先通过对拍摄对象的识别, 识 别出拍摄对象之后, 将拍摄对象进行特殊显示, 再由用户对识别出的拍摄 对象进行操作。
步骤 406, 判断用户选择何种视频模式进行播放, 若选择特殊模式, 则执行步骤 408 , 否则跳转至步骤 402。 在本实施例中, 用户可以选择视 频模式, 其中的特殊模式即为本发明技术方案中所述的可以对拍摄对象进 行识别, 并在视频播放过程中, 支持用户对识别出的拍摄对象进行操作的 模式。 用户若选择特殊模式, 则针对包含有数据信息的视频数据, 可以对 视频数据进行分离, 得到拍摄对象的识别信息与可控信息, 以对拍摄对象 进行识别与操作; 若所播放的视频是不含数据信息的视频文件, 则可以通 过终端在本地或云端存储的识别特征来对拍摄对象进行识别。 若用户选择 的不是特殊模式, 则只能够进行视频播放, 不能对视频中的拍摄对象进行 操作。
步骤 408 , 根据选择的内容, 弹出交互菜单进行动态的交互。 弹出的 交互菜单是根据可控信息做出的相应操作。
如图 5A所示, 在手机终端 (也可以是平板电脑、 PC等其他终端) 播放视频的过程中, 从视频数据中分离出的识别信息和与识别信息相关联的可控信息保存至匹配数据库中, 根据识别信息 (或者是存储至本地或云端的识别特征) 识别出拍摄对象 502, 并可以对识别出的拍摄对象 502进行特殊显示 (比如显示出一个高亮范围等) , 在拍摄对象 502的附近生成对应于拍摄对象 502的操作区域 (图中未示出) 。 用户可以点击拍摄对象的操作区域来对拍摄对象 502进行操作, 终端根据对拍摄对象 502的操作检索匹配数据库中的可控信息, 执行相应的操作, 如图中所示, 弹出交互菜单 504, 用户可以通过交互菜单 504对拍摄对象 502作进一步的操作。 当然, 也可以如图 5B所示, 在点击拍摄对象 502后, 弹出一个气泡框 506, 从气泡框 506中可以获知拍摄对象 502的信息。 也可以在点击拍摄对象 502之后, 对拍摄对象 502进行放大显示, 或者调用浏览器并直接切换至相应的网址链接的页面中 (如图 5C所示) 。
步骤 410 , 用户选择某个菜单, 如图 5A中的 "详细" 。
步骤 412 , 把用户选择的操作信息发至指定的服务器, 根据识别到的 操作功能, 通过将选择的操作信息发送至服务器, 可以根据存储的操作功 能作出对操作信息的响应。
步骤 414, 服务器返回操作结果, 比如, 可以弹出如图 5B 中所示的 拍摄对象 502详细信息的气泡框。
以上结合附图详细说明了本发明的技术方案, 考虑到在现有技术中, 当用户进行网上购物时, 通过浏览网页图片的方式去购买产品, 购买到的实物跟网上的照片偏差较大, 用户在观看视频时, 也无法对视频中的拍摄对象进行操作, 只能通过单独的网络搜索等方式对拍摄对象进行操作。 因此, 本发明提出了一种新的数据处理方案, 可以对视频中的拍摄对象进行识别, 使用户在观看视频时对视频中的拍摄对象进行操作, 而无需通过单独的网络搜索等方式进行操作, 从而有利于简化用户操作, 提升了用户体验。
以上所述仅为本发明的优选实施例而已, 并不用于限制本发明, 对于 本领域的技术人员来说, 本发明可以有各种更改和变化。 凡在本发明的精 神和原则之内, 所作的任何修改、 等同替换、 改进等, 均应包含在本发明 的保护范围之内。

Claims

权利要求书
1. 一种数据处理方法, 包括:
第一终端对至少一个拍摄对象实体进行图像采集, 并将采集到的图像 和对应于至少一个所述拍摄对象实体的识别信息进行编码, 形成视频数 据, 并通过网络发送至第二终端;
所述第二终端接收所述视频数据, 对所述视频数据进行数据分离, 得 到视频文件和与所述视频文件中的至少一个拍摄对象相关联的识别信息; 所述第二终端根据所述识别信息识别出所述视频文件中的至少一个拍 摄对象, 并在所述视频文件中形成对应于至少一个所述拍摄对象的操作区 域;
所述第二终端在播放所述视频文件时, 根据检测到的对指定操作区域 的操作动作, 执行与所述指定操作区域对应的指定拍摄对象相关联的操作 功能。
2. 根据权利要求 1所述的数据处理方法, 还包括:
所述第一终端接收至少一个所述拍摄对象实体发送的对应于其自身的 识别信息, 以用于编码至所述视频数据中。
3. 根据权利要求 1所述的数据处理方法, 还包括:
至少一个所述第一终端作为上层节点, 所有的所述拍摄对象实体作为 下层节点, 以形成 Ad Hoc分层式网络结构。
4. 根据权利要求 1至 3中任一项所述的数据处理方法, 还包括: 所述第一终端还接收所述至少一个所述拍摄对象实体发送的对应于其 自身的可控信息;
其中, 所述第一终端将所述可控信息与所述识别信息关联地编码至所 述视频数据, 且所述第二终端还从所述视频数据中获取与至少一个所述拍 摄对象相关联的可控信息, 并当检测到对所述指定操作区域的所述操作动 作时, 根据所述可控信息执行对所述指定拍摄对象的操作功能;
或当所述第二终端检测到对所述指定操作区域的操作动作, 并将检测 结果上报至所述第一终端时, 所述第一终端将对应于所述指定操作区域的 可控信息发送至所述第二终端, 以由所述第二终端根据所述可控信息执行 对所述指定拍摄对象的操作功能。
5. 根据权利要求 4 所述的数据处理方法, 所述可控信息包括: 菜单 数据、 链接信息、 控制命令; 以及
所述操作功能相应地包括:
根据所述菜单数据生成并展示对应的交互菜单、 打开所述链接信息、 执行所述控制命令。
6. 一种数据处理系统, 其特征在于, 包括第一终端和第二终端, 所述第一终端包括:
图像采集单元, 用于对至少一个拍摄对象实体进行图像采集; 编码单元, 用于将采集到的图像和对应于至少一个所述拍摄对象 实体的识别信息进行编码, 形成视频数据;
视频数据发送单元, 用于将所述编码单元形成的所述视频数据通 过网络发送至所述第二终端;
所述第二终端包括:
视频数据接收单元, 用于接收所述视频数据;
数据分离单元, 用于对所述视频数据进行数据分离, 得到视频文 件和与所述视频文件中的至少一个拍摄对象相关联的识别信息;
识别单元, 用于根据所述识别信息识别出所述视频文件中的至少 一个拍摄对象;
操作区域生成单元, 根据识别出的所述至少一个拍摄对象在所述 视频文件中形成对应于至少一个所述拍摄对象的操作区域;
视频播放单元, 用于播放所述视频文件;
操作动作检测单元, 用于在所述视频播放单元播放所述视频文件 时, 检测对指定操作区域的操作动作;
处理单元, 用于在所述操作动作检测单元检测到对所述指定操作 区域的操作动作时, 执行与所述指定操作区域对应的指定拍摄对象相 关联的操作功能。
7. 根据权利要求 6 所述的数据处理系统, 其特征在于, 所述第一终 端, 还包括: 信息接收单元, 用于接收至少一个所述拍摄对象实体发送的对应于其 自身的识别信息, 以用于编码至所述视频数据中。
8. 根据权利要求 6所述的数据处理系统, 其特征在于, 还包括: 至少一个所述第一终端作为上层节点, 所有的所述拍摄对象实体作为 下层节点, 以形成 Ad Hoc分层式网络结构。
9. 根据权利要求 6 至 8 中任一项所述的数据处理系统, 其特征在 于,
所述第一终端还接收所述至少一个所述拍摄对象实体发送的对应于其 自身的可控信息;
其中, 所述编码单元还用于将所述可控信息与所述识别信息关联地编 码至所述视频数据, 且所述数据分离单元还用于从所述视频数据中获取与 至少一个所述拍摄对象相关联的可控信息, 所述处理单元还用于在检测到 对所述指定操作区域的所述操作动作的情况下, 根据所述可控信息执行对 所述指定拍摄对象的操作功能;
或所述第二终端还在检测到对所述指定操作区域的操作动作时, 将检 测结果上报至所述第一终端, 且所述第一终端相应地将对应于所述指定操 作区域的可控信息发送至所述第二终端, 以由所述处理单元根据所述可控 信息执行对所述指定拍摄对象的操作功能。
10. 根据权利要求 9所述的数据处理系统, 其特征在于, 所述数据分 离单元分离出的所述可控信息包括: 菜单数据、 链接信息、 控制命令; 以 及
所述处理单元执行的所述操作功能相应地包括:
根据所述菜单数据生成并展示对应的交互菜单、 打开所述链接信息、 执行所述控制命令。
PCT/CN2013/077929 2013-06-25 2013-06-25 数据处理方法和数据处理系统 WO2014205658A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201380069016.1A CN104885113A (zh) 2013-06-25 2013-06-25 数据处理方法和数据处理系统
EP13888374.9A EP3016052A4 (en) 2013-06-25 2013-06-25 Data processing method and data processing system
PCT/CN2013/077929 WO2014205658A1 (zh) 2013-06-25 2013-06-25 数据处理方法和数据处理系统
US14/888,004 US10255243B2 (en) 2013-06-25 2013-06-25 Data processing method and data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/077929 WO2014205658A1 (zh) 2013-06-25 2013-06-25 数据处理方法和数据处理系统

Publications (1)

Publication Number Publication Date
WO2014205658A1 true WO2014205658A1 (zh) 2014-12-31

Family

ID=52140778

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/077929 WO2014205658A1 (zh) 2013-06-25 2013-06-25 数据处理方法和数据处理系统

Country Status (4)

Country Link
US (1) US10255243B2 (zh)
EP (1) EP3016052A4 (zh)
CN (1) CN104885113A (zh)
WO (1) WO2014205658A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781348A (zh) * 2019-10-25 2020-02-11 北京威晟艾德尔科技有限公司 Video file analysis method
CN111292136A (zh) * 2020-03-09 2020-06-16 李维 Mobile advertisement screening and delivery method and system
CN111541938A (zh) * 2020-04-30 2020-08-14 维沃移动通信有限公司 Video generation method and apparatus, and electronic device
CN115297323A (zh) * 2022-08-16 2022-11-04 广东省信息网络有限公司 RPA process automation method and system
CN115297323B (zh) * 2022-08-16 2023-07-28 广东省信息网络有限公司 RPA process automation method and system

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106375631A (zh) * 2016-09-13 2017-02-01 上海斐讯数据通信技术有限公司 Information transmission method and system between intelligent terminals
CN110557684B (zh) * 2018-06-01 2022-09-06 北京京东尚科信息技术有限公司 Information processing method and system, electronic device, and computer-readable medium
CN112289100A (zh) * 2019-07-25 2021-01-29 上海力本规划建筑设计有限公司 Smart classroom system
CN112099723B (zh) * 2020-09-23 2022-08-16 努比亚技术有限公司 Associated control method and device, and computer-readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101035279A (zh) * 2007-05-08 2007-09-12 孟智平 Method for using information sets in video resources
CN101945264A (zh) * 2007-05-08 2011-01-12 孟智平 Method for using information sets in video resources
CN102592233A (zh) * 2011-12-20 2012-07-18 姚武杰 Method and platform for tour guiding, navigation, and shopping guidance
US20130031176A1 (en) * 2011-07-27 2013-01-31 Hearsay Labs, Inc. Identification of rogue social media assets

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6172677B1 (en) * 1996-10-07 2001-01-09 Compaq Computer Corporation Integrated content guide for interactive selection of content and services on personal computer systems with multiple sources and multiple media presentation
CN100571743C (zh) 2007-04-16 2009-12-23 北京艺信堂医药研究所 Traditional Chinese medicine preparation for treating impotence
WO2009111699A2 (en) * 2008-03-06 2009-09-11 Armin Moehrle Automated process for segmenting and classifying video objects and auctioning rights to interactive video objects
WO2009113976A1 (en) * 2008-03-11 2009-09-17 Thomson Licensing Joint association, routing and rate allocation in wireless multi-hop mesh networks
US9111287B2 (en) * 2009-09-30 2015-08-18 Microsoft Technology Licensing, Llc Video content-aware advertisement placement
US8493353B2 (en) * 2011-04-13 2013-07-23 Longsand Limited Methods and systems for generating and joining shared experience
US8718369B1 (en) * 2011-09-20 2014-05-06 A9.Com, Inc. Techniques for shape-based search of content


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3016052A4 *


Also Published As

Publication number Publication date
EP3016052A4 (en) 2017-01-04
US20160078056A1 (en) 2016-03-17
US10255243B2 (en) 2019-04-09
EP3016052A1 (en) 2016-05-04
CN104885113A (zh) 2015-09-02

Similar Documents

Publication Publication Date Title
WO2014205658A1 (zh) Data processing method and data processing system
KR101680714B1 (ko) Method, apparatus, server, terminal device, program, and recording medium for providing real-time video
US10123066B2 Media playback method, apparatus, and system
US9204197B2 Electronic device and method for providing contents recommendation service
KR101314865B1 (ko) Method for providing augmented reality interworking with a TV screen in a mobile environment, and supplementary service server and broadcasting system therefor
CN110647303B (zh) Multimedia playing method and apparatus, storage medium, and electronic device
CN111897507B (zh) Screen projection method and apparatus, second terminal, and storage medium
US9749710B2 Video analysis system
CN109040960A (zh) Method and apparatus for implementing location services
WO2020044099A1 (zh) Object-recognition-based service processing method and apparatus
CN103412877A (zh) Picture transfer method and apparatus
WO2013135132A1 (zh) Mobile augmented reality search method, client, server, and search system
WO2021179804A1 (zh) Image processing method, image processing apparatus, storage medium, and electronic device
JP2018128955A (ja) Screenshot image analysis device, screenshot image analysis method, and program
CN113094523A (zh) Resource information acquisition method and apparatus, electronic device, and storage medium
CN112148245A (zh) Monitoring-view screen projection method and apparatus, computer device, readable storage medium, and monitoring-view screen projection interaction system
KR101923441B1 (ko) Electronic device and method for providing a content recommendation service
CN106911948B (zh) Display control method and apparatus, control device, and electronic device
KR101584304B1 (ko) Apparatus and method for requesting content
CN104980807A (zh) Method and terminal for multimedia interaction
US20120169638A1 Device and method for transmitting data in portable terminal
CN106445117A (zh) Terminal control method and apparatus
CN103853790A (zh) Method and apparatus for processing upload information of a mobile terminal browser
CN102222096A (zh) Method and device for providing multimedia resource access service
CN106254953B (zh) Picture display method and apparatus, and picture receiving terminal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13888374

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2013888374

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 14888004

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE