Embodiment
The present invention is explained in further detail below in conjunction with the drawings and specific embodiments.
In principle, the method of the invention can be divided into an audio-based interactive digital multimedia production method and a video-based interactive digital multimedia production method.
The video-based interactive digital multimedia production method comprises the following steps:
1. capturing video images in real time with a camera device and performing preprocessing such as lighting correction and denoising;
2. segmenting the video images according to their variations and features in time and space; performing feature extraction and analysis on the segmented images to obtain the global motion information and the local human-body posture information in the images (their position, direction, amplitude, and basic shape parameters); and converting this information into advertisement control commands through regularization;
3. driving the advertisement through the control interface according to the control commands.
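The three steps above can be sketched as a minimal Python pipeline. All function names and the command values are hypothetical illustrations, not from the source, and the analysis itself is stubbed:

```python
# Illustrative pipeline skeleton for the video-based method.
# All function names and command values are hypothetical, not from the source.

def preprocess(frame):
    """Step 1: light correction and denoising (stubbed as identity here)."""
    return frame

def extract_motion_and_posture(frame, prev_frame):
    """Step 2: segmentation and feature extraction (stubbed).
    A real implementation would segment the frame and analyse it; here we
    only report whether anything changed between the two frames."""
    moved = frame != prev_frame
    return {"global_motion": moved, "posture": None}

def to_command(info):
    """Step 2 (regularization): map the extracted info to a control command."""
    return "play" if info["global_motion"] else "pause"

def drive_advertisement(command):
    """Step 3: the control interface drives the advertisement."""
    return f"advertisement <- {command}"

# Toy "frames" represented as plain values for illustration.
prev, cur = 0, 1
info = extract_motion_and_posture(preprocess(cur), preprocess(prev))
print(drive_advertisement(to_command(info)))  # prints "advertisement <- play"
```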
The audio-based interactive digital multimedia production method comprises the following steps:
1. acquiring audio data in real time from a microphone and sound-card device and performing preprocessing such as denoising;
2. processing the acquired audio with pitch analysis and speech recognition techniques to obtain the frequency value and amplitude value of the sound and the corresponding semantic vocabulary recognition result, and converting these into advertisement control commands;
3. driving the advertisement through the control interface according to the control commands.
It must be emphasized that the above two methods can be used independently or in combination.
The present invention is further illustrated below with an embodiment that applies the method to advertisement production. Fig. 1 is the flow chart of this embodiment, in which the video-based steps and the audio-based steps can each be used separately or applied in parallel.
As shown in Fig. 1, the specific steps of this embodiment are as follows:
(1) Video image acquisition: real-time images are obtained by a high-speed image capture module from a camera connected to the computer. Since every frame is to be processed, images are extracted from the video stream frame by frame. Depending on the purpose of the application, the camera may be aimed at the people and scenery in the venue, or may shoot the people and scenery from above the venue;
(2) Preprocessing such as denoising: to improve the precision and speed of the subsequent motion-information and posture-information extraction, the frames obtained in step (1) must undergo preprocessing such as denoising. First, to reduce the amount of computation and improve computing speed, the resolution of the captured color image is reduced to 1/4 of the original and the image is converted to a 256-level grayscale image. Second, each frame is smoothed by averaging corresponding pixels in space (within a frame) and in time (between frames), removing the random noise introduced during capture. In addition, brightness is compensated to eliminate the influence of illumination changes: from each pixel value the mean of the pixel values of the entire image is subtracted, the result is divided by the variance of the pixel values of the entire image, and it is then multiplied by a coefficient. This processing eliminates the influence of lighting changes to a certain extent;
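The downsampling and brightness normalization described in step (2) can be sketched in plain Python on a tiny grayscale "image" held as a list of lists (the coefficient k is arbitrary, and the spatial/temporal smoothing step is omitted for brevity):

```python
# Illustrative sketch of the step (2) preprocessing: reduce resolution to
# 1/4 (half in each dimension) and normalize brightness as described in
# the text: (pixel - mean) / variance * k, with k an arbitrary coefficient.

def downsample(img):
    """Keep every other row and column: 1/4 of the original pixels."""
    return [row[::2] for row in img[::2]]

def mean(img):
    vals = [p for row in img for p in row]
    return sum(vals) / len(vals)

def variance(img):
    m = mean(img)
    vals = [p for row in img for p in row]
    return sum((p - m) ** 2 for p in vals) / len(vals)

def normalize_brightness(img, k=1.0):
    """Subtract the image mean, divide by the image variance, scale by k."""
    m, v = mean(img), variance(img)
    return [[(p - m) / v * k for p in row] for row in img]

img = [[10, 12, 20, 12],
       [200, 12, 10, 12],
       [10, 12, 10, 12],
       [10, 12, 10, 12]]

small = downsample(img)            # 2x2 image: [[10, 20], [10, 10]]
norm = normalize_brightness(small)
print(len(small), len(small[0]))   # 2 2
print(abs(mean(norm)) < 1e-9)      # normalized image has zero mean -> True
```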
(3) Motion-information extraction: for the subsequent posture-information extraction, the global motion information must be extracted from the images processed in step (2). First, each pixel of the current frame is subtracted from the corresponding pixel of the previous frame and the absolute value of the result is taken, yielding a frame-difference image that describes the inter-frame change; next, the frame-difference image is thresholded, each pixel being judged as greater than or equal to, or less than, a fixed threshold, yielding a binary image describing the moving region (0 denoting less than, 1 denoting greater than or equal to); finally, edge extraction is performed on this binary image to obtain the edges of the moving region. In addition, for a given fixed region, the amplitude, direction, and speed parameters of the motion in that region can be derived from the proportion of 1s in the region, the position of its center of gravity, and its historical information;
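The frame differencing, thresholding, and region statistics of step (3) can be sketched as follows (the threshold T and the toy frames are arbitrary illustrations; edge extraction and the historical tracking are omitted):

```python
# Illustrative sketch of step (3): frame differencing, thresholding into a
# binary motion mask, and simple region statistics (threshold T is arbitrary).

def motion_mask(prev, cur, T=10):
    """|cur - prev|, thresholded: 1 where the change >= T, else 0."""
    return [[1 if abs(c - p) >= T else 0 for p, c in zip(pr, cr)]
            for pr, cr in zip(prev, cur)]

def region_stats(mask):
    """Proportion of 1s and center of gravity of the moving region."""
    ones = [(y, x) for y, row in enumerate(mask)
            for x, v in enumerate(row) if v]
    n = len(ones)
    total = sum(len(row) for row in mask)
    if n == 0:
        return 0.0, None
    cy = sum(y for y, _ in ones) / n
    cx = sum(x for _, x in ones) / n
    return n / total, (cy, cx)

prev = [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]
cur = [[0, 50, 0],
       [0, 50, 0],
       [0, 0, 0]]

mask = motion_mask(prev, cur)
print(mask)                # [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
print(region_stats(mask))  # proportion ~0.22, center of gravity (0.5, 1.0)
```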
(4) Posture-information extraction: based on the motion-information extraction result of step (3), the moving foreground is further segmented and features are extracted separately for the different regions. The shape of the edge contour in a specific region of the aforementioned binary image, and the process by which that shape changes, are analyzed; features with rotation and scaling invariance are extracted to derive the corresponding posture information, which is tracked, verified, and predicted using the result of the previous moment;
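The source does not say which invariant features are used; one standard choice for a rotation- and scale-invariant shape feature is an image-moment invariant. A minimal sketch, assuming the first Hu moment as the feature (the m00 exponent normalizes for scale, though on a coarse discrete grid scale invariance is only approximate, so only rotation invariance is demonstrated):

```python
# Illustrative sketch of a rotation-invariant shape feature for step (4):
# the first Hu moment (phi_1) of a binary region, computed from scratch.

def raw_moment(mask, p, q):
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(mask)
               for x, v in enumerate(row))

def hu1(mask):
    """phi_1 = eta_20 + eta_02: invariant under rotation of the region."""
    m00 = raw_moment(mask, 0, 0)
    xc = raw_moment(mask, 1, 0) / m00   # centroid x
    yc = raw_moment(mask, 0, 1) / m00   # centroid y
    mu20 = sum(((x - xc) ** 2) * v for y, row in enumerate(mask)
               for x, v in enumerate(row))
    mu02 = sum(((y - yc) ** 2) * v for y, row in enumerate(mask)
               for x, v in enumerate(row))
    return (mu20 + mu02) / (m00 ** 2)   # normalization by m00^2 for p+q=2

# An L-shaped region and its 90-degree rotation give the same feature value.
shape = [[1, 0],
         [1, 1]]
rotated = [list(row) for row in zip(*shape[::-1])]  # rotate 90 degrees
print(abs(hu1(shape) - hu1(rotated)) < 1e-12)       # True
```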
(5) Video control-parameter extraction and conversion: the global motion information and the local human-body posture information extracted in steps (3) and (4) are converted into the corresponding control information;
(6) Audio data acquisition: real-time audio data is collected through a microphone and sound card;
(7) Preprocessing such as denoising: the audio collected in real time is denoised by smoothing;
(8) Pitch-information extraction: pitch analysis is performed on the denoised audio to extract the frequency value and amplitude value of the sound;
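The source does not specify the pitch-analysis algorithm; a common approach is to take the strongest bin of a Fourier transform of a short audio frame. A minimal stdlib-only sketch under that assumption (the sample rate and test tone are made up for the demonstration):

```python
# Illustrative sketch of pitch-information extraction: estimate the dominant
# frequency and its amplitude from a short audio frame with a naive DFT.
import cmath
import math

def dominant_tone(samples, rate):
    """Return (frequency_hz, amplitude) of the strongest positive DFT bin."""
    n = len(samples)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):          # skip DC; positive frequencies only
        s = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        mag = abs(s)
        if mag > best_mag:
            best_k, best_mag = k, mag
    # Convert bin index to Hz; factor 2/n recovers the sinusoid amplitude.
    return best_k * rate / n, 2 * best_mag / n

rate = 800                              # assumed sample rate in Hz
samples = [0.5 * math.sin(2 * math.pi * 100 * t / rate) for t in range(80)]
freq, amp = dominant_tone(samples, rate)
print(freq, round(amp, 3))              # 100.0 0.5
```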
(9) Limited-vocabulary speech recognition: a speaker-independent continuous speech recognition method is used to recognize a number of discrete limited-vocabulary commands with modest real-time requirements, such as "start" and "stop";
(10) Audio control-parameter extraction and conversion: the extracted pitch information and limited-vocabulary recognition results are converted into the corresponding control information;
(11) Command realization: the final recognition results are mapped through a predefined command set to obtain the advertisement control information;
(12) Multichannel fusion: the control information of the video and audio channels is combined to form efficient, comprehensive advertisement control commands.
Step (12) above, namely the process of mapping the analysis and recognition results of the video and audio into the corresponding advertisement controls, is described in detail below. As shown in Fig. 2, the basic steps are as follows:
(1) First, the advertisement content control commands are classified: the required command set is classified effectively according to the characteristics of video (fast, intuitive, continuous output, but susceptible to interference) and of sound (natural and fast, but not highly instantaneous in recognition).
(2) Video-based control: first, the correspondence between the various kinds of motion information and human postures and the advertisement control quantities must be set. Then the scenery and audience around the venue are captured by the camera, the motion in the images and the postures of the human bodies are analyzed and recognized in real time, and, according to the current state, a predictive tracking algorithm is applied to output the corresponding control quantities.
(3) Audio-based control: first, a keyword dictionary and a mapping table from keywords to the related commands must be established. Then the sound signals of the audience and the surrounding scenery of the venue are collected by the microphone, and the corresponding control commands are produced according to the pitch analysis and speech recognition results.
(4) Through the advertisement control interface, the video and audio commands are integrated in real time into the control of the virtual elements and content of the advertisement, or the model is adjusted directly, thereby achieving control.
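The keyword dictionary, the mapping tables, and the fusion of the two channels described above can be sketched as follows. The vocabulary, command names, and the priority rule are hypothetical examples, not specified by the source:

```python
# Illustrative sketch of the Fig. 2 mapping: a keyword dictionary for the
# audio channel, a motion table for the video channel, and a simple fusion
# rule. All vocabulary and command names are hypothetical.

KEYWORDS = {"start": "START_AD", "stop": "STOP_AD"}       # audio channel
MOTIONS = {"wave": "NEXT_SCENE", "still": "HOLD_SCENE"}   # video channel

def audio_command(word):
    """Map a recognized keyword to its command, or None if unrecognized."""
    return KEYWORDS.get(word)

def video_command(motion):
    """Map a recognized motion/posture label to its command, or None."""
    return MOTIONS.get(motion)

def fuse(audio_cmd, video_cmd):
    """Video is fast and continuous but interference-prone; audio is natural
    but not instantaneous. As one example rule, an explicit voice command
    takes priority, and the video channel fills in otherwise."""
    return audio_cmd if audio_cmd is not None else video_cmd

print(fuse(audio_command("stop"), video_command("wave")))   # STOP_AD
print(fuse(audio_command("hello"), video_command("wave")))  # NEXT_SCENE
```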
In summary, the method of the invention adopts interactive control by video and audio: the motion and sound of the audience and the surrounding scenery of the venue are analyzed and recognized in the computer, and the results are converted into control commands for the multimedia file, realizing direct control of the virtual elements in the multimedia.