CN100596186C - An interactive digital multimedia making method based on video and audio - Google Patents


Info

Publication number
CN100596186C
CN100596186C · CN200610081465A
Authority
CN
China
Prior art keywords
video
information
audio
image
audio frequency
Prior art date
Legal status
Expired - Fee Related
Application number
CN200610081465A
Other languages
Chinese (zh)
Other versions
CN101079996A (en)
Inventor
侯启槟
王阳生
曾祥永
鲁鹏
Current Assignee
Beijing Interjoy Technology Limited
Original Assignee
BEIJING INTERJOY TECHNOLOGY Ltd
Priority date
Filing date
Publication date
Application filed by BEIJING INTERJOY TECHNOLOGY Ltd
Priority to CN200610081465A
Publication of CN101079996A
Application granted
Publication of CN100596186C

Abstract

This invention discloses an interactive digital multimedia production method based on audio and video, comprising: (1) capturing live video images in real time and preprocessing them to obtain preliminary video information; (2) converting the preliminary video information into video control information; (3) capturing live audio data in real time and preprocessing it to obtain preliminary audio information; (4) converting the preliminary audio information into audio control information; and (5) fusing the video and audio control information, changing the content of the target multimedia file, and outputting the result. Steps 1 and 2 form a first step group and steps 3 and 4 form a second; the two groups are independent of each other, and both proceed to step 5 on completion.

Description

An interactive digital multimedia production method based on video and audio
Technical field
The present invention relates to a computer human-machine interaction method, and in particular to an interactive digital multimedia production method based on video and audio.
Background technology
In recent years, with advances in information technology, the wide adoption of multimedia technology, and the rapid development of the communication-media industry, the concepts and forms of media publishing (such as advertising) have multiplied and diversified. However, once a traditional media release is fixed in concept and form, it suffers from uniformity, one-way delivery, and repetition. Although progress in computer vision and speech recognition has made natural human-machine interaction through vision and voice feasible, it remains a challenge to let the audience interact with media releases without physical contact, to let a release incorporate the motion and sound of the audience and the surrounding scene, and to let those interactions drive different changes in the released content, thereby improving the interactivity and appeal of the release. This is the challenge faced when producing such multimedia files.
Summary of the invention
The technical problem to be solved by the present invention is to provide an interactive digital multimedia production method based on video and audio, so that multimedia files are produced through human-machine interaction.
To solve the above technical problem, the present invention comprises the following steps. Start. Step 1: capture live video images in real time through a digital optical device and preprocess them to obtain preliminary video information. Step 2: convert the preliminary video information obtained in step 1 into video control information. Step 3: capture live audio data in real time through a digital audio device and preprocess it to obtain preliminary audio information. Step 4: convert the preliminary audio information obtained in step 3 into audio control information. Steps 1 and 2 form a first step group executed in order, and steps 3 and 4 form a second step group executed in order; the two groups are independent of each other and may or may not run concurrently, but in either case both proceed to step 5 on completion. Step 5: fuse and process the video control information and audio control information, output a control command for the body, drive the body through a control interface according to the command, and change and output the body content, where the body refers to the multimedia file. End.
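The two independent step groups converging on a fusion step can be sketched as two concurrent pipelines feeding one queue. This is only an illustration of the control flow; the payload fields and values below are hypothetical and not taken from the patent.

```python
import queue
import threading

def video_pipeline(out_q):
    """Steps 1-2: capture frames, preprocess, convert to video control info."""
    preliminary_video = {"motion": "wave", "amplitude": 0.7}   # hypothetical data
    out_q.put(("video", preliminary_video))

def audio_pipeline(out_q):
    """Steps 3-4: capture audio, preprocess, convert to audio control info."""
    preliminary_audio = {"keyword": "start", "pitch_hz": 220}  # hypothetical data
    out_q.put(("audio", preliminary_audio))

def fuse(out_q):
    """Step 5: collect both control streams, whichever order they arrive in."""
    commands = {}
    for _ in range(2):
        channel, info = out_q.get()
        commands[channel] = info
    return commands

q = queue.Queue()
threads = [threading.Thread(target=f, args=(q,))
           for f in (video_pipeline, audio_pipeline)]
for t in threads:
    t.start()
for t in threads:
    t.join()
result = fuse(q)
print(sorted(result))  # ['audio', 'video']
```

The groups may equally run sequentially; the fusion step only requires that both channels have delivered their control information.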
Because the present invention adopts interactive control by video and audio and converts the results into control commands for the multimedia file, it achieves direct control of the virtual elements in the multimedia file.
Description of drawings
Fig. 1 is a flow chart of the method applied to advertisement production;
Fig. 2 is a flow chart of the fusion and output of the control information in Fig. 1, i.e., the mapping of the video and audio analysis and recognition results into the corresponding advertisement controls.
Embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments.
In principle, the method can be divided into a video-based interactive digital multimedia production method and an audio-based one.
The video-based interactive digital multimedia production method comprises the following steps:
1. Capture video images in real time with a camera device and apply preprocessing such as light correction and denoising;
2. Segment the video image using its variation and features in time and space; extract and analyze features from the segmented image to obtain the global motion information and local human-posture information in the image (position, direction, amplitude, and the basic shape parameters they form); regularize this information and convert it into advertisement control commands;
3. The control interface drives the advertisement according to the control commands.
The audio-based interactive digital multimedia production method comprises the following steps:
1. Collect audio data in real time from a microphone and sound card and apply preprocessing such as denoising;
2. Process the collected audio with pitch analysis and speech recognition to obtain the frequency value and amplitude value of the sound and the recognized semantic-vocabulary result, and convert them into advertisement control commands;
3. The control interface drives the advertisement according to the control commands.
It must be emphasized that the two methods can be used independently or in combination.
The present invention is further illustrated below by an embodiment that applies the method to advertisement production. Fig. 1 is the flow chart of this embodiment, in which steps (1)-(5) and steps (6)-(10) can be used separately or run in parallel.
As shown in Fig. 1, the concrete steps of this embodiment are as follows:
(1) Obtain video images: capture real-time images from a camera connected to the computer through a high-speed image-capture module. Since every frame must be processed, images are extracted frame by frame from the video stream. Depending on the application, the camera may be aimed at the people and scenery in the venue, or may shoot them from above;
(2) Denoising and other preprocessing: to improve the precision and speed of the subsequent motion-information and posture-information extraction, the frames obtained in step (1) must be preprocessed. First, to reduce the computational load and improve speed, the resolution of the captured color image is reduced to 1/4 of the original and the image is converted to a 256-level grayscale image. Second, each frame is smoothed by averaging corresponding pixels in space (within the frame) and in time (between frames), removing the random noise introduced during capture. In addition, brightness is compensated to eliminate the influence of illumination changes: the mean of the whole image's pixel values is subtracted from the value of each pixel, the result is divided by the variance of the whole image's pixel values, and it is then multiplied by a coefficient. This processing eliminates the influence of lighting changes to a certain extent;
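The preprocessing chain of step (2) can be sketched in NumPy roughly as follows. The scaling coefficient and the 3x3 spatial smoothing window are assumptions, since the patent fixes neither, and the inter-frame averaging is omitted for brevity.

```python
import numpy as np

def preprocess(frame_rgb, coeff=64.0):
    """Sketch of step (2): downsample, grayscale, smooth, normalize.
    `coeff` is an assumed scaling constant not specified in the source."""
    # Reduce resolution to 1/4 of the original (half in each dimension).
    small = frame_rgb[::2, ::2]
    # Convert to a 256-level grayscale image (simple channel average).
    gray = small.mean(axis=2)
    # Smooth with a 3x3 spatial mean to suppress random capture noise.
    pad = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    smooth = sum(pad[dy:dy + h, dx:dx + w]
                 for dy in range(3) for dx in range(3)) / 9.0
    # Illumination compensation: subtract the global mean, divide by the
    # variance, then rescale, exactly as the text specifies.
    return (smooth - smooth.mean()) / (smooth.var() + 1e-8) * coeff

frame = np.random.randint(0, 256, (120, 160, 3)).astype(float)
out = preprocess(frame)
print(out.shape)  # (60, 80)
```

Note the text prescribes division by the variance rather than the standard deviation; the sketch follows the text as written.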
(3) Motion-information extraction: to support the subsequent posture-information extraction, global motion information must be extracted from the images processed in step (2). First, each pixel of the current frame is subtracted from the corresponding pixel of the previous frame and the absolute value of the difference is taken, yielding a frame-difference image describing inter-frame change. The frame-difference image is then thresholded: each pixel is judged to be greater than or equal to, or less than, a fixed threshold, producing a binary image describing the moving region (0 meaning below the threshold, 1 meaning at or above it). Finally, edge extraction is applied to this binary image to obtain the edges of the moving region. In addition, for a given fixed area, the amplitude, direction, and speed parameters of the region's motion can be derived from the proportion of 1-pixels in the region, the position of its center of gravity, and historical information;
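The frame-difference pipeline of step (3) can be sketched as follows. The threshold value and the particular edge rule (a moving pixel with at least one stationary 4-neighbour) are assumptions; the patent leaves both unspecified.

```python
import numpy as np

def motion_edges(prev_gray, curr_gray, threshold=15.0):
    """Sketch of step (3): frame difference, binary motion mask, region
    edges, and simple per-region motion statistics."""
    # Absolute inter-frame difference.
    diff = np.abs(curr_gray.astype(float) - prev_gray.astype(float))
    # Threshold into a binary mask: 1 = at/above threshold, 0 = below.
    binary = (diff >= threshold).astype(np.uint8)
    # Crude edge extraction: a moving pixel is an edge pixel if at least
    # one of its 4-neighbours is stationary.
    pad = np.pad(binary, 1)
    neigh_min = np.minimum.reduce([pad[:-2, 1:-1], pad[2:, 1:-1],
                                   pad[1:-1, :-2], pad[1:-1, 2:]])
    edges = binary & (neigh_min == 0)
    # Per-region summary: occupied ratio and centre of gravity, the raw
    # material for the amplitude/direction/speed parameters.
    ratio = binary.mean()
    ys, xs = np.nonzero(binary)
    centroid = (ys.mean(), xs.mean()) if len(ys) else None
    return binary, edges, ratio, centroid
```

Comparing centroids across successive frames (the "historical information" of the text) would then yield direction and speed.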
(4) Posture-information extraction: based on the motion information extracted in step (3), the moving foreground is further segmented and features are extracted from each region separately. The shape of the edge contour in a specific region of the binary image, and its process of change, are analyzed; features with rotation and scale invariance are extracted; the corresponding posture information is derived; and tracking, verification, and prediction are performed using the result from the previous moment;
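The patent does not name which rotation- and scale-invariant features are used in step (4). Normalized central moments are one common choice and serve here purely as an illustrative stand-in.

```python
import numpy as np

def shape_features(binary_region):
    """Illustrative shape descriptors for a segmented foreground region:
    scale-normalized central moments (not necessarily the features the
    patent intends)."""
    ys, xs = np.nonzero(binary_region)
    cy, cx = ys.mean(), xs.mean()   # centre of gravity
    area = float(len(ys))           # zeroth moment mu00
    feats = {}
    for p, q in [(2, 0), (0, 2), (1, 1)]:
        mu = ((ys - cy) ** p * (xs - cx) ** q).sum()
        # Scale normalization: eta_pq = mu_pq / mu00 ** (1 + (p + q) / 2)
        feats[(p, q)] = mu / area ** (1 + (p + q) / 2)
    # eta20 + eta02 is additionally rotation-invariant (first Hu moment).
    feats["hu1"] = feats[(2, 0)] + feats[(0, 2)]
    return feats
```

Tracking such descriptors across frames gives the "process of change" the text refers to.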
(5) Video control-parameter extraction and transformation: the global motion information and local human-posture information extracted in steps (3) and (4) are converted into the corresponding control information;
(6) Obtain audio data: collect real-time audio data through a microphone and sound card;
(7) Denoising and other preprocessing: the audio collected in real time is denoised by smoothing;
(8) Pitch-information extraction: pitch analysis is performed on the denoised audio to extract the frequency value and amplitude value of the sound;
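The patent does not specify the pitch-analysis algorithm for step (8); a windowed FFT peak is one simple stand-in. The sample rate and buffer size below are assumptions.

```python
import numpy as np

def tone_features(samples, sample_rate=8000):
    """Return the dominant frequency (Hz) and its spectral amplitude for
    one audio buffer; a minimal stand-in for the pitch-analysis step."""
    # Window to reduce spectral leakage, then take the magnitude spectrum.
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    peak = spectrum[1:].argmax() + 1   # skip the DC bin
    return freqs[peak], spectrum[peak]

# A synthetic 500 Hz tone stands in for microphone input.
t = np.arange(4096) / 8000.0
freq, amp = tone_features(np.sin(2 * np.pi * 500 * t))
print(freq)  # 500.0
```

A production system would track these values per frame and feed them to the control-parameter transformation of step (10).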
(9) Limited-vocabulary speech recognition: a speaker-independent, continuous speech recognition method is used to recognize a small set of discrete, latency-tolerant limited-vocabulary commands, such as "start" and "stop";
(10) Audio control-parameter extraction and transformation: the extracted pitch information and limited-vocabulary recognition results are converted into the corresponding control information;
(11) Command realization and multichannel fusion: the recognition results are mapped through a predefined command set into advertisement control information, and the control information of video and audio is combined to form efficient, comprehensive advertisement control commands.
Step (11) above, in which the analysis and recognition results of video and audio are mapped into the corresponding advertisement controls, is now described in detail. As shown in Fig. 2, the basic steps are as follows:
(1) First, classify the advertisement-content control commands: the required command set is classified effectively according to the characteristics of each channel; video is fast, intuitive, and continuously output but susceptible to interference, while sound is natural and quick but its recognition is not highly real-time.
(2) Video-based control: first, the correspondence between the various motion and human-posture cues and the advertisement control quantities must be set. The camera then captures the surrounding scenery of the venue and the audience; the motion and human postures in the image are analyzed and recognized in real time; and, according to the current state and a predictive tracking algorithm, the corresponding control quantities are output;
(3) Audio-based control: first, a keyword dictionary and a mapping table from keywords to related commands must be established. The microphone then collects the sound of the audience and the surrounding scene, and the corresponding control commands are produced according to the pitch-analysis and speech-recognition results;
(4) Through the advertisement control interface, the video and audio commands are integrated in real time into the control of the advertisement's virtual elements and content, or the model is adjusted directly, achieving the purpose of control.
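The per-channel mapping tables and the fusion step can be sketched as plain lookup tables. All cue and command names below are hypothetical; the patent only requires that each channel have its own mapping into a shared advertisement command set.

```python
# Hypothetical command set: motion/posture cues from video and keywords
# from audio each map into one advertisement control vocabulary.
VIDEO_COMMANDS = {"wave": "next_scene", "jump": "zoom_in"}
AUDIO_COMMANDS = {"start": "play", "stop": "pause"}

def fuse_controls(video_cue=None, audio_cue=None):
    """Merge per-channel recognition results into advertisement commands.
    Each channel keeps its own table because video is continuous but
    noise-prone while audio is natural but laggier."""
    commands = []
    if video_cue in VIDEO_COMMANDS:
        commands.append(VIDEO_COMMANDS[video_cue])
    if audio_cue in AUDIO_COMMANDS:
        commands.append(AUDIO_COMMANDS[audio_cue])
    return commands

print(fuse_controls(video_cue="wave", audio_cue="start"))  # ['next_scene', 'play']
```

Either channel may be absent, matching the text's point that the two methods work independently or in combination.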
In summary, the method adopts interactive control by video and audio: the motion and sound of the audience and the surrounding scene are analyzed and recognized by the computer, the results are converted into control commands for the multimedia file, and direct control of the virtual elements in the multimedia is thereby achieved.

Claims (4)

1. An interactive digital multimedia production method based on video and audio, characterized in that it comprises the following steps. Start. Step 1: capture live video images in real time through a digital optical device and preprocess them to obtain preliminary video information. Step 2: segment the preliminary video information obtained in step 1 according to its variation and features in time and space, extract features from the segmented image, extract global motion information and local human-posture information, and convert them into video control information. Step 3: capture live audio data in real time through a digital audio device and preprocess it to obtain preliminary audio information. Step 4: extract the frequency value and amplitude value of the sound from the preliminary audio information obtained in step 3, perform limited-vocabulary speech recognition, and convert the results into audio control information. Steps 1 and 2 form a first step group executed in order, and steps 3 and 4 form a second step group executed in order; said two step groups are independent of each other and may or may not run concurrently, but in either case both proceed to step 5 on completion. Step 5: fuse and process said video control information and audio control information, output a control command for the body, drive the body through a control interface according to said command, and change and output the body content, where said body refers to the multimedia file. End.
2. The interactive digital multimedia production method based on video and audio according to claim 1, characterized in that said multimedia file is a multimedia file used for image display or advertising; said digital optical device is a digital camera; and said digital audio device is a microphone and sound card.
3. The interactive digital multimedia production method based on video and audio according to claim 2, characterized in that the preprocessing in step 1 comprises light correction and denoising of said live video images; the local human-posture information in step 2 comprises the position, direction, and amplitude of the human body and the basic shape parameters it forms; the preprocessing in step 3 comprises processing the live audio data with pitch analysis and speech recognition; and the fusion and processing of said video control information and audio control information in step 5 involve a command-set preprocessing module, a video control-transformation module, and an audio control-transformation module, wherein the command-set preprocessing module classifies the video/audio command set and, according to the received video control information and audio control information, maps the corresponding commands to the video control-transformation module and the audio control-transformation module respectively; the video control-transformation module accepts said video control information and the commands mapped by the command-set preprocessing module and outputs the video control command for the body to the control interface; and the audio control-transformation module accepts said audio control information and the commands mapped by the command-set preprocessing module and outputs the audio control command for the body to the control interface.
4. The interactive digital multimedia production method based on video and audio according to claim 3, characterized in that the denoising of said live video images comprises: first reducing the resolution of the live video image to 1/4 of the original and converting it to a 256-level grayscale image; then averaging corresponding pixels within the frame and between frames to smooth each frame and remove the random noise introduced during capture. The light correction of said live video images means: subtracting the mean of the whole image's pixel values from the value of each pixel, dividing by the variance of the whole image's pixel values, and then multiplying by a coefficient. Said extraction of global motion information comprises: first subtracting each pixel of the current frame from the corresponding pixel of the previous frame and taking the absolute value of the difference, to obtain a frame-difference image describing inter-frame change; then thresholding the frame-difference image, judging whether each pixel is greater than or equal to, or less than, a fixed threshold, to obtain a binary image describing the moving region, with 0 representing below the threshold and 1 representing at or above it; and finally applying edge extraction to this binary image to obtain the edges of the moving region. Said extraction of local human-posture information means: further segmenting the moving foreground according to the result of the global motion-information extraction, performing feature analysis on each region separately, analyzing the shape of the edge contour in a specific region of the binary image and its process of change, extracting rotation- and scale-invariant features, deriving the corresponding posture information, and performing tracking, verification, and prediction based on the result from the previous moment.
CN200610081465A 2006-05-22 2006-05-22 An interactive digital multimedia making method based on video and audio Expired - Fee Related CN100596186C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200610081465A CN100596186C (en) 2006-05-22 2006-05-22 An interactive digital multimedia making method based on video and audio

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200610081465A CN100596186C (en) 2006-05-22 2006-05-22 An interactive digital multimedia making method based on video and audio

Publications (2)

Publication Number Publication Date
CN101079996A CN101079996A (en) 2007-11-28
CN100596186C 2010-03-24

Family

ID=38907185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200610081465A Expired - Fee Related CN100596186C (en) 2006-05-22 2006-05-22 An interactive digital multimedia making method based on video and audio

Country Status (1)

Country Link
CN (1) CN100596186C (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120130822A1 (en) * 2010-11-19 2012-05-24 Microsoft Corporation Computing cost per interaction for interactive advertising sessions
CN103186226A (en) * 2011-12-28 2013-07-03 北京德信互动网络技术有限公司 Man-machine interaction system and method
CN103186227A (en) * 2011-12-28 2013-07-03 北京德信互动网络技术有限公司 Man-machine interaction system and method
CN103905926A (en) * 2014-04-14 2014-07-02 夷希数码科技(上海)有限公司 Method and device for playing outdoor advertisement
CN105407316B (en) * 2014-08-19 2019-05-31 北京奇虎科技有限公司 Implementation method, intelligent camera system and the IP Camera of intelligent camera system
CN104571516B (en) * 2014-12-31 2018-01-05 武汉百景互动科技有限责任公司 Interactive advertisement system
CN107197327B (en) * 2017-06-26 2020-11-13 广州天翌云信息科技有限公司 Digital media manufacturing method
CN109308625A (en) * 2017-07-27 2019-02-05 掌游天下(北京)信息技术股份有限公司 A kind of production method for playing advertisement, system and corresponding storage medium
CN110349576A (en) * 2019-05-16 2019-10-18 国网上海市电力公司 Power system operation instruction executing method, apparatus and system based on speech recognition
CN112348926A (en) * 2020-11-23 2021-02-09 杭州美册科技有限公司 Android-based video splicing app processing method and device

Also Published As

Publication number Publication date
CN101079996A (en) 2007-11-28


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: BEIJING SHENGKAI INTERACTIVE TECHNOLOGY CO., LTD.

Free format text: FORMER NAME: BEIJING SHENGKAI INTERACTIVE ENTERTAINMENT TECHNOLOGY CO., LTD.

CP01 Change in the name or title of a patent holder

Address after: 100080, Beijing, Zhichun Road, Haidian District, No. 63 satellite building, 9 floor

Patentee after: Beijing Interjoy Technology Limited

Address before: 100080, Beijing, Zhichun Road, Haidian District, No. 63 satellite building, 9 floor

Patentee before: Beijing Interjoy Technology Limited

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100324

Termination date: 20180522