One of the underlying assumptions of these interfaces is that the more a virtual environment perceptually resembles the environment we are familiar with, the easier it will be for us to orient, navigate, and act in it. But it is seriously doubtful whether technology will ever be able to create a synthetic sensory experience completely indistinguishable from the one we have in our everyday world. Fortunately, this is not a drawback of VR technology but its most interesting aspect, because it forces us to develop efficient interaction metaphors which draw on our cognitive skills but which do not necessarily attempt to mimic interaction as it happens in our everyday world. This is how new interaction techniques evolve and become candidates for developing into a new framework or language of expression. This phenomenon can be observed whenever new media are explored in a culture. Just as cinema and television had to develop their own expressive means adapted to the idiosyncrasies of their media, VR is currently developing its own language of expression, which is still very rudimentary.
Cultural techniques of expression tend to mix and merge, reference each other, and are transformed and rethought in the context of new media. They form the rich tissue of a culture's means of expression, most consequently explored and developed in its art. But we do not need to get into contemporary art theory to illustrate what is meant here. As an analogy, think of today's advertising design, which more and more often refers to the desktop metaphor of current graphical user interfaces. The concept of a window, a menu bar, or pull-down menus can suddenly be used to present different aspects of a product. Such an ad would never have been understood before a significant fraction of the members of a society became acquainted with modern computer interfaces.
There are two aspects we wanted to clarify in this little excursion. Firstly, virtual environments cannot simply be modeled after our everyday world; they have to develop their own interaction metaphors and means of expression. Secondly, the kind and quality of integration of different sensory channels into one simulation and display system determines to a large extent the basic vocabulary available for the development of an expressive framework which will eventually become part of a culture's communication skills.
Three wall-sized rear-projection systems are installed orthogonal to the floor projection, each with a size of 3x3 meters. A dual-pipe Onyx IR generates 8 user-controlled images. The user position is tracked with Polhemus Fastrak sensors. CrystalEyes shutter glasses are used for stereo image reception. The display resolution is 1024 x 768 pixels at 120 Hz for each of the 4 displays. While the first CyberStage installation in September 1996 was fixed in a building, the new version has been developed for "mobile" usage. Both installations use a wooden skeleton to minimize noise in the electromagnetic tracking. An 8-channel surround-sound system is fed by IRCAM's room acoustics software Spatialisateur. The AVANGO software allows live video sources as well as prefabricated animations to be imported into virtual worlds. Virtual actors can appear on stage in both ways, either in an on-line performance or in a pre-produced manner. Interaction within virtual environments is based on electromagnetic tracking using devices such as 3D pointers or 3D joysticks.
Virtual objects and control tools are located on a real workbench. The objects, displayed as computer-generated stereoscopic images, are projected onto the surface of a table. The computer screen is thus changed into a horizontal, enlarged worktop and replaces the two-dimensional flat screen. This view corresponds to the actual work situation in an architect's office, in surgical environments, on the workbench, for three-dimensional atlases, etc. The work action is virtual. A guide uses the virtual working environment while several observers can watch events through shutter glasses. The guide operates within a non-immersive virtual reality environment. Depending on the application, various input and output modules can be integrated, such as gesture and speech recognition systems, which characterize the general trend away from the classical human-machine interface. Several guides can work together locally or use global communication networks such as broadband ISDN.
Responsive Environments, consisting of tracking systems, cameras, projectors, and microphones, replace the traditional computer and are increasingly adapted to human needs. The control tools implement complex actions that can easily be achieved by intuitive movements of the user's hand. Each control instrument is represented as a small virtual object that can be activated by grabbing it with the hand and moving it onto the object which is to be manipulated. Rotations of objects can then be done just by turning the hand. The zoom operation is accomplished by simple up and down movements of a small virtual magnifier which has been grabbed by the hand.
Several approaches to applying virtual reality techniques to scientific visualization have been published. Haase [Haase 97] has classified the approaches reported so far: either a virtual reality package is used and visualization techniques are adapted to it, or a visualization package is used and geometry is exported to a virtual reality presentation. Most reported work, like Bryson's Virtual Windtunnel [Bryson 92], uses head-coupled display technology, but more and more experiments with projection systems are being reported.
The Responsive Workbench [Krüger 94], for example, developed in 1993 by Wolfgang Krüger, has also been applied to scientific visualization. After 2 years of development together with the engineering research department of Daimler-Benz, it became obvious that the Responsive Workbench virtual environment together with a tailor-made interactive visualization package is an ideal workspace for engineering applications, filling the gap between immersive and desktop environments.
In our system, a parallel computer (IBM SP2) and a graphics computer (SGI Onyx RE2) were linked via a HIPPI connection. The parallel computer has the task of preprocessing the results of the (already concluded) simulation computation and creating appropriate visualization primitives for rendering on the graphics system.
Fig. 1: Fluid dynamics.
As a visualization platform, the Responsive Workbench offers powerful control and manipulation mechanisms to the engineer, who is free to choose from a wide variety of visualization methods and presentation modes. Positions can easily be located directly in the space of the simulation area. This flexibility helps evaluate results quickly and therefore reduces the time for the whole simulation and evaluation phase. The user gains deeper insight into the structure of flow fields in the simulation space, because the information is perceived in 3D, together with the geometry.
Fig. 2: Visualization of the airflow simulation in an aircraft engine.
Fig. 3: Molecular modeling.
Fig. 4: TELEPORT Display Room
The TELEPORT environment is designed to overcome the disadvantages of desktop video-conferencing and to establish life-like conference sessions that bring people together as if face-to-face. TELEPORT has been developed at the Visualization and Media Systems Design group of GMD by S. Gibbs, C. Breiteneder, and C. Arapis [Breit 96]. TELEPORT mimics a shared physical context, using 3D modeling and rendering, and provides a life-sized display of remote group members placed within a virtual space. The system is based around special rooms, called display rooms, where one wall is a "view port" into a virtual extension, as shown in Figure 4. The geometry, surface characteristics, and lighting of the extension match those of the real room to which it is attached. When a teleconferencing connection is established, video imagery of the remote participant (or participants) is composited with the rendered view of the virtual extension (see Figure 5).
Fig. 5: Remote Participant in Virtual Meeting Area
The viewing position of the local participant is tracked, allowing imagery appearing on the wall display to be rendered from the participant's perspective. The combination of viewer tracking, a wall-sized display, and real-time rendering and compositing gives the illusion of the virtual extension being attached to the real room. The result is a natural and immersive teleconferencing environment where real and virtual environments are merged without the need for head-mounted displays or other encumbering devices. The current system uses a 3m x 2.25m rear-projected video wall attached to a 3m square room. The video wall is driven by a pair of high-luminosity video projectors. Both projectors can display mid-resolution video signals and high-resolution RGB signals. A camera is placed on a stand or a table and set at approximately eye height. The field of view is wide enough to take in a full upper-body shot of the local participant. A viewer tracking system determines the position of the local participant within the display room, from which their viewpoint is derived. Two techniques are used for segmentation (determining the regions of the video signal where a participant appears): chroma-keying and delta-keying. The virtual extension is rendered from the viewpoint of a tracked participant located in the display room. Because this person is free to move within the display room, the virtual extension must be continuously re-rendered. Currently an SGI RealityEngine2 is used to achieve rendering rates of up to 25 frames per second, with texturing and full anti-aliasing. The video imagery of remote participants is combined with the rendered virtual extension (compositing). For audio, each participant wears a small microphone. The audio signals from remote participants are mixed together and sent to speakers mounted on either side of the video wall.
In the C2 environment [Sym 97], which is similar to a CAVE or the CyberStage, familiar tools are being examined in a new technology, and the special features of this virtual reality environment are used to develop completely new tools for the visualization of high-dimensional data. The applications of this work extend to almost all areas of science. In particular, spatially dependent data are being examined, for example data collected over geographical domains for environmental assessment and agricultural applications.
Fig. 6: This image shows the cube containing the data and one of the paint tools.
Several tools allow a user to interact with the environment. The features include a toolbox to select color, glyph type, and size. Creating a custom brush is supported by the design box, which enables a user to create a paintbrush for marking different data points. The rotation interface allows the user to examine the entire data set from different angles. All interaction employs audio feedback in addition to visual response. People involved at this stage were Uli Lechner, Dianne Cook, Carolina Cruz-Neira, and Jürgen Symanzik at Iowa State University.
Fig. 7: Evolution of Simple Virtual Robots (Symbots) Using Genetic Algorithms
John Walker with Dan Ashlock, Allen Bierbaum, and James Oliver
Iowa State University, Ames Iowa USA
Funded by the National Science Foundation Young Investigator Award: Grant #DDM-9258114.
This project demonstrates how a genetic algorithm can be used to optimize problems of guidance and control for simple autonomous agents, which we call Symbots. The Symbots are controlled by simple neural networks with input parameters designed by a genetic algorithm. The Symbots learn to find food sources while avoiding collisions with each other.
The evolution is driven by a measure of relative fitness of a group of candidate designs. Fitness is the number of food sources hit without colliding with another Symbot. Once the fittest of the candidate designs are determined, the control system parameters (which can be thought of as a `gene') are cross-combined and/or mutated to create new candidate designs.
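As a rough illustration of this selection-crossover-mutation loop, the following Python sketch shows how the fittest parameter vectors could be recombined and perturbed to form new candidate designs. All names and parameter values here are hypothetical and not taken from the Symbot implementation.

```python
import random

def crossover(gene_a, gene_b):
    """Uniform crossover: each control parameter comes from either parent."""
    return [random.choice(pair) for pair in zip(gene_a, gene_b)]

def mutate(gene, rate=0.1, scale=0.05):
    """Perturb each parameter with probability `rate`."""
    return [g + random.gauss(0.0, scale) if random.random() < rate else g
            for g in gene]

def next_generation(population, fitness, keep=2):
    """Keep the fittest candidate designs (fitness = food hits without a
    collision) and refill the population with recombined, mutated offspring."""
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:keep]
    children = [mutate(crossover(*random.sample(parents, 2)))
                for _ in range(len(population) - keep)]
    return parents + children
```

In the sketch the fittest designs survive unchanged at the front of the new population, while the remainder is rebuilt from their offspring.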
Users interact in real time with the resulting evolved Symbots by placing food sources, controlling evolution, and guiding a movable food source. The user may also navigate within the Symbot world and capture the performance of individual Symbots for later analysis or rendering in an animation package.
This interactive experimentation facilitates a greater understanding of the Symbot's behaviors and gathering strategies than is possible with a traditional display system. The user gains immediate feedback about the Symbot's performance in the environment, allowing the design and testing of scenarios for training robots to perform tasks in hostile or hard-to-model environments.
Nodes refer to each other as parents and children. A node can have more than one parent; therefore the modeling hierarchy is a graph, not a tree. This is useful for multiple references to the same object without copying all the data related to that object. To support multi-user environments, the objects and their state must be shared between the different sites over a network connection. To reduce the amount of data that has to be transferred, only state changes should be transmitted. Handling a virtual world as a state machine is a big advantage for solving this distribution problem and can also be used to store a copy of the world on disk at any time.
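A minimal Python sketch of these two points (the names are illustrative, not AVANGO's API): a node referenced by two parents is stored only once, and a change to it produces one small delta record, which is all that would have to cross the network.

```python
class Node:
    """A modeling-hierarchy node; children may be shared by several parents,
    so the hierarchy forms a directed graph rather than a tree."""
    def __init__(self, name, **state):
        self.name = name
        self.state = dict(state)
        self.children = []

    def add_child(self, child):
        self.children.append(child)

def apply_change(change_log, node, key, value):
    """Update one piece of node state and record only the delta --
    the record, not the whole world, is what has to be transmitted."""
    node.state[key] = value
    change_log.append((node.name, key, value))

# A wheel geometry referenced by two cars exists only once in memory.
wheel = Node("wheel", radius=0.3)
car_a, car_b = Node("car_a"), Node("car_b")
car_a.add_child(wheel)
car_b.add_child(wheel)

log = []
apply_change(log, wheel, "radius", 0.35)
```

Both parents see the changed wheel, yet only a single three-element record describes the state change.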
If you think of a virtual world as a 4-dimensional system, the first 3 dimensions describe the Euclidean space, which is coded into the modeling hierarchy by nested transformations and some leaf nodes defining the representation of a part of the virtual world for all available displays. Especially for the visual part, a lot of modeling software and description formats are available. Even if the limit is seldom reached, it should be mentioned that the resolution of this Euclidean space is limited by the floating-point precision of the hardware used. For example, it is difficult to represent the relative sizes of a single atom and the whole galaxy in the same continuous space.
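This precision limit can be demonstrated with a short Python snippet that rounds values through IEEE-754 single precision, the format common on graphics hardware of this class: at galactic length scales, adding an atom-sized offset does not change the stored coordinate at all.

```python
import struct

def to_float32(x):
    """Round-trip a Python double through IEEE-754 single precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

galaxy = 1.0e21   # rough galactic diameter in meters
atom = 1.0e-10    # rough atomic diameter in meters

# The atom-sized offset lies far below the spacing of representable
# float32 values at this magnitude, so it is silently lost.
lost = to_float32(galaxy + atom) == to_float32(galaxy)
```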
The fourth dimension manipulates the state of the virtual world over time. This leads us to the necessity of some active parts which control the behavior of each object. For the above-mentioned bird, this can be its solo movement or its participation in a swarm. To keep things simple, every node in the modeling hierarchy can change its state, and the distribution of information about state changes is done through a dataflow network which is orthogonal to the modeling hierarchy. This enables us to encapsulate complex behaviors in single objects or even chains of objects and to propagate the results to other objects, defining the visual or auditory features, for example. Before addressing the details of the implementation, the limited temporal resolution and the absence of temporal continuity of the rendering processes should be mentioned here. A different update rate must be guaranteed for each of the human sensory channels addressed by the already mentioned displays. The visual display, for example, should have a video rate of more than 50 Hz for each eye to prevent the user from growing weary. The rendering frame rate, which is independent of the video rate, should reach a value of more than 20 Hz to give the illusion of continuity, or real time. If different real-time systems are involved in the same virtual world, each with its own definition of what the resolution of real time should be, the software design must support these asynchronous needs.
"This is supposed to be the mission statement for the project. The system has to integrate a variety of different interface devices currently in use at VMSD. Most notably these are the Responsive Workbench, the CAVE, the Communication Wall and the Virtual Studio. It has to be sufficiently general purpose to support application development for all these devices. As new devices are likely to be invented by the more creative minds at VMSD, it has to be easily extensible and adaptable. Most of those interfaces come with a mix of more or less exotic input devices. The system has to be highly interactive and responsive. VMSD is a demo pit. The system must support a rapid prototyping style of application development, without the need to adhere to a tedious write-compile-execute-kill cycle. It has to support the development of truly distributed applications. The System is targeted at high-end SGI workstations not less powerful than the Onyx RealityEngine and has to deliver every jota of performance these machines are capable to deliver."
Since we had been very hungry when looking for a name for the system just described, we decided to call it AVANGO. A rationale for this name was quickly found, since the seed, the pulp, and the peel of the avango fruit perfectly represent the design of our system. There is the core library, represented by the seed, which implements the basic functionality of our VR toolkit. The pulp stands for a rapidly developing set of lower- and higher-level tools derived from the core library and used to configure applications, which form the peel of the system. To achieve the goals defined in the mission statement, we support the following general concepts:
All kinds of configurations of input and output devices can be assembled into so-called browsers. The browser forms the interface between the user and the virtual world. Typical elements of a browser are the visual, auditory, and tactile displays as output devices, and spatial trackers and audio or video sources as input devices. In a multiuser environment, every user configures his or her own browser.
All relevant parts of the system's API are mapped to an interpreted scripting language (Scheme). This enables us to specify and change scene content, browser features, and object behavior in a running system, eliminating the rather disagreeable write-compile-execute-kill cycle of the application development process.
All objects know how to read and write their state from and to a stream.
Together with streaming support for objects this enables us to write the complete state of the system to a disk file at any time. An initial system state can be read from a disk file as well.
All objects can be distributed. Their state is shared by any number of participating browsers. Object creation, deletion and all changes at one site are immediately and transparently distributed to every participating browser.
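A schematic Python sketch of this sharing model (an illustration of the idea, not AVANGO's actual distribution protocol): every site keeps a replica of each shared object, and create/set/delete messages are forwarded transparently to all participants.

```python
class SharedObject:
    """A replicated object, identified by an id and described by fields."""
    def __init__(self, oid):
        self.oid = oid
        self.fields = {}

class Browser:
    """Each participating site holds its own replica of every shared
    object and applies incoming change messages to it."""
    def __init__(self):
        self.objects = {}

    def apply(self, message):
        kind, oid, *payload = message
        if kind == "create":
            self.objects[oid] = SharedObject(oid)
        elif kind == "set":
            field, value = payload
            self.objects[oid].fields[field] = value
        elif kind == "delete":
            del self.objects[oid]

def broadcast(browsers, message):
    """Forward one site's change immediately to every participant."""
    for browser in browsers:
        browser.apply(message)
```

After a broadcast, every browser's replica carries the same field values, so the application code never has to distinguish local from remote objects.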
The system is extensible by subclassing existing C++ system classes. This concerns object classes as well as classes which encapsulate browser features. Compiled extensions can be loaded into the system at runtime via DSOs.
Browsers provide input/output services which can be mapped to objects in the scene. Objects can respond to events generated from input devices or other objects and can deliver events to output devices.
Fig. 8. System Overview
Performer can be described from two points of view. First, there is the data processing, organized in a pipeline and computed in parallel. This so-called rendering pipeline consists of a set of optional units for:
* a database connection
* a user application
* the visual culling of the scene
* the intersection of objects
* the drawing of the scene
The second view focuses on the data structures used to describe the visual virtual world. Different types of nodes are available which can be connected by parent/child relationships to form a directed acyclic graph. Because Performer only supports visual displays, the nodes contain the information needed to describe the visual portions of the virtual world. If we look at AVANGO from these points of view, the data processing of Performer is extended by sound rendering and tactile feedback, all of which can be configured through the scripting interface to meet the specific needs of different hardware installations. The data structures available in Performer are extended to meet the stated general concepts. These extensions to the data structures are only available in the application process and have to be Performer-compatible in order to be used afterwards in the processes involved in the pipeline. This compatibility is achieved by deriving all AVANGO objects which have to be rendered visually from Performer node classes.
Performer 2.1 has a method-based object API with getters and setters for all the different fragments of object state information. This is translated into our field API by subclassing all Performer object classes once, using special AdaptorFields to encapsulate the method declarations. Further extensions are subclassed only from these adapted Performer classes. This approach ensures full Performer object functionality as a basis for extension development.
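The adaptor idea can be sketched in a few lines of Python (the class and method names below are invented for illustration; the real AdaptorFields are C++ classes): an adaptor field stores the getter/setter pair of the method-based API and presents it through the uniform field interface.

```python
class SingleAdaptorField:
    """Wraps a getter/setter pair of a method-based API so the state
    fragment can be used through the uniform field interface."""
    def __init__(self, getter, setter):
        self._get, self._set = getter, setter

    def get_value(self):
        return self._get()

    def set_value(self, value):
        self._set(value)

class PerformerStyleNode:
    """Stand-in for a Performer class exposing state only via methods."""
    def __init__(self):
        self._trans = (0.0, 0.0, 0.0)

    def get_trans(self):
        return self._trans

    def set_trans(self, t):
        self._trans = t

class AdaptedNode(PerformerStyleNode):
    """Subclassed once: the getter/setter pair now appears as a field."""
    def __init__(self):
        super().__init__()
        self.trans = SingleAdaptorField(self.get_trans, self.set_trans)
```

Because the adaptor delegates to the original methods, both APIs always see the same state.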
Fig. 9: List of Performer nodes available in AVANGO and their C++ inheritance structure:
Fields come in four different flavors. A SingleField holds a single, arbitrarily typed value. A MultiField holds any number of values of the same type. To adapt the Performer method-based object API to our field-based API, a SingleAdaptorField and a MultiAdaptorField are used. All fields are derived from a single base class and have methods to set and get field values.
Table of available field types: fpSFInt, fpSFUInt, fpSFBox, fpSFNode, fpSFLong, fpSFGroup, fpSFULong, fpSFGeoSet, fpSFFloat, fpSFGeoState, fpSFDouble, fpSFHighlight, fpSFBool, fpSFMaterial, fpSFString, fpSFScreen, fpSFVec3, fpSFWindow, fpSFVec2, fpSFBlock, fpSFVec4, fpSFIntBlock, fpSFMatrix, fpSFUshortBlock, fpSFQuat, fpSFSeg, fpSFVec2Block, fpSFPlane, fpSFVec3Block, fpSFSphere, fpSFVec4Block
Fields can be connected to each other, i.e. a field A that is connected from another field B will receive B's value whenever field B is changed. This allows a dataflow network to be constructed orthogonally to the object hierarchy. This dataflow network is evaluated for each simulation frame. Loops are detected and handled properly.
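A small Python sketch of such per-frame dataflow evaluation (simplified to one upstream connection per field; the names are invented for illustration): visited fields are remembered during the traversal, so connection loops terminate instead of recursing forever.

```python
class Field:
    """A field holds a value and may be connected from one other field."""
    def __init__(self, value=None):
        self.value = value
        self.source = None

    def connect_from(self, other):
        self.source = other

def evaluate(fields):
    """Once per simulation frame: pull every field's value from its
    upstream connection; a visited-set breaks connection loops."""
    def pull(field, visiting):
        if field in visiting or field.source is None:
            return field.value   # loop detected, or no upstream field
        visiting.add(field)
        field.value = pull(field.source, visiting)
        return field.value

    for field in fields:
        pull(field, set())
```

Connecting C from B and B from A lets A's value flow through the chain in one evaluation; connecting two fields to each other does not hang the evaluation.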
1. Nodes are the already mentioned classes adapted from the Performer node classes for the description of the modeling hierarchy. They are field containers, and their state is completely described by their individual set of fields. In a distributed multi-user environment, only nodes have to be shared with other users. While Performer only knows about static, visually focused objects, AVANGO nodes can also define audible and tactile properties.
Table of basic node types:
fpGroup, fpDCS, fpIntBlock, fpMaterial, fpGeoSet, fpGeode, fpUshortBlock, fpHighlight, fpSwitch, fpFloatBlock, fpGeoState, fpVec2Block, fpVec3Block, fpVec4Block
Fig. 10: Inheritance graph for sensors.
Table of basic sensor types: fpSensor, fpViewActuator, fpTimeSensor, fpScreen, fp6dofSensor, fpWindow, fpXvsSensor, fpStereoScreen, fpDeviceSensor, fpStereoWindow, fpADSensor
2. Sensors are field containers, but they are not inherited from Performer objects. They are used for data import to and export from the AVANGO system to the outer world. Sensors are not "visible", "audible", or "tangible" on the different displays and therefore do not have to be part of the modeling hierarchy. They implement the local features of an AVANGO application and are therefore not distributed. Examples of typical sensor objects are the windows on a workstation screen used as a visual display, or a graphical user interface used to control some global parameters.
3. Services are neither field containers nor inherited from Performer classes, but they provide a functional API to unique system features. They are even more local than sensors and are therefore not distributed either. A service would implement the access to an external device, for example a 6-dof tracker, which exists only for a limited time and therefore should be used from only one location inside the application. A sensor may use this service to access a tracker and to propagate the data related to it into the modeling hierarchy via its fields.
Fig. 11: AVANGO Application layout. Field connections are orthogonal to the modeling hierarchy and distribute data generated in nodes or imported by sensors. The browser is configured by services and sensors, only the modeling hierarchy is distributed.
* Auditory Rendering
Rendering the auditory scene has to take into account the position of the observer's head in the virtual world and in the auditory display, as well as the characteristics of the auditory display (i.e. the loudspeaker configuration). Auditory rendering is a two-stage process. In the first stage a source signal is synthesized, which is then spatialized in the second stage. In the first stage, only the sound model parameters are needed by the rendering process. In the second stage, the signals driving the auditory display are computed as a function of the distance between observer and sound source, the radiation characteristics of the source, and the signature of the acoustic environment. With these signals the auditory display produces a sound field creating the illusion of a sound source emitting from a certain position in a certain acoustic environment shared by the observer and the source. The sound rendering process has to be dynamic, i.e. movements of the observer's position in the display or in the virtual world, as well as movements of the sound source, have to be taken into account. If these movements are faster than about 30 km/h, the pitch changes due to the Doppler shift have to be simulated as well.
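The required pitch change follows the classical Doppler formula. A small Python sketch (taking the speed of sound as 343 m/s in air) shows that at the quoted 30 km/h the frequency shift already exceeds 2 percent, which is clearly audible:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees Celsius

def doppler_frequency(f_source, v_source, v_observer=0.0):
    """Perceived frequency for motion along the line between source and
    observer; velocities are positive when the two approach each other."""
    return f_source * (SPEED_OF_SOUND + v_observer) / (SPEED_OF_SOUND - v_source)

# A 440 Hz source approaching the observer at 30 km/h (about 8.3 m/s):
shifted = doppler_frequency(440.0, 30.0 / 3.6)
```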
* Tactile Rendering
The CyberStage display system includes a set of low-frequency emitters built into its floor. This allows the generation of vibrations which users can feel through their feet and legs. There are two main areas of application for this display component. First, low-frequency sound (which cannot be localized) can be emitted that way and can thus complement the loudspeaker projection. Second, specially synthesized low-frequency signals can be used to convey attributes of displayed objects such as roughness or surface texture. From the point of view of rendering, the vibration display is handled like sound. Sound models are used to generate the low-frequency signals. Sound synthesis techniques generally referred to as granular synthesis are very well suited to producing series of band-limited impulses which may represent surface features. Such features can be displayed through user interaction. For instance, a virtual pointing device can be used to slide or glide over an object's surface, which, depending on the gliding speed, produces the corresponding impulses. Additionally, higher-frequency sound can be produced if necessary. Some of what is usually felt through the skin of our fingers when sliding over an object is presented to our feet. Although the quality of touch cannot be reached with this approach, it can complement sound and vision dramatically.
* a distributed, parallel, numerical solver for the SP2,
* a visualization module running on the Onyx,
* and a coupling module, using IP over ATM.
We will focus on the visualization module, where the data reduction of multiblock curvilinear grids to visualization primitives like points, lines, or surfaces for AVANGO is done, and skip the simulation and connection details.
* A cell, which represents the volume covered by 8 corner values and the connections to the 6 neighboring cells. Each corner value consists of a location in space and some scalar values which represent a physical quantity such as pressure or velocity.
Fig. 12: Cell
* A block, to represent the single elements of the multiblock structure. These blocks are the smallest logical units in the dataflow management and contain a 3-dimensional grid of cells.
* A block container, which represents the total of one simulation step.
* A block connection, which represents the neighborhood of blocks at cell level.
* A particle, which traces the appropriate cell for a given position in space with respect to some time- and space-dependent heuristics. A new position is always searched for by navigating through the grid of cells relative to the current position. To jump between neighboring blocks, the just-mentioned block connection is used.
Fig. 13: Two blocks consisting of cells, connected by block connections and traced by a particle.
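The neighbour-walk idea behind the particle tracer can be sketched in Python (with axis-aligned box cells for brevity; the real grids are curvilinear and the heuristics are more elaborate): starting from the current cell, the search repeatedly steps to the neighbour closest to the target position until the containing cell is found.

```python
def squared_distance(point, cell):
    """Squared distance from `point` to the centre of an axis-aligned
    cell given as (lower_corner, upper_corner, neighbour_ids)."""
    lo, hi, _ = cell
    return sum((p - (l + h) / 2.0) ** 2 for p, l, h in zip(point, lo, hi))

def locate(start_id, point, cells):
    """Walk through neighbour links, always moving to the neighbour whose
    centre is closest to `point`, until the containing cell is found.
    Assumes a convex grid, so the walk terminates."""
    current = start_id
    while True:
        lo, hi, neighbours = cells[current]
        if all(l <= p <= h for l, p, h in zip(lo, point, hi)):
            return current
        current = min(neighbours,
                      key=lambda n: squared_distance(point, cells[n]))

# A 1D chain of four unit cells, linked like blocks through connections.
cells = {
    0: ((0.0,), (1.0,), [1]),
    1: ((1.0,), (2.0,), [0, 2]),
    2: ((2.0,), (3.0,), [1, 3]),
    3: ((3.0,), (4.0,), [2]),
}
```

Walking relative to the current cell keeps each lookup cheap even though the whole grid may contain millions of cells.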
1. A polygonal surface, with an optional color coding of any scalar value available in the grid. The surface assembles a slice of the grid, and every grid point lying on the surface corresponds to a vertex in an indexed triangle strip set (Fig. 14) defining the surface. An average throughput of 50,000 triangles per second was benchmarked on a MIPS R10000 processor.
Fig. 14: Visualisation of the surface of 3 segments of the simulated engine with a color coding of the density Rho. The surface consists of about 30000 triangles per segment.
Fig. 15: Structure of an indexed triangle strip set. Every vertex is defined and indexed only once. 6 vertices define 4 triangles with correct ordering (1st triangle [0 1 2], 2nd triangle [1 3 2]).
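The ordering convention of Fig. 15 can be reproduced with a short Python helper (a generic strip decoder, not code from the visualization module): every second triangle swaps its last two indices so that all triangles keep a consistent winding.

```python
def strip_to_triangles(indices):
    """Expand an indexed triangle strip: each window of three consecutive
    indices forms a triangle, and every second triangle swaps its last
    two indices to keep a consistent winding order."""
    triangles = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i], indices[i + 1], indices[i + 2]
        triangles.append([a, b, c] if i % 2 == 0 else [a, c, b])
    return triangles
```

Decoding the six-vertex strip of Fig. 15 starts with the triangles [0 1 2] and [1 3 2], matching the ordering shown in the figure.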
2. Isolines of any of the available scalar values for a given threshold. The lines are generated as NURBS curves and can be switched on and off for each block individually. The calculation is done by a 2D adaptation of the marching cubes algorithm, with a benchmark of 400,000 cells per second.
Fig. 16: Visualization of the isolines of the velocity values at a given threshold of Mach 1 for all blocks. This would be too complex in a real-time environment, where only single blocks can be viewed.
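The core of such a 2D adaptation of marching cubes (marching squares) can be sketched in Python — a textbook version, not the module's actual code: for one rectangular cell, the crossings of the isoline with the four edges are found by linear interpolation.

```python
def edge_crossing(p0, p1, v0, v1, threshold):
    """Linearly interpolated point where the isoline crosses the edge
    from p0 (value v0) to p1 (value v1), or None if it does not cross."""
    if v0 == v1 or (v0 - threshold) * (v1 - threshold) > 0:
        return None            # same side, or edge parallel to the isoline
    t = (threshold - v0) / (v1 - v0)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))

def cell_isoline(corners, values, threshold):
    """Marching-squares core for one cell: collect the crossings on the
    four edges; pairs of crossings form the isoline segments."""
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    crossings = [edge_crossing(corners[i], corners[j],
                               values[i], values[j], threshold)
                 for i, j in edges]
    return [p for p in crossings if p is not None]
```

For a unit cell whose left corners carry the value 0 and whose right corners carry 1, the 0.5 isoline crosses the bottom and top edges exactly in the middle.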
3. A swarm of particles to visualize velocity values in a local area. The particle sources, shown as points, can be manipulated by changing their position and frequency directly in the virtual scene. The calculation of the particles' movement is complicated by the distortions of the simulation grid and the transitions between the different blocks. The calculation is done permanently, and about 2000 particles per second can be traced.
Fig. 17: Visualization of a swarm of particles with its interactive manipulator. The particle sources are bound to this manipulator and can be positioned freely in the entire simulation area.
4. A matrix of velocity vectors whose position, size, and density can be manipulated freely. Every vector in the matrix corresponds to a static particle, and only a user-initiated variation of the matrix leads to the tracing also used for the swarm.
Fig. 18: Visualization of a matrix of velocity vectors whose length and direction correspond to the velocity values at their location. They can be translated, rotated and scaled by the surrounding manipulator.
Fig. 19: Overall structure of the modularization
Figure 19 shows the overall structure of the described sensors and their connection to the modeling hierarchy. The calculated graphical data is propagated to specific nodes in the modeling hierarchy via fields. Because the normal copy mechanism of field connections would be too slow, only pointers are used. The modularization allows easy scaling for better calculation results by a simple duplication of one of the sensors. For example, the number of particles can be doubled by a second engine if enough CPUs are available. It is very important to keep track of all the different processes involved. Every sensor and the import service needs its own CPU if a smooth transition of the simulation data into the virtual world is to be achieved, and there are already several other processes running which form the Performer rendering pipeline.
* Swarm node
* Video texture node
* Motif gui sensor
* Wave node
* Button node
* Tighten node
* Guard node
* Texture stack node
* Metronom node
* Explosion node
* Interpolator nodes
* Pendulum node
* Rotation node
* Rotation motor node
* Linear motor node
* 6 degree of freedom sensor
* Mouse screen
* Intersection service
* Pick node
* Dragger nodes
User Interaction and Navigation
* Texture memory and main memory
* Number of triangles that can be rendered in real-time
* Number of sound sources that can be rendered in real-time
Virtual worlds tend to be very complex, and their geometrical representation consumes a large share of the hardware resources. While developing an application you have to keep in mind that a very high frame rate is needed to guarantee a satisfying experience. The number of triangles that can be rendered at 20 frames per second or higher is therefore strictly limited, since on a CyberStage eight images have to be rendered at a time. In addition, you cannot allow your application to start swapping main memory or texture memory, which causes noticeable delays and disturbs the virtual experience. The same is true for the number of sound sources: the experience is lost if a sound cannot be rendered fast enough to be played at the same time as its geometrical representation is visually displayed. While developing custom and special effects you have to make sure that enough CPU time is left for the main processes like culling, rendering, and application management.
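A back-of-the-envelope calculation makes the triangle budget concrete; the throughput figure here is purely hypothetical, since the real number depends on the machine:

```python
# Hypothetical graphics throughput; real numbers depend on the hardware.
triangles_per_second = 4_000_000
frames_per_second = 20        # the minimum rate for a satisfying experience
images_per_frame = 8          # eight images at a time on a CyberStage

# Triangles that may be visible in any single image.
budget = triangles_per_second // (frames_per_second * images_per_frame)
# → 25000 triangles per image under these assumed numbers
```

Dividing the per-frame work by eight is what makes the CyberStage budget so much tighter than that of a single-screen display.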
External hardware resources are also limited, for example the Sirius video board, which allows the import and export of video signals to the graphics subsystem of high-end SGI machines. Plugging in a camera for a conferencing application or a performance means that no other video source can be connected. Because these limits are often underestimated, they have to be kept in mind whenever an application is designed.
So how do we manage these limited resources? The mechanisms can be separated into three groups:
A very important field of user interaction is navigation in the virtual world. This means controlling movement not only in Euclidean space but also in information space. Although this information is always arranged in space and time, the navigation metaphor should differ according to the specific topic of the information. In the area of scientific visualization, for example, the navigation possibilities have to match the coding principles of the scientific data in order to open up every hidden spot, whereas applications in the area of training simulators do not explore the virtual world itself but focus directly on the navigation or interaction process and its realism. If not for training purposes, the need for navigation aids increases with the complexity of the virtual world. As in the real world, guiding the user through the information space is the basis for successful navigation. This guidance can passively show the parallel layers of information available, or actively push and pull the user, who in this case increasingly turns into a passive consumer, through a sequential story.
processes at the level of    time measured in
atom                         millionths of a second
environment of the human     seconds, minutes, years
universe                     millions of years

Table: Relationship between time and environment
Virtual worlds may also help to understand what happens at a different level. For example we can walk through a virtual molecule and "feel" the distances between the atoms.
Johnson [John95] defines a model as an encapsulation of the shape, shading, and state information of a character. Shape refers to the geometry of objects. This includes what geometric primitives they're composed of (spheres, polygons, patch meshes, etc.), as well as what geometric transformations are acting on them (scale, skew, translate, rotate, etc.). Shading refers to the subtle surface qualities of an object due to the material properties of the objects in the scene, local illumination effects, and global illumination effects. State refers to other more ineffable, but still perceivable qualities, like elements of personality. The properties of a model that can change over time are called articulated variables.
Generally speaking, each object in a dynamic scene interacts with every other object of this scene. According to the type of these relations, different motion rules and constraints are derived: geometric contact directs the motion of objects; kinematic links require calculation of a kinematic solution (forward and inverse kinematics); a magnetic relation leads to a force which depends on the distance between the two objects, etc. Physical and heuristic rules are used to simulate these systems [Ast93b, Bar91].
The following list contains some categories of events that change the layout of the environment. Some of them concern objects, such as collisions or deformations, and others describe changes of the layout at the level of the surface, for example deformations and disruptions [Str96, Gib79].
* Rigid translations and rotations of an object. These are displacements, such as opening a drawer, or turns, such as opening a door. There are also combinations of both translations and rotations; a rolling ball is an example of such a movement.
* Collisions of an object. We can distinguish between collisions with and without rebound.
* Nonrigid deformations of an object. Objects can be classified as inanimate or animate. An example of a deformation of an inanimate object is a drop of fluid; an example for an animate object is the change of posture of a human.
* Surface deformations. Examples are waves or the flow of a liquid. Deformations can cause elastic or plastic changes of a surface. Note that a deformation neither ruptures the surface nor disturbs its continuity.
* Surface disruptions. A rupture occurs when the continuity of a surface fails. A surface can be disrupted, for example, by rupturing, cracking, disintegration, or an explosion.
Inanimate objects are described in terms of the environment, e.g. by their surfaces. They are static, but of course they have a function: the ground, for example, supports walking, and a hole affords filling something in.
Animatable objects have the potential for change. What happens if you pull on this part, if you push here, or prod there? A model is an immutable thing, static and stiff. An animated model is one which changes in some particular way over time. Each animated part can be accessed through an articulated variable. Looking at a model from the outside, the only things that are visible are the variables (or fields). Some of the variables are read-only, some are both readable and writable. Some change over time, some don't. To manipulate the model is to write a value to one of the model's variables. In order to write to a variable, a process must first attach to it. Once it has successfully attached to a variable, a process can write new values to it. When it is done writing to the variable, it must detach from it.
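The attach/write/detach protocol can be sketched as follows; the class and method names are illustrative assumptions, not the system's actual interface:

```python
class ArticulatedVariable:
    """A model variable that a process must attach to before writing."""
    def __init__(self, value, writable=True):
        self.value = value
        self.writable = writable
        self.owner = None               # process currently attached for writing

    def attach(self, process):
        if not self.writable or self.owner is not None:
            return False                # read-only, or someone else is attached
        self.owner = process
        return True

    def write(self, process, value):
        if self.owner is not process:
            raise RuntimeError("process must attach before writing")
        self.value = value

    def detach(self, process):
        if self.owner is process:
            self.owner = None

# An animated model exposes only its variables to the outside.
elbow_angle = ArticulatedVariable(0.0)
assert elbow_angle.attach("walk-skill")
elbow_angle.write("walk-skill", 42.5)
elbow_angle.detach("walk-skill")
```

The ownership check is what makes the attach step meaningful: a process that has not attached cannot manipulate the model.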
1. Before you can start to build a character you have to write down its role. The character is the particular instantiation of that role for a given performance situation. It is constrained by the role, which itself must work within the bounds of the story. Available choices are always constrained by the role and its place in the story.
2. Now you should describe its physical appearance, its typical body postures, and the contact points with physical objects. How does the character hold itself? Is it happy or unhappy? Does it walk like a soldier or like a frightened schoolboy? An important part of an actor is the face. This does not have to be a human face with eyes, a nose, and a mouth; here a "face" refers more to the meaning of the character. What does this character's face look like? Is it warm, dismissive, uncomfortable, obtrusive? Does it afford some kind of communication, or is it only a blank, undefined nothingness?
3. Now it is time to specify the activities that the character will be asked to perform. Again: does it physically have everything it needs? What are its everyday objects? Is it able to interact with the objects in its environment and to perform the required tasks? At this point, we are still building up the character. If we suddenly realize that our character needs some particular property, we might look in the script to see if there is any information. A system has to have features to recognize such missing parts, to change parts of a character interactively, or to exchange one agent for another.
4. On which level do we have to interact with the character? Zeltzer discusses a three-part taxonomy of animation systems: guiding, animator level, and task level. Guiding includes motion recording, key-frame interpolation, and shape interpolation systems. Animator-level systems allow algorithmic specification of motion. Task-level animation systems must contain knowledge about the objects and environment being animated; the execution of the motor skills is organized by the animation system.
* the character must react
* the character must be seen as having an independent existence
* the character must have choices to make
* the character must adapt (e.g. to the situation, to the experience level of the user)
* the character must display variability in movement and response
With a few exceptions, the behavioral complexity of the creatures created to date has been limited. Typically the creatures pursue a single goal or display a single competence. For example, work has been done in locomotion [Zel90b, Bad93], flocking [Rey87], grasping [Kog94], and lifting [Bad93]. Tu and Terzopoulos' autonomous animated fish [Terz94] incorporate a physically based motor level, synthetic vision for sensing, and a behavior system which generates realistic fish behavior. But learning is not integrated in the action-selection mechanism, the fish address only one goal at a time, and the action-selection architecture is hard-wired, reflecting the specifics of the underlying motor system and the repertoire of behaviors they wished to demonstrate. Tosa [Tos93] used neural networks to model an artificial baby that reacts to the sounds made by a user. McKenna and Zeltzer [Zel90a, Zel90b] demonstrated an articulated figure with 38 degrees of freedom that uses the gait mechanism of a cockroach to drive a forward dynamic simulation of the creature moving over even and uneven terrain. It is an example of how successfully biologically based control schemes can be adapted for computer animation. Sims designed a system for making creatures that, using inverse kinematics and simple dynamics, could navigate over uneven terrain [Sims87]. This system was notable in that the notion of "walking" was generalized enough that he could very quickly generate many different kinds of creatures that all exhibited different behavior. More recently, Sims has developed a system for quickly prototyping creatures embodying a set of physically based behaviors by breeding them [Sims94]. He presents a genetic language that he uses to describe both the shape and the neural circuitry of the creatures. His work is most interesting in the context of building systems in which creatures are bred by using aesthetic decisions as fitness functions.
This work, more than any other, shows the power of genetic techniques when applied to complex computer graphic character construction problems. Even more recently, Blumberg and Galyean [Blum95a] demonstrated a "directable" dog character that can respond autonomously, in real time, to user input in the context of a larger, scripted narrative activity. Blumberg uses ethological theories for his work, but he is also looking at classical animation.
"Any part or process of the mind that by itself is simple enough to understand - even though the interactions among groups of such agents may produce phenomena that are much harder to understand."
Think about a world which consists of thousands of autonomous sources. Each source is able to produce a short colored flash followed by a beeping sound. If a source in its neighbourhood hears or sees this event, it will also beep and flash, but in a slightly different color. A source samples its environment from time to time. To turn these autonomous sources into individuals, we give each of them a different sample rate, a different beep, and a different color. An observer walking around in this scenario will find sources which build sequences. He will walk through centers of confusion, see harmony and disharmony in close proximity, chaos and regularity, and then there is only dark silence. It is hard to describe the impression, to figure out how it works, to see the structure behind these irregular patterns.
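A minimal sketch of such a world, assuming a simple distance-based neighborhood rule and illustrative parameter values:

```python
class Source:
    """One autonomous flash-and-beep source with its own sample rate and color."""
    def __init__(self, position, color, sample_rate):
        self.position = position
        self.color = color              # hue in [0, 1)
        self.sample_rate = sample_rate  # this individual's sampling interval
        self.flashed_at = None          # time step of its last flash

def step(sources, t, radius=1.5):
    """Sources that perceive a flash from the previous step flash too,
    in a slightly different color."""
    triggered = []
    for s in sources:
        if t % s.sample_rate != 0:
            continue                    # not sampling its environment right now
        for other in sources:
            if other is s or other.flashed_at != t - 1:
                continue
            dx = s.position[0] - other.position[0]
            dy = s.position[1] - other.position[1]
            if dx * dx + dy * dy <= radius * radius:
                s.color = (other.color + 0.05) % 1.0  # slightly different color
                triggered.append(s)
                break
    # Apply all flashes of this step together, so a flash needs one step per hop.
    for s in triggered:
        s.flashed_at = t

# A short row of sources; the first one is made to flash at t = 0.
row = [Source((float(i), 0.0), color=0.0, sample_rate=1) for i in range(4)]
row[0].flashed_at = 0
for t in range(1, 4):
    step(row, t)
# The flash travels down the row, shifting its color a little at every hop.
```

With thousands of sources and differing sample rates, exactly the kind of interfering local patterns described above emerges from this one simple rule.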
Following Rasmussen [Ras83] we characterize knowledge about the figure and the world as follows:
"Signals represent sensor data-e.g., heat, pressure, light-that can be processed as continuous variables. Signs are facts and features of the environment or the organism."
In a virtual world, signals are directly measurable in the environment. This can be an open door, the elbow position of a character, or the curvature of a surface. Signals are discretely sampled using a mechanism called a receptor. Receptors are used by agents to discretely sample signal variables at a given frequency. Each receptor has a sampling frequency associated with it that can be modified by the agent. If a receptor's value changes by some epsilon from one sample to the next, it transmits a message containing the new value to the appropriate agents, which prompts the agent to recalculate itself. All receptors can be embedded in a single system process, but it is also possible to distribute them over several processes accessible over a network.
A sensor agent computes a boolean value using the information gathered by its receptors. It produces an assessment of the item it perceives (i.e. LightIsOn, ItIsToLoudHere, FriendIsNearby, iAmSitting, aCupIsNearby, etc.). Sensor agents correspond directly to signs. They enable the character to perceive itself and its environment at a level above the articulated variables of a model. Sensor agents do not depend on available hardware or on the current environment. They do not have to distinguish between real and virtual actors, between the real and the virtual world. So different characters and installations can use the same sensor agents, but the result may look completely different, because the receptors are handled in a different way. A receptor is evaluated many times relative to the sensor agent. Receptors represent the assumption that communication is expensive compared to computation; it is cheaper for a sensor agent to embed a computation somewhere else than to keep asking for the value and determining itself whether the value has changed enough for it to recalculate itself. The sampling frequency corresponds to the minimum reaction time of a character. It can also be used to model the attention of virtual actors: e.g. it can be changed if the character is awake or falling asleep, or if it should pay less attention to events happening at a greater distance.
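The interplay of receptors and sensor agents might be sketched as follows; all class names and the brightness-based LightIsOn predicate are illustrative assumptions:

```python
class Receptor:
    """Samples a signal at its own frequency and reports changes above epsilon."""
    def __init__(self, signal, frequency, epsilon, listener):
        self.signal = signal        # callable returning the current signal value
        self.frequency = frequency  # samples per second (also models attention)
        self.epsilon = epsilon
        self.listener = listener    # sensor agent to notify
        self.last = None

    def sample(self):
        value = self.signal()
        if self.last is None or abs(value - self.last) > self.epsilon:
            self.last = value
            self.listener.update(value)   # only changed values are communicated

class SensorAgent:
    """Computes a boolean assessment (a sign) from receptor data."""
    def __init__(self, predicate):
        self.predicate = predicate
        self.state = False

    def update(self, value):
        self.state = self.predicate(value)

# Hypothetical example: "LightIsOn" derived from a measured brightness signal.
brightness = {"value": 0.1}
light_is_on = SensorAgent(lambda v: v > 0.5)
receptor = Receptor(lambda: brightness["value"], frequency=10, epsilon=0.05,
                    listener=light_is_on)

receptor.sample()                  # brightness 0.1 -> LightIsOn stays False
brightness["value"] = 0.9
receptor.sample()                  # change above epsilon -> LightIsOn becomes True
```

Note that the sensor agent never polls the signal itself; the receptor pushes a message only when the value has changed by more than epsilon, mirroring the "communication is expensive" assumption.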
* Skill Agents
The skill agent has some activity that it is trying to perform over some period of time, and it does this by measuring and manipulating the variables that it has access to. A skill can be executed if its preconditions are true. These preconditions are calculated by sensor agents. As the skill agent begins, it attaches to the variables in the character and the environment that it will be manipulating. When the skill agent finishes, it detaches from these variables. Often the best way to begin is to construct the skill agent simply as a process which invokes a single reflex, where the skill agent merely sets the articulated variables of the parts of the model it is concerned with. When building a skill agent, we ideally want to always be thinking about how to write it for more general use.
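A skill agent of the single-reflex kind described above can be sketched like this; the names and the dictionary stand-in for articulated variables are illustrative assumptions:

```python
class SkillAgent:
    """Performs an activity once its sensor-agent preconditions hold."""
    def __init__(self, name, preconditions, variables, reflex):
        self.name = name
        self.preconditions = preconditions  # boolean-valued sensor-agent callables
        self.variables = variables          # articulated variables it manipulates
        self.reflex = reflex                # computes new values for the variables

    def can_execute(self):
        return all(check() for check in self.preconditions)

    def execute(self):
        if not self.can_execute():
            return False
        # Attach to the variables, let the single reflex set them, then detach.
        attached = dict(self.variables)
        attached.update(self.reflex(attached))
        self.variables.update(attached)
        return True

# Hypothetical "sit" skill: precondition computed by a sensor agent.
chair_is_nearby = lambda: True
model = {"hip_angle": 0.0, "knee_angle": 0.0}
sit = SkillAgent("sit", [chair_is_nearby], model,
                 lambda v: {"hip_angle": 90.0, "knee_angle": 90.0})
sit.execute()   # the reflex merely sets the articulated variables
```

The precondition list is where sensor agents plug in: the skill never inspects raw signals, only the boolean signs computed for it.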
* Goal Agents
Goal agents embody an explicit goal, and are described in terms of sensor agents. Different kinds of goals, characterized by their lifespan are:
- persistent, constant goal: e.g. stay alive
- transitory: e.g. hold the book, be hungry, get tired
* Motor Goal Parser
A motor goal parser takes care of translating a given task from a natural language interface into a set of task primitives which can then be passed to the skill network. The skill network is responsible for finding out which skills need to be executed in what order, and for invoking the appropriate skill agents corresponding to those skills. Complex activities like "making breakfast" are composed of component acts such as "make coffee", which are in turn composed of yet other behaviors: get the coffeepot, boil the water, and so on. That means a task has to be divided into a number of sub-tasks; for example, "grasp a cup" is a sub-task in the example above. Such a sub-task can be divided further into sub-tasks until you arrive at task primitives, which limit the decomposition of a task. Schank's Conceptual Dependency (CD) theory [Scha75] introduces a set of primitive "ACTs", some of which describe physical actions and some of which describe abstract "mental actions". Zeltzer and Johnson [Zel94] develop a set of task primitives based on Schank's theory.
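The recursive division of a task down to task primitives can be sketched as follows; the task tree itself is a hypothetical example, not taken from [Zel94]:

```python
# Hypothetical task tree; leaves are task primitives in the spirit of Schank's ACTs.
TASK_TREE = {
    "make breakfast": ["make coffee", "set table"],
    "make coffee":    ["grasp coffeepot", "boil water", "pour water"],
    "set table":      ["grasp cup", "place cup"],
}

def decompose(task):
    """Recursively divide a task into sub-tasks until only primitives remain."""
    subtasks = TASK_TREE.get(task)
    if subtasks is None:
        return [task]          # a task primitive limits the decomposition
    result = []
    for sub in subtasks:
        result.extend(decompose(sub))
    return result

primitives = decompose("make breakfast")
# The resulting ordered list of primitives is what the skill network receives.
```

Each primitive in the resulting list would then be mapped to a skill agent, with the skill network deciding on the execution order.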
* Virtual Representants
A virtual actor can act in place of a real person in some domains, e.g. to arrange a meeting, to introduce somebody to a domain, or to present a product. The face of the human could be captured and mapped onto the face of the virtual actor. This representant could be dressed like its owner. Some gestures and postures of the human can be recorded in the cave, by the parrot, or by cameras installed at the terminals. This data can be used to improve the movements of the representant. The virtual actor then has some characteristics in common with its owner, and all actors move in a different way.
* Social Interaction
Interactions between humans are multimodal. A composition of verbal and nonverbal communication improves the believability of a virtual actor. Such a virtual actor can be part of a "social interface" and can replace parts of common interfaces. Instead of pressing buttons and filling in forms, it could be easier to interact with a virtual actor and tell it the task the system should perform. Since nonverbal communication is an important part of such an interface, we give a short introduction to this field.
* Nonverbal communication
Nonverbal communication is used e.g. to express interest, friendliness, anger, disgust, fear, happiness, and sadness. This form of communication can serve as fast feedback and reduces the complexity of verbal communication. It conveys the context of a conversation. Furthermore, it shows how interested the communication partner is in a conversation and whether he is experienced in this domain or not. The communicating partner has to observe and react to special postures or facial expressions.
* face-to-face communication
One of the major features of face-to-face communication is its multiplicity of communication channels that act on multiple modalities. 65 percent of a face-to-face communication is conveyed by nonverbal elements [Bech96]. To realize a natural multimodal dialogue, it is necessary to study how humans perceive information and to determine the information to which humans are sensitive. The face is an independent communication channel that conveys emotional and conversational signals, encoded as facial expressions; it integrates speech dialogue and facial animation. A virtual actor can be equipped with a number of facial patterns belonging to conversational situations like "yes", "no", "I do not understand", "thanks". While the actor is speaking, they are displayed on its face according to the current context. Nagao and Takeuchi display an artificial face on a monitor and use it for a human-computer dialogue [Nag94]. Their experiments have shown that facial expressions are helpful, especially upon first contact with the system. They also mention that featuring facial expressions at an early stage improves subsequent interaction.
* Body Postures
Changes of the body modulate ongoing activities. Movements of the arms support the way of talking to another person. The emotional state of a person affects the way he walks or moves.
The degree of co-presence can be enhanced by providing the group with a sensation of sharing the same physical space, by maintaining body size, eye contact and gaze awareness. However, meetings of local or geographically dispersed groups require additional support for preparing documents, use of visual aids and generation of various artifacts during and after a meeting. A variety of tasks are performed by different members of the group during this process. Members might need to schedule co-presence sessions, prepare slides or electronic documents, generate reports, or review diagrams or charts, depending on the nature of the enterprise. Therefore, before, during, and after a co-presence session support is also needed for co-working.
Therefore, in order to be successful and cost-effective, tele-conferencing sessions should not only provide a high degree of co-presence but also support the different tasks that are performed by geographically dispersed groups. The following sections present ways of achieving co-presence and supporting co-working by the use of VR systems such as TELEPORT and CyberStage. To exemplify the approach, different application areas are considered, ranging from business seminars and teleteaching to collaborative modeling and manufacturing.
Fig. 20: Remote participant with virtual projection wall in TELEPORT's virtual extension
Once the preparation phase is completed, group members from different sites are ready to meet in the TELEPORT display rooms in each site. During the co-presence session information in the form of electronic documents, presentation material and shared workspaces is displayed on a virtual projection wall which is blended together with video imagery from remote participants into the virtual extensions of the display rooms, as shown in Figures 20 and 21.
Fig. 21: Session between two remote sites as seen from within a TELEPORT's display room
This is achieved by introducing a virtual projection wall that is mixed into the 3D model of the virtual rooms whenever necessary. This virtual projection wall is directly connected, for example, to a participant's notebook. Thus group members can choose what to show to the other participants, i.e. slides, pie charts, or the way a software tool is being used.
Fig. 22: Integrating teaching material in a co-presence session
So far, universities dispense courses only regionally to their students. However, instruments for the global delivery of teaching content exist and are becoming cheaper by the day. In the fall term of 1996, a weekly high-bandwidth ATM connection between the University of Geneva and GMD headquarters in Bonn was used to conduct regular tele-learning sessions. Our goals were to gain experience with high-quality video conferencing technology and to broaden the perspective of course presentations while simplifying the logistics for remote participants. During these sessions, the teacher's image was extracted from a controlled background and blended with teaching material and virtual backgrounds, as shown in Figure 22.
Another important aspect of this approach is the possibility of using animation and visualization tools to enhance the understanding and facilitate the learning process. Animation and visualization tools have been used extensively for teaching within the academic community [Brown 88, Brown 92] and have proved to be a successful teaching aid. Different possibilities exist for incorporating animation and scientific visualization in a co-presence session. For example, animations could be shown to students as pre-recorded video displayed on a virtual video wall, in a way similar to that of displaying transparencies. Another possibility is to integrate the 2D or 3D visualization as part of the 3D model of the synthetic background of a TELEPORT session.
The AVANGO system presented earlier will provide the possibility of connecting different VR systems, such as TELEPORT, Responsive Workbench and CyberStage, thus reducing the cost of hardware and allowing sites with different VR systems to be connected for collaborative work. One or more TELEPORT rooms can be connected to CyberStage. The 3D model used in the CyberStage site could become part of the virtual extension of a TELEPORT room. In addition, the video image of the remote participants can be extracted and blended into the virtual extensions of the TELEPORT rooms. For the CyberStage site, the participants of the remote TELEPORT rooms could also be included within the virtual 3D space. Interaction and manipulation of the 3D model could be restricted to the CyberStage site, or also allowed within the TELEPORT rooms, depending on the infrastructure of each site.
Auditory perception works similarly but with a completely different type of sensory evidence. First of all, sound does not represent objects but events. Sound is only generated if some sort of motion or action (i.e. an energy source) is involved. Information about acoustic events in our environment is transported to our ears by sound waves. The problem to be solved by the auditory scene analysis process is to decide which auditory cues (spectral and temporal aspects of the sound signal) belong to which sound source. Our ears are not primarily concerned with reflections (they may even become a source of confusion) but are interested in the nature of the energy source. Vision uses the reflections of light and usually cares less about the light sources themselves.
These and many other differences account for the different, often complementary kinds of information about our environment the two sensory channels can supply us with. When looked at separately, the visual and the auditive channel provide us with different perspectives of the environment we analyze. But when experienced together, they form a whole which is more than the combination of its parts. Sensory cues from both channels are used to analyze an audio-visual scene. Evidence from one channel can be used to complement missing information in the other one. Sound informs us about events we cannot see because they are not in our field of view (e.g. happen behind us or behind other objects), or they are too small or move too fast to be seen (e.g. vibrating objects), or can't be seen at all (e.g. vibrating air volumes). Through sound we learn about the material of an object colliding with another one or about the texture of an object sliding over another one. Sound informs us about the nature of interaction between objects and about the forces involved. And, last but not least, sound informs us about the environment in which we perceive it and in which it was produced and emitted. This is how we distinguish inside from outside spaces, small and large, empty and fully furnished rooms.
So there are many good reasons for combining auditory cues with visual cues in virtual environments. We saw that this combination provides virtual environments with redundant information (e.g. spatial or temporal cues), which is important to reduce perceptual ambiguity. We showed how auditory cues complement visual cues in an integrated audio-visual scene analysis process. And we may add two other aspects here. Firstly, the presence of sound in an interface masks environmental sounds and therefore increases the degree of immersion in a virtual environment. And secondly, experience has shown that fine grain auditive feedback to user actions enhances the sense of presence in virtual spaces.
Since other types of realities, as we can find them for instance in artistic applications, may need other sets of constraints to be met, it is important that the simulation system is flexible enough to allow for modeling of such realities as well. As an example we may think of experimental interaction metaphors, which may need special simulation modes (e.g. cut-like discontinuous navigation). Furthermore, a system architecture open at this level would invite for experimentation, as it is required to discover innovative interaction or navigation metaphors. Generally speaking, an ideal audio-visual simulation system should allow any relationship between visual and auditory cues to be expressed explicitly. Naturally, this would include the special relationships necessary to simulate worlds similar to our everyday causal world, which still is the first requirement and most important validation criterion for a virtual reality system today.
Which model to choose depends on the sound quality needed, the control needed over the sound, and the computational resources available. As a rule of thumb, physical modeling usually needs more CPU cycles than signal-oriented synthesis techniques, which do not simulate the complete behavior of a vibrating object but only mimic the sound signal resulting from all these vibrations. Physical models are preferable when sound radiation is an important aspect of an auditory scene, because they often use spatial representations, i.e. they distinguish how an object vibrates on one side or the opposite side. Naturally, physical models are better suited to simulate complex interactions between objects (e.g. brushing or scratching), but they are usually much harder to control than signal models. This is why mostly signal models are used when sampled sounds cannot be used in an auditory scene. The disadvantage of sound samples is that they can only be transformed (i.e. parametrized by event attributes) in a very narrow range. Individual perceptual attributes of sampled sound material are not accessible because of the lack of a real sound model (the sample is class and instance at the same time). Sound samples can only be transformed by being reproduced at different pitches, amplitudes, and durations (by playing only a fragment of a sample). Sound models using a combination of sound samples and signal modeling techniques can combine the advantages of both (e.g. modeling sound by filtering and mixing sound samples).
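The narrow transformation range of sound samples can be illustrated with a naive resampling sketch; this is an illustrative toy, not the sound server's actual implementation:

```python
def transform_sample(sample, pitch=1.0, amplitude=1.0, duration=None):
    """The only transformations available for a raw sample:
    pitch (resampling), amplitude (scaling), duration (play a fragment)."""
    if duration is not None:
        sample = sample[:duration]     # duration: only a fragment is played
    out = []
    pos = 0.0
    while pos < len(sample) - 1:
        i = int(pos)
        frac = pos - i
        # Pitch: read the sample at a different rate, with linear interpolation.
        value = (1.0 - frac) * sample[i] + frac * sample[i + 1]
        out.append(amplitude * value)  # amplitude: simple scaling
        pos += pitch
    return out

# An octave up (pitch = 2.0) halves the length; half gain halves each value.
sample = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
octave_up = transform_sample(sample, pitch=2.0, amplitude=0.5)
```

Everything beyond these three parameters (timbre, spectral detail, radiation) stays frozen in the recording, which is exactly the limitation the text describes.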
How do these nodes compare to nodes defining visual scene elements? The radiation and environment nodes define how the sound source will be rendered and are thus comparable to visual property nodes (e.g. transformation and appearance node; actually, the radiation node is a transformation node). And the sound model node is comparable to the visual shape nodes.
The sound server is based on IRCAM's Max/FTS real-time sound processing system originally built for computer music applications [Lindemann 90, Dechelle 95]. FTS is an extensible signal processing kernel providing all necessary low-level modules to build sophisticated sound synthesis and processing applications. Max is a graphical programming environment [Puckette 95] used to interactively build FTS programs. Max makes it possible to control and monitor the state of a signal processing program running in FTS. The spatialization algorithms used in the sound server are based on IRCAM's Spatialisateur toolkit [Jot 95] developed in Max/FTS. The software built on top of these components consists of parts realized in Max (synthesis control, resource management, message parsing) and FTS extensions written in C (efficient spatialization modules, sound sample manager, custom synthesis algorithms, network communication). The sound server is not a closed application but an open toolkit adapted to a large class of applications. The application designer chooses among many templates provided by the server to solve standard problems.
Since the base system used to implement the sound server (FTS) doesn't allow dynamic allocation of modules for efficiency reasons, static banks of modules are allocated at startup time and individual modules (therefore also called voices) are switched on and off on demand. This implies that the application designer has to foresee the maximum number of each module type ever needed at the same time in an application. This may be tedious, but the application designers are rewarded by a maximum use of the available processing power.
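The static voice-bank scheme can be sketched as follows; the names are illustrative and this is not the actual Max/FTS interface:

```python
class VoiceBank:
    """A fixed bank of voices allocated at startup and switched on/off on demand."""
    def __init__(self, module_type, size):
        self.module_type = module_type
        # The whole bank exists from startup on; no dynamic allocation later.
        self.voices = [{"id": i, "on": False} for i in range(size)]

    def acquire(self):
        for voice in self.voices:
            if not voice["on"]:
                voice["on"] = True   # switch a free voice on
                return voice
        return None                  # bank exhausted: the designer undersized it

    def release(self, voice):
        voice["on"] = False          # switch off, but never deallocate

# The maximum number of simultaneous voices must be foreseen at startup time.
percussion = VoiceBank("sampler", size=3)
v1 = percussion.acquire()
v2 = percussion.acquire()
v3 = percussion.acquire()
overflow = percussion.acquire()      # None: no fourth voice can appear
percussion.release(v2)
reused = percussion.acquire()        # the freed voice is switched back on
```

Sizing the bank is the trade-off the text mentions: too small and events are dropped, too large and processing power is reserved for voices that never sound.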
In essence, Sound Spheres is about the localization of moving sound and light sources. The audience is immersed in a space of weightlessness enclosed by a large sphere. This space is populated by small rotating spheres slowly moving along circular paths. The audience activates these spheres by inflating them with a virtual pump. The more a sphere is inflated, the longer it will keep on emitting percussive sounds and light flashes in simple rhythmical patterns. But pointing the virtual pump at a sphere for too long will lead to its explosion, accompanied by a violent detonation flash and noise. While freely floating among the flashing and sounding spheres, the audience experiences an ever-changing rhythmical tissue of spatialized sound and light. The perceptual plausibility and coherence of this experience is achieved by a careful adjustment of the dynamic behavior of the light flashes, the light model parameters, the sound material, and the room acoustics attributes. Sound Spheres also serves as an example of the high degree of immersion achievable by perfect synchronization of image and sound rendering.
* Sound Model
The sound model used in Sound Spheres is a good example of the sound modeling capabilities of the sound server and shall therefore be described in some detail here. The rhythmical patterns audible in Sound Spheres are formed by streams of sounds of a certain timbre presented in regular repetition. The complexity of the resulting pattern is achieved by slightly different repetition rates. The resulting temporal phasing effect, typical for minimal music, can only be obtained through a careful choice of the sound material. The sound material has to meet three requirements. First, each sound should be easily localizable in space; therefore we decided to use percussive sounds. Second, each sound should be easily identifiable by its timbre in order to form a clear perceptual stream when presented in repetition. The identification should work even if several streams are audible at a time. Third, since the rhythmical structure is based on repetition, each time a sound is presented it should sound slightly different, every time displaying another facet of its class. This can be obtained by micro-variations in the spectrum which are typical for all percussive sounds.
It is clear that these requirements are hard to meet with sound sampling, which cannot provide such richness of variation and strength of identification at the same time. Model-based direct sound synthesis was chosen instead. A bank of 10 resonating 2nd-order filters is used in a subtractive synthesis setup. The filter parameters (center frequency, bandwidth, and amplitude; each filter describes one partial of the spectrum) are generated by a spectral model describing a large but perceptually clearly characterized class of wood-like sounds. The spectra are described by ranges of possible variation for each of the 30 parameters. Within these ranges, random decisions are taken to produce an instance of the sound class. The filter bank is excited by short noise bursts. Due to the random nature of noise, the bursts vary slightly from excitation to excitation and thus excite the filters differently every time. The timbre variations achieved this way carry a high degree of perceptual plausibility because the model mimics a typical case of excitation/resonance based sound generation we know very well from our everyday world (all percussive sounds are generated that way).
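The scheme can be imitated in a few lines of Python. The parameter ranges and filter coefficients below are illustrative assumptions only, not the original wood-sound model; the point is the structure: ranges of spectral parameters, random instantiation, and a noise-burst-excited bank of 2nd-order resonators:

```python
import math
import random

SR = 44100        # sample rate in Hz
N_PARTIALS = 10

# Ranges of variation for the 30 spectral parameters (10 partials x
# center frequency, bandwidth, amplitude). Numbers are made up here.
RANGES = [((220.0 * (i + 1), 240.0 * (i + 1)),   # center frequency (Hz)
           (5.0, 40.0),                          # bandwidth (Hz)
           (0.0, 1.0 / (i + 1)))                 # amplitude
          for i in range(N_PARTIALS)]


def spectrum_instance():
    """Take random decisions within the ranges to draw one instance
    of the sound class."""
    return [tuple(random.uniform(lo, hi) for lo, hi in partial)
            for partial in RANGES]


def resonator(signal, freq, bw, amp):
    """2nd-order resonating filter describing one partial of the spectrum."""
    r = math.exp(-math.pi * bw / SR)
    c = 2.0 * r * math.cos(2.0 * math.pi * freq / SR)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = amp * (1.0 - r) * x + c * y1 - r * r * y2
        y2, y1 = y1, y
        out.append(y)
    return out


def strike(length=2048, burst=64):
    """One percussive event: a short noise burst excites the filter bank;
    the randomness of the burst varies the timbre at every excitation."""
    exc = [random.uniform(-1.0, 1.0) for _ in range(burst)]
    exc += [0.0] * (length - burst)
    partials = [resonator(exc, f, b, a) for f, b, a in spectrum_instance()]
    return [sum(samples) for samples in zip(*partials)]
```

Calling `strike()` twice yields two recognizably similar but never identical sounds, which is precisely the repetition-with-variation property required above.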
Fig. 23 shows one voice of the described sound model as a visual Max program. The voice receives messages from the Avango sound service. A parser directs the messages to the corresponding objects. The excitation object can receive a trigger message to generate one noise burst. Whenever the spectrum object receives a change message, it generates a new spectrum. The new spectrum is passed to the right input of the filter-bank object, which receives at its left input the signal from the excitation object. The output of the filter-bank is sent to the left input of the spatializer object, which can receive space messages from Avango on its right input. These space messages indicate where the sound source is located with respect to the observer and his or her position with respect to the loudspeakers. The output of the spatializer object is sent to the 8 loudspeakers.
* An open sound field is created by 8 loudspeakers in cube configuration.
* Sound projection is complemented by vibration emitters built into the floor to produce low-frequency vibrations.
* Model-based real-time sound synthesis overcomes the limitations of sample-based approaches (and complements sample playback).
* Live audio input and sound file playback from disk are possible.
* Scalability of sound rendering (from a few high-quality up to 50 low-quality concurrent voices).
* Entirely software-based sound server (no special DSP or "off-the-shelf" MIDI sound processing hardware needed) using 32-bit floating-point digital signal processing for high audio quality.
In principle, all the attributes of a physical object, e.g. position, orientation, shape, colour, etc. may change. However, at the current state of the art, physically based modelling deals mainly with motion and deformation of physical objects [Dai 97]. In this section, we will focus on the problem of motion control of solid bodies in VR. In this case, we may view our task as
* building movable or animatable object representations
An animatable object representation is a visual database which has an appropriate structure to allow on-line specification of motion variables. A data representation of a human skeleton, for example, needs to be sectioned into parts, and the spatial parameters, i.e. position and orientation, of each part are represented by on-line modifiable variables. In walk-through type virtual reality, the world model is most often not animatable.
* adding a guidance model - building guided moving objects
A guided moving object is an animatable object associated with a guidance model. This guidance model is typically a one-to-one mapping from animator input variables to the motion parameters of the animatable data structures. We call it guided because the model itself contains no information about how to move; such information is provided completely by the animator.
* adding autonomy - building semi-autonomous and autonomous moving bodies
Autonomy of a virtual physical body here means the capacity that enables it to behave like its real counterpart without the animator's intervention. Adding autonomy implies embedding object dynamics.
It is possible that some objects are only guided and others only autonomous; however, the current trend is towards multi-modal motion control, that is, an autonomous object should offer different control modes, ranging from low-level to high-level autonomy.
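The three layers discussed above - animatable representation, guidance mapping, and embedded dynamics with multiple control modes - can be sketched as follows (Python; the class and attribute names are hypothetical, not taken from our system):

```python
class Animatable:
    """Animatable representation: the spatial parameters of a part are
    kept in on-line modifiable variables."""

    def __init__(self):
        self.position = [0.0, 0.0, 0.0]
        self.orientation = [0.0, 0.0, 0.0]


class GuidedObject(Animatable):
    """Guided: a one-to-one mapping from animator input to the motion
    parameters; the object itself knows nothing about how to move."""

    def guide(self, animator_position):
        self.position = list(animator_position)


class AutonomousObject(Animatable):
    """Autonomous: object dynamics are embedded. Keeping a 'guided'
    mode as well yields multi-modal motion control."""

    def __init__(self):
        super().__init__()
        self.velocity = [0.0, -1.0, 0.0]
        self.mode = "autonomous"          # or "guided"

    def step(self, dt, animator_position=None):
        if self.mode == "guided" and animator_position is not None:
            self.position = list(animator_position)   # low-level control
        else:
            self.position = [p + v * dt               # embedded dynamics
                             for p, v in zip(self.position, self.velocity)]
```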
Indeed, the essential difficulty in interference handling is not the correct formulation of the collision detection models, but the improvement of the run time efficiency. The efforts to speed up the collision detection include the following:
* form specialization
The most general formulation of collision detection would be based on the free-form object assumption. It is obvious that more specific formulations for collisions between special types of forms such as particles, planes, spheres, boxes, cylinders, polygons, etc., are much more efficient.
* spatial approximation
Replacing complex geometry by simpler shapes, e.g. spheres or boxes, is an important method to speed up collision detection, of course at the cost of precision. On the other hand, we can also represent the virtual space with rough "cells", e.g. cubic regions - the octree method [Dai 97] falls into this category.
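A minimal sketch of this kind of spatial approximation (Python, illustrative only): a crude bounding sphere replaces the exact geometry, and a distance test replaces the exact free-form intersection computation:

```python
import math


def bounding_sphere(points):
    """Crude spatial approximation: centroid plus the maximum distance
    to it gives a (non-minimal) bounding sphere of the geometry."""
    n = len(points)
    center = tuple(sum(p[i] for p in points) / n for i in range(3))
    radius = max(math.dist(center, p) for p in points)
    return center, radius


def spheres_overlap(c1, r1, c2, r2):
    """Cheap test replacing an exact free-form intersection computation."""
    return math.dist(c1, c2) <= r1 + r2
```

A negative result of the cheap test rules out a collision for certain; only positive results need to be refined with the exact geometry.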
* temporal coherence exploitation
Every physical body has inertia, so the continuity of motion is one of the most basic assumptions. Therefore, past collision detection results may be used to reduce the computation time of the present one. In addition, a more precise dynamic model of the objects can also be used to predict the next collision, which is important to avoid detection errors when objects move too fast.
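For two spheres moving at (approximately) constant velocity over a frame, the next collision can be predicted in closed form, which is exactly what avoids missed collisions between two sampling instants when objects move fast. A sketch (Python; the constant-velocity assumption is the simplification):

```python
import math


def time_of_impact(p1, v1, r1, p2, v2, r2):
    """Earliest t >= 0 at which two constant-velocity spheres touch,
    or None if they never do. Solves |p(t)| = r1 + r2 for the relative
    motion p(t) = p + v*t."""
    p = [a - b for a, b in zip(p1, p2)]   # relative position
    v = [a - b for a, b in zip(v1, v2)]   # relative velocity
    R = r1 + r2
    a = sum(x * x for x in v)
    b = 2.0 * sum(x * y for x, y in zip(p, v))
    c = sum(x * x for x in p) - R * R
    if a == 0.0:                          # no relative motion
        return 0.0 if c <= 0.0 else None
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                       # paths never come close enough
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    if t >= 0.0:
        return t
    return 0.0 if c <= 0.0 else None      # already overlapping
```

Two unit spheres 10 m apart and closing at 2 m/s, for instance, are predicted to touch after 4 s, no matter how coarse the sampling rate is.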
* optimal scheduling
When a large number of physical bodies is involved, choosing the best order in which pairwise detections are processed becomes very important. This scheduling may be based on the probability of collision and the computational complexity of each pair.
* multibody kinematics
Kinematics of a multibody is the relationship between the motion variables, i.e. positions, velocities and accelerations, of its body parts. Forward kinematics of a body chain is the mapping from the joint motion variables to those of its terminal part. Inverse kinematics is the inverse mapping of the forward kinematics. The major difficulties of inverse kinematics reside in redundancy, real-time computational speed, and joint limits. The solutions to inverse kinematics fall mainly into two categories: algebraic (i.e. closed form) and iterative.
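As a minimal worked example, a planar two-link chain admits a closed-form (algebraic) inverse kinematics solution (Python sketch; the link lengths are arbitrary assumptions):

```python
import math

L1, L2 = 1.0, 1.0   # link lengths of a planar two-link body chain


def forward_kin(theta1, theta2):
    """Forward kinematics: joint variables -> terminal part position."""
    x = L1 * math.cos(theta1) + L2 * math.cos(theta1 + theta2)
    y = L1 * math.sin(theta1) + L2 * math.sin(theta1 + theta2)
    return x, y


def inverse_kin(x, y):
    """Closed-form inverse kinematics, one of the two possible solutions
    (the redundancy mentioned above); raises when the goal lies outside
    the reachable workspace."""
    c2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2.0 * L1 * L2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("goal out of reach")
    theta2 = math.acos(c2)                       # elbow-down branch
    theta1 = math.atan2(y, x) - math.atan2(L2 * math.sin(theta2),
                                           L1 + L2 * math.cos(theta2))
    return theta1, theta2
```

Even this two-joint case already exhibits two of the difficulties named above: redundancy (two solution branches) and joint limits (the reachability test).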
* multibody dynamics
The dynamics of a body (including a multibody) is the relationship between its motion variables and the torques or forces applied to it. Forward dynamics of a body chain is the mapping from the applied joint torques or forces to the motion variables. Inverse dynamics refers to the mapping from the motion specifications to the required joint torques or forces. Implementation of dynamics is essentially the problem of solving a system of differential equations. Methods used here can also be grossly divided into closed form, iterative and recursive.
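For a single joint, both mappings can be written down directly. The sketch below (Python) uses a pendulum as the simplest case; a semi-implicit Euler step stands in for a general differential equation solver:

```python
import math


def forward_dynamics(theta, omega, torque, dt, m=1.0, l=1.0, g=9.81):
    """Forward dynamics of a single-joint pendulum: applied torque ->
    angular acceleration -> integrated motion variables (one step of
    semi-implicit Euler)."""
    alpha = (torque - m * g * l * math.sin(theta)) / (m * l * l)
    omega += alpha * dt
    theta += omega * dt
    return theta, omega


def inverse_dynamics(theta, alpha, m=1.0, l=1.0, g=9.81):
    """Inverse dynamics: desired angular acceleration -> required torque."""
    return m * l * l * alpha + m * g * l * math.sin(theta)
```

The two functions are inverses of each other: the torque computed for a desired acceleration, fed back into the forward model, reproduces exactly that acceleration.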
Although methods from robotics are very helpful, they are not fully suitable for the simulation of the bodies of virtual animals and humans, as those have a more complex structure than most existing robots. Therefore new methods have to be studied. The closed-form solution is more efficient than the iterative ones and has no convergence problems, while the iterative ones have the advantage of being more general.
Normally, the physical simulation of multibodies needs only the forward kinematic and dynamic models; inverse kinematics and dynamics are used for goal-oriented objects like robots and living beings. However, they are also employed for the simulation of inanimate objects in the approach called teleological modelling [Badler 91].
* action primitives
Motion of virtual living beings involves the coordinated motion of a very large number of degrees of freedom, so we need some higher-level abstractions, e.g. actions, instead of treating them directly in terms of numeric formulations. Much effort has been spent by many researchers on locomotion and manipulation [Zel 90b, Badler 93, Magnenat-Thalmann 96].
* perception
Visual and other exteroceptive perception of virtual humans is essential, because they can hardly rely on previous knowledge about their environment and apply open-loop control to succeed in their physical activities.
* reflexive and intelligent behavioral controls
The control mechanism needed for connecting perception to actions is one of the most interesting research topics. It has a very strong flavour of AI and automatic control.
Traditional approaches in AI tend to represent an intelligent entity such as a human as a centralized system consisting of perception, reasoning and action modules, while nowadays distributed intelligence approaches dominate both the domain of AI and the simulation of virtual creatures. These are known as emergent behaviors, autonomous agents, sensor-actuator networks, etc. They have been successfully applied in examples such as the artificial fishes [Terz 94] and Jack [Badler 93]. A genetic programming method has also been applied to the modelling of structural behaviour, i.e. evolution [Sims 94].
On the one hand, such a general framework may be useful in putting the currently rather ad hoc and piecemeal methods into a more coherent theoretical structure. On the other hand, such formulations are especially well suited to the object-oriented programming paradigm, and may further provide high-level intuitive programming interfaces for physical simulation.
Additionally, we also try to combine results from robotic visual servoing techniques [Dai 90, 92, 93a, 93b] and from qualitative modelling and reasoning [Patrick 85] into our general framework.
Unlike conventional animation, where updating the data more frequently than the frame rate is a waste of computing power, dynamic behaviour simulation gets more accurate as its update frequency increases. Therefore, in our system, physical behaviour simulation is executed in a process running concurrently with the visualization process, and its update frequency is adjusted as a compromise between precision and time consumption, balanced against that of the visualization process.
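The compromise described above can be sketched as a budgeted substep loop (Python; the free-fall dynamics and all parameter names are illustrative assumptions, not our system's code):

```python
import time


def simulate_frame(state, frame_dt, sim_dt, budget_s):
    """Advance the physics across one rendering frame using substeps of
    size sim_dt (smaller sim_dt = higher update frequency = more
    accuracy). The loop stops early when the time budget granted to the
    simulation process is used up, trading precision for speed."""
    start = time.perf_counter()
    t = 0.0
    while t < frame_dt:
        dt = min(sim_dt, frame_dt - t)
        state["v"] -= 9.81 * dt          # toy dynamics: free fall
        state["y"] += state["v"] * dt
        t += dt
        if time.perf_counter() - start > budget_s:
            break                        # budget exceeded: stop early
    return state
```

In the real system the budget itself would be adapted from frame to frame, depending on how much time the visualization process needs.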
Up to now, the following specific models and scenarios have been implemented.
* An Autonomous Moving Ball
A physically realistic ball in the virtual world is a simple yet basic example of a physical body. The ball has the structure of a general behavioral model, in which the sensor's task is to detect collisions and a high-level logic controls its lower-level behaviors, i.e. flight, rebound and sliding. In order for the ball to behave physically, all the static objects in its environment need to be physical as well. In the caveland scenario, the visitor can pick up this ball and throw it. It can then fly, bounce or slide through the environment, accompanied by a coordinated sound effect produced by the sound server in Avango. This adds some physically realistic interaction between the visitor and the caveland world.
Fig. 24: An interactive autonomous ball in caveland
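The ball's sensor-logic-behavior structure can be sketched as a small state machine (Python; the thresholds and coefficients are illustrative, not the values used in caveland):

```python
def ball_step(state, dt, floor=0.0, restitution=0.6, friction=0.98):
    """One update of the autonomous ball: a collision 'sensor' (floor
    contact test) feeds a high-level logic that switches between the
    low-level behaviors flight, rebound and sliding."""
    x, y, vx, vy, mode = state
    if mode == "flight":
        vy -= 9.81 * dt                     # gravity
        x += vx * dt
        y += vy * dt
        if y <= floor:                      # sensor: collision detected
            y = floor
            if abs(vy) > 0.5:
                vy = -vy * restitution      # rebound behavior
            else:
                vy, mode = 0.0, "slide"     # too slow: switch to sliding
    else:                                   # sliding on the floor
        x += vx * dt
        vx *= friction                      # friction damps the slide
    return (x, y, vx, vy, mode)


# Throwing the ball: it flies, bounces a few times, then slides to rest.
state = (0.0, 1.0, 2.0, 0.0, "flight")
for _ in range(5000):                       # 50 s of simulated time
    state = ball_step(state, 0.01)
```

A sound or light event would be triggered at each transition into the rebound branch, which is where the coordinated sound effect mentioned above comes from.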
* A kinematics and dynamics based human body
This human body model can be guided through three modes: forward kinematics, inverse kinematics, and dynamically constrained inverse kinematics. Fig. 25 shows a snapshot where the right arm is guided under dynamic constraint to follow a moving target controlled by the user.
* A virtual ping-pong game
The virtual ping-pong game integrates all the important ingredients discussed above, so it provides a good testbed for our unified approach. Some theoretical study and a simple implementation based on Open Inventor have been done previously [Dai 96]. Here, the ping-pong scenario under the system AutoMove consists of a guided player (the participant's avatar), an autonomous virtual player, a virtual ping-pong ball, and the game environment. The virtual player perceives the ball's motion, predicts its trajectory, and then invokes appropriate playing actions such as push, lob and side-stepping. The trajectory predictor and decision maker of the player involve some qualitative modelling and reasoning about the ball, the environment and his own body. The playing actions are themselves sensor-controller-actuator loops at lower levels.
Fig. 25: A kinematics and dynamics based human body
Fig. 26: A virtual ping-pong game
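The virtual player's perceive-predict-decide chain can be sketched as follows (Python; a drag-free ballistic model and made-up thresholds stand in for the qualitative reasoning described above):

```python
import math


def predict_landing(p, v, table_h=0.76, g=9.81):
    """Trajectory predictor: where and when the ball comes down to
    table height, ignoring drag and spin. p and v are (x, y, z)
    position and velocity with z pointing up."""
    a, b, c = -0.5 * g, v[2], p[2] - table_h
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                          # never reaches table height
    t = (-b - math.sqrt(disc)) / (2.0 * a)   # downward crossing
    return (p[0] + v[0] * t, p[1] + v[1] * t, t)


def choose_action(landing, reach=0.5):
    """Crude decision maker mapping the prediction to a playing action."""
    if landing is None:
        return "idle"
    x, y, t = landing
    if abs(x) > reach:
        return "side-step"                   # move into position first
    return "lob" if t < 0.3 else "push"      # little time left: quick lob
```

Each returned action label would then be executed by a lower-level sensor-controller-actuator loop, as described in the text.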
* project coordination (5 people)
* software supervision (1 person)
* art supervision (1 person)
* software development (9 people)
* modeling (7 people)
* custom effects (5 people)
* music and sound rendering (5 people)
* system installation and support (2 people)
* special support (11 people)
* decide which extensions need to be programmed
* estimate efforts and time ("nice effect but takes years to develop")
* supervise integration process ("do the pieces work together?")
* estimate the complexity and resulting performance of the whole application ("cool but too slow?")
* give guidance to software developers ("help me guru!")
* evaluate modules and suggest improvements ("still too time consuming?")
* communicate with modelers about the special requirements for certain special effects
* estimate efforts and time ("nice idea but takes years to model")
* supervise integration process ("do the pieces fit together?")
* give guidance to modelers ("how do I create shadows?")
* evaluate the different models and suggest improvements ("doesn't look good in the CyberStage environment")
* develop new modules including sound rendering and special effect modules
* integrate the modules in the software system
* test, evaluate and improve the modules especially considering performance issues
* communicate directly with modelers while working on a specific effect
* create models while considering the special requirements (limited number of polygons, textures, ...)
* cooperate with software developers and adjust models due to the needs for special effects
* test and evaluate the models in the virtual environment under the different lighting conditions etc.
* modeling of the virtual actors, including geometric representation, behaviour and gestures
* synchronization of sound and model
* cooperation with software developers to create an interface to the Avango system
* edit, process, and synthesize sound material
* synchronize image and sound
* test and evaluate the spatial sound experience
* compose music and record samples
* administer the system while developing the project
* reconfigure the system according to the specific needs (e.g. Sirius video boards)
* supervise the moving of all the hardware to the demo location
* reinstall and test everything at the event location
* prepare video material used for special effects
* just make things possible by paving the way
* support you with ideas or give software tips
* represent models scanned in for special effects
* provide samples of their speech
* SGI IRIS Performer 2.1
* MultiGen II
* In House Converter
* SGI Converter
* SGI Inventor Tools
* SoftImage Eddie
* SoftImage StudioPaint
* SGI Tools
* Digidesign ProTools III
* Emagic LOGIC Audio 2.6
* Emagic SoundDiver
* IRCAM AudioSculpt
* IRCAM Spatialisateur
* CCRMA snd
* vertex and normal animations make water swash and glint and let suspension bridges swing
* texture and material animations create blazing flames
* motion animations make a lift work, a pendulum swing, and wheels turn
* interpolated paths a user may follow to explore the world
* live video input (video textures)
* reflection maps and reflection faces make surfaces look realistic
* QuickTime movies imported as textures generate a lava stream
* localized sound sources enhance the visual effects
* level of detail effect switches
* exploding sound objects
* virtual actors accompanying a user
RAM, 64 MB Texture Memory, 2 Sirius video boards
* CyberStage projection system, wooden projection room (3m x 3m x 3m)
* Polhemus tracking system (head, two input devices)
* 16 CrystalEyes shutter glasses
* video cameras used for live video input
* 4 channel surround sound system
* acoustic floor
[Bad90] "Animation from instructions", N. I. Badler, B. Webber, J. Kalita and J. Esakov. In Making Them Move: Mechanics, Control, and Animation of Articulated Figures, Morgan-Kaufmann, 1990, pp. 51-93.
[Badler 91] Badler, N., Barsky, B. and Zeltzer, D. (eds): "Making Them Move". Morgan Kaufmann Publishers Inc. 1991.
[Bad 93] N. I. Badler, C. B. Phillips, and B. L. Webber, "Simulating Humans: Computer Graphics, Animation, and Control". Oxford Univ. Press, 1993.
[Bar 91] Baraff, D. (1991), "Rigid Body Concepts", in Course notes, Siggraph 91
[Baraff 95] David Baraff, "Interactive Simulation of Solid Rigid Bodies", IEEE Computer Graphics and Applications, May 1995.
[Bech 96] P. Becheiraz, D. Thalmann, The Use of Nonverbal Communication Elements and Dynamic Interpersonal Relationship for Virtual Actors, Proc. Computer Animation 96, IEEE Computer Society Press, June 1996, pp.58-67.
[Bent 97] Bentley, R., Horstmann, T. and Trevor, J., The World Wide Web as enabling technology for CSCW: The case of BSCW, Computer Supported Cooperative Work: The Journal of Collaborative Computing. Special issue on CSCW and the Web, Vol. 6, Nos 2-3, Kluwer Academic Publishers, 1997
[Blum 95a] Blumberg, Bruce M.; Galyean, Tinsley A (1995). " Multi-Level of Autonomous Animated Characters for Real-Time Virtual Environments ", Siggraph '95 Proceedings
[Blum 96a] Blumberg, B., P. Todd and P. Maes(1996). No Bad Dogs: Ethological Lessons for Learning. In: From Animals To Animats, Proceedings of the Fourth International Conference on the Simulation of Adaptive Behavior, September 1996, MIT Press. Cambridge Ma.
[Blum 96b] Blumberg, Bruce (1996). Old Tricks, New Dogs: Ethology and Interactive Characters. PhD Dissertation. MIT Media Lab.
[Bly 93] Bly S.A., Harison S.R., Irwin S., MediaSpaces: Bringing People Together in Video, Audio, and Computing Environment, CACM, Vol 36, No. 1, pp. 29-47, January 1993
[Breit 96] Breiteneder C., Gibbs S. , Arapis C., TELEPORT- An Augmented Reality Teleconferencing Environment, Proc. 3rd Eurographics Workshop on Virtual Environments Coexistence & Collaboration, Monte Carlo, Monaco, February 1996
[Brown 88] Brown M.H., Perspectives on algorithm Animation, CHI'88: Human Factors in Computing Systems, Washington, D.C., 33-38, 1988
[Brown 92] Brown M.H., Zeus: A System for Algorithm Animation and Multi-view Editing, DEC Systems Research Center, Research Report, 75, February 28, 1992
[Bryson 92] Steve Bryson and Creon Levit, "The Virtual Windtunnel", IEEE Computer Graphics and Applications, July 1992
[Buxton 92] Buxton W., Telepresence: integrating shared task and person spaces. Proc. Graphics Interface `92, 123-129
[Cadoz 91] Cadoz, C., "Timbre et causalité", in: J.B. Barrière, ed. Le timbre, métaphore pour la composition, Christian Bourgois Éditeur / IRCAM, Paris, 1991.
[Cook 97] Cook, D., Cruz-Neira, C., Kohlmeyer, B., Lechner, U., Lewin, N., Nelson, L., Olsen, A., Pierson, S., Symanzik, J. (1997): "Exploring Associations Among Mid-Atlantic Stream Indicators Using Dynamic Multivariate Graphics and Geographic Mapping in a Highly Immersive Virtual Reality Environment", EMAP meeting, Albany, NY, April 1997
[Dai 90] Ping Dai, "Etude et réalisation d'une commande de robot industriel polyarticulé par vision en mode dynamique", Doctoral thesis, 1990.5, UPL, Strasbourg, France.
[Dai 92] Ping Dai, "On Dynamic Visual Control of Industrial Robots", IEEE International Symposium on Industrial Electronics, 1992.5, Xian, China.
[Dai 93a] Fan Dai and Ping Dai, "Graphics Simulation of Dynamic Vision based Robotic Workcell", IFIP. International workshop on Graphics and Robotics, Dagstuhl, Germany, April 1993.
[Dai 93b] Ping Dai and Fan Dai, "Analysis and Modeling of Dynamic Vision Based Robotic System", in Proceedings IEEE TENCON'93-IEEE regionten Conference on Computer, Communication, Control and Power Engineering, International Academic Publishers, 1993.10, Beijing, China.
[Dai 96] Ping Dai, "Virtual Ping-Pong Game - Autonomous Objects in Virtual Reality", internal report FhG-IGD, July 1996.
[Dai 97] Fan Dai, "Lebendige virtuelle Welten: Physikalisch-basierte Modelle in Computeranimation und virtueller Realität", Springer Verlag, Berlin Heidelberg 1997.
[Dechelle 95] Dechelle, F., DeCecco, M., "The IRCAM Real-Time Platform and Applications", Proceedings of the 1995 International Computer Music Conference, International Computer Music Association, San Francisco, 1995.
[Eckel 93] Eckel, G., "La maîtrise de la synthèse sonore", Les Cahiers de l'IRCAM, recherche et musique, No. 2, Paris, January 1993.
[Gaf 94] Gaffron, S. (1994), "SkillBuilder: A Motor Program Design Tool for Virtual Actors", M.S. Thesis, February 1994, Massachusetts Institute of Technology, Cambridge, MA.
[Gibs 66] Gibson, J.J. (1966), "The senses considered as perceptual systems, " Boston: Houghton Mifflin
[Gib 79] Gibson, J.J. (1979), "The Ecological Approach To Visual Perception", Houghton Mifflin Company Boston
[Haase 97] H. Haase, F. Dai, J. Strassner and M. Göbel, "Immersive Investigation of Scientific Data", Scientific Visualisation, IEEE Press, 1997
[John 95] Johnson, Wavesworld: A testbed for 3d, semi-autonomous animated characters. Mar 1995. MIT, Media Lab Ph.D. dissertation.
[Jot 95] Jot, J.-M., Warusfel, O., "A Real-Time Spatial Sound Processor for Music and Virtual Reality Applications", in: Proceedings of the 1995 International Computer Music Conference. San Francisco: International Computer Music Association, 1995.
[Kog 94] Y. Koga, K. Kondo, J. Kuffner, and J.C. Latombe (1994). "Planning Motions with Intentions.", Proceedings of SIGGRAPH'94 (Orlando, Florida, July 24-29, In Computer Graphics Proceedings (July, 1994), pp.395-408.
[Krüger 94] Wolfgang Krüger and Bernd Fröhlich, "The Responsive Workbench", IEEE Computer Graphics and Applications, May 1994
[Lass 87] J. Lasseter. Principles of traditional animation applied to 3d computer animation. Computer Graphics, 21(4), July 1987.
[Lin 93] Y. Lin and Y. Ma, "System - A Unified Concept," Cybernetics and Systems, 24:375-406, 1993.
[Lindemann 90] Lindemann, E., Starkier, F. & Dechelle, F., "The IRCAM Musical Workstation: Hardware Overview and Signal Processing Features." In: S. Arnold and G. Hair, eds. Proceedings of the 1990 International Computer Music Conference. San Francisco: International Computer Music Association, 1990.
[Lor 73] Lorenz,K., (1973), "Foundations of Ethology", Springer-Verlag, New York.
[Magnenat-Thalmann 96] N. Magnenat-Thalmann and D. Thalmann, "The Simulation of Virtual Humans", Proc. GraphiCon'96, Moscow 1996.
[Min 87] Minsky, M., The Society of Mind, Simon and Schuster, N.Y. 1987.
[Mesarovic 89] Mesarovic, M.D., and Y. Takahara, "Abstract Systems Theory", Springer Verlag, Berlin 1989.
[Min 94] Minsky, Marvin L., Will Robots Inherit the Earth?, Scientific American, Oct. 1994
[Nag 96] Katashi Nagao and Jun Rekimoto, Agent Augmented Reality: A Software Agent Meets the Real World, To appear in Proceedings of the Second International Conference on Multi-Agent Systems, 1996
[Nag 94] Katashi Nagao and Akikazu Takeuchi, Speech Dialogue with Facial Displays: Multimodal Human-Computer Conversation, Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL-94)", 1994
[Norm 88] Norman, D.A. (1988), "The Psychology of Everyday Things, " New York: Basic Books.
[Patrick 85] Hayes, Patrick J., "The Second Naive Physics Manifesto", in: "Formal Theories of the Common Sense World", Ablex: Norwood, 1985.
[Puckette 91] Puckette, M., "Combining event and signal processing in the Max graphical programming environment", Computer Music Journal, vol. 15, no. 3, MIT Press, Cambridge, MA, 1991.
[Ras 83] Rasmussen, J. Skills, Rules and Knowledge; Signals, Signs and Symbols and other Distinctions in Human Performance Models, IEEE Trans. on Systems, Man, and Cybernetics, vol. SMC-13, no. 3 (May/June 1983).
[Reed 96] Reed, Edward S., "Values and knowledge", Lawrence Erlbaum Associates , 1996.
[Rey 87] Reynolds, Craig W. (1987). "Flocks, Herds and Schools: A Distributed Behavioral Model", Siggraph 87
[Scha 75] Schank, R.C. (1975), "Conceptual Information Processing", New York: American Elsevier Publishing Company
[Seeg 90] Seeger, L. Creating Unforgettable Characters, Henry Holt and Company, New York, 1990.
[Sims 87] Sims, K. (1987), "Locomotion of Jointed Figures over Complex Terrain", SMVS Thesis, MIT Media Lab, 1987.
[Sims 94] Sims, K. (1994), "Evolving Virtual Creatures", Computer Graphics (Siggraph '94) Annual Conference Proceedings, July 1994, pp. 15-22.
[Str 96] Strassner, J., (1996), "Autonomous Actors in Virtual Environments - physical object properties to constrain behaviour", Diploma Thesis, Computer Science, Technical University of Darmstadt, Germany.
[Sym 97] Symanzik, J., Cook, D., Kohlmeyer, B. D., Lechner, U., Cruz-Neira, C. (1997): "Dynamic Statistical Graphics in the C2 Virtual Environment", Second World Conference of the International Association for Statistical Computing, Pasadena, California, USA, February 19-22, 1997
[Terz 94] Terzopoulos, D. et al. (1994). "Artificial Fishes with Autonomous Locomotion, Perception, Behavior and Learning, in a Physical World, " Proceedings of the Artificial Life IV Workshop, Pattie Maes and Rod Brooks (ed.), MIT Press 1994.
[Tin 50] Tinbergen, N. (1950), "The Study of Instinct", Clarendon Press, Oxford
[Toat 91] Toates F. & Jensen, P. (1991). "Ethological and Psychological Models of Motivation: Towards a Synthesis", In: From Animals to Animats, Proceedings of the First International Conference on the Simulation of Adaptive Behavior, Edited by Meyer,J. and Wilson, S.W., MIT Press 1991
[Tos 93] Tosa, N. (1993), "Neurobaby", SIGGRAPH-93 Visual Proceedings, Tomorrow's Realities, ACM SIGGRAPH 1993, pp. 212-213
[Yang 89] Yang, Z., "New Model of General Systems Theory," Cybernetic. Syst. 20: 67-76, 1989.
[Zel 90a] McKenna, M., S. Pieper and D. Zeltzer (1990). "Control of a Virtual Actor: The Roach", Proc. 1990 Symposium on Interactive 3D Graphics, Snowbird UT, March 25-28, 1990, pp. 165-174.
[Zel 90b] McKenna, M. and D. Zeltzer (1990). "Dynamic Simulation of Autonomous Legged Locomotion", Proc. ACM SIGGRAPH 90, Dallas TX, pp. 29-38.
[Zel 91a] Zeltzer, D. (1991), "Task Level Graphical Simulation: Abstraction, Representation and Control", in Making Them Move: Mechanics, Control and Animation of Articulated Figures, N. Badler, B. Barsky and D. Zeltzer, eds., San Mateo CA, Morgan Kaufmann, pp. 3-33.
[Zel 91b] Zeltzer, D. and M. Johnson (1991). "Motor Planning: Specifying and Controlling the Behavior of Autonomous Animated Agents", Journal of Visualization and Computer Animation, 2(2), April-June 1991, pp. 74-80.
[Zel 94] Zeltzer, D. and M. Johnson (1994). "Virtual Actors and Virtual Environments: Defining, Modeling and Reasoning about Motor Skills", in Interacting with Virtual Environments, L. MacDonald and J. Vince, ed., Chichester, England, John Wiley & Sons, pp. 229-255.
[Zel 96] Zeltzer, D. and S. Gaffron (1995). "Task Level Interaction with Virtual Environments and Virtual Actors", International Journal of Human-Computer Interaction, 8(1), pp. 73-94.