Blast is a new NVIDIA GameWorks destruction library developed to replace the APEX Destruction module. It is redesigned from the ground up, focusing on performance, scalability, and flexibility.
Blast is designed to leave physics and graphics - two things that most games already do well - to the application. It processes the elements of destruction in a streamlined representation, communicating to the user what is needed to update physics and graphics within their application. This approach allows us to focus on the performance of our core algorithms, and provide a library with robust and transparent functional behavior.
Blast consists of three layers: the low-level (NvBlast), a high-level "toolkit" wrapper (NvBlastTk), and extensions (prefixed with NvBlastExt). This layered API is designed to allow short ramp-up time for first usage (through the Ext and Tk APIs) while also allowing for customization and optimization by experienced users through the low-level API.
The low-level library (NvBlast) is a bare-bones API intended for experienced developers who want the thinnest possible layer between their engine and the computations performed by Blast. It has a C-style API consisting of stateless functions, with no global framework or context. Functions do not spawn tasks, and they do not allocate or deallocate memory. They simply process inputs using user-supplied buffers.
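To make the "stateless functions with user-supplied buffers" style concrete, here is a minimal sketch of that pattern. Every name in it (ToyActor, toyActorMemorySize, toyActorCreate, toyActorDamageChunk) is hypothetical and chosen for illustration. It is not the NvBlast API, only an example of the same calling convention: the library reports how much memory it needs, and the caller owns the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical toy types and functions; this is NOT the NvBlast API.
struct ToyActor {
    std::uint32_t chunkCount;
    float*        chunkHealth;  // points into the caller's buffer
};

// Report how many bytes the caller must supply. Nothing is allocated here.
inline std::size_t toyActorMemorySize(std::uint32_t chunkCount) {
    return sizeof(ToyActor) + static_cast<std::size_t>(chunkCount) * sizeof(float);
}

// Build an actor inside caller-owned memory, which must be suitably aligned.
inline ToyActor* toyActorCreate(void* mem, std::uint32_t chunkCount, float initialHealth) {
    assert(mem != nullptr);
    unsigned char* bytes = static_cast<unsigned char*>(mem);
    float* health = reinterpret_cast<float*>(bytes + sizeof(ToyActor));
    for (std::uint32_t i = 0; i < chunkCount; ++i) {
        new (health + i) float(initialHealth);
    }
    return new (mem) ToyActor{chunkCount, health};
}

// Stateless operation: all state lives in the actor the caller passes in.
// Returns true if the chunk's health reached zero.
inline bool toyActorDamageChunk(ToyActor* actor, std::uint32_t chunk, float damage) {
    assert(actor != nullptr && chunk < actor->chunkCount);
    float& h = actor->chunkHealth[chunk];
    h = (damage >= h) ? 0.0f : h - damage;
    return h == 0.0f;
}

// Usage: the application decides where the memory comes from (here, the stack).
inline float exampleRemainingHealth() {
    alignas(std::max_align_t) unsigned char buffer[256];
    assert(toyActorMemorySize(8) <= sizeof(buffer));
    ToyActor* actor = toyActorCreate(buffer, 8, 1.0f);
    toyActorDamageChunk(actor, 3, 0.25f);
    return actor->chunkHealth[3];
}
```

Because the functions never allocate, the application can place actors in a pool, an arena, or any other memory scheme its engine already uses.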
Support Structures and Chunk Hierarchies
Support structures define how the fractured pieces can be “glued” together into actors. Blast support structures may include support chunks from different fracturing depths, giving the author greater flexibility than was available with APEX Destruction.
In addition, multiple chunk hierarchies may exist in a single asset.
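As an illustration of support chunks at mixed depths and of several hierarchies in one asset, the sketch below stores chunks as a flat array with parent indices. The types and the validity rule it checks (each leaf chunk has exactly one support chunk on its path to the root) are assumptions made for this example, not a statement of how Blast stores or validates assets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy representation; not the Blast asset format.
constexpr std::uint32_t kNoParent = UINT32_MAX;  // marks the root of a hierarchy

struct ToyChunk {
    std::uint32_t parent;   // index of the parent chunk, or kNoParent
    bool          support;  // true if this chunk is part of the support structure
};

// Count the support chunks on the path from chunk `index` up to its root.
inline int supportCountOnPath(const std::vector<ToyChunk>& chunks, std::uint32_t index) {
    int count = 0;
    for (std::uint32_t i = index; i != kNoParent; i = chunks[i].parent) {
        assert(i < chunks.size());
        if (chunks[i].support) ++count;
    }
    return count;
}

// Assumed rule for this sketch: every leaf is covered by exactly one support chunk.
inline bool hasExactSupportCoverage(const std::vector<ToyChunk>& chunks) {
    std::vector<bool> hasChild(chunks.size(), false);
    for (const ToyChunk& c : chunks) {
        if (c.parent != kNoParent) hasChild[c.parent] = true;
    }
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        if (!hasChild[i] && supportCountOnPath(chunks, static_cast<std::uint32_t>(i)) != 1) {
            return false;
        }
    }
    return true;
}

// Two hierarchies in one asset, with support chunks at depths 0, 1, and 2.
inline std::vector<ToyChunk> exampleAsset() {
    return {
        {kNoParent, false},  // 0: root of hierarchy A
        {0, true},           // 1: depth 1, support
        {0, false},          // 2: depth 1, not support
        {2, true},           // 3: depth 2, support
        {2, true},           // 4: depth 2, support
        {1, false},          // 5: depth 2, below support chunk 1
        {1, false},          // 6: depth 2, below support chunk 1
        {kNoParent, true},   // 7: root of hierarchy B, itself a support chunk
    };
}
```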
Damage behavior is completely defined by user-supplied "shader" functions and associated (shader-specific) damage parameters.
between support chunks and weaken them:
boundary of an imaginary “damage sphere,” creating a science-fiction cutter tool.
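The idea of a user-supplied damage "shader" with shader-specific parameters can be sketched as a plain function pointer plus an opaque parameter block. The names below (ToyBond, RadialParams, ToyBondShader, radialFalloffShader, applyDamage) are hypothetical, and the linear falloff is only one possible damage rule. The shaders shipped in ExtShaders may work differently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical toy types; not the Blast damage API.
struct ToyBond {
    float centroid[3];
    float area;
    float health;
};

struct RadialParams {
    float position[3];  // center of the damage
    float minRadius;    // full damage inside this radius
    float maxRadius;    // no damage outside this radius
    float damage;       // damage applied at full strength
};

// A "shader" is a function the user supplies; params is shader-specific data.
using ToyBondShader = float (*)(const ToyBond& bond, const void* params);

// Example shader: full damage near the center, linear falloff to zero.
inline float radialFalloffShader(const ToyBond& bond, const void* params) {
    assert(params != nullptr);
    const RadialParams& p = *static_cast<const RadialParams*>(params);
    assert(p.maxRadius > p.minRadius);
    float d2 = 0.0f;
    for (int k = 0; k < 3; ++k) {
        const float d = bond.centroid[k] - p.position[k];
        d2 += d * d;
    }
    const float dist = std::sqrt(d2);
    if (dist <= p.minRadius) return p.damage;
    if (dist >= p.maxRadius) return 0.0f;
    return p.damage * (p.maxRadius - dist) / (p.maxRadius - p.minRadius);
}

// The library side knows nothing about the damage rule; it only calls the shader.
// Returns the number of bonds broken by this call.
inline int applyDamage(std::vector<ToyBond>& bonds, ToyBondShader shader, const void* params) {
    int broken = 0;
    for (ToyBond& b : bonds) {
        if (b.health <= 0.0f) continue;  // already broken
        b.health -= shader(b, params);
        if (b.health <= 0.0f) {
            b.health = 0.0f;
            ++broken;
        }
    }
    return broken;
}
```

Swapping in a different shader function and parameter struct changes the damage behavior without touching applyDamage, which is the separation the text describes.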
NvBlastTk (high-level toolkit)
The high-level toolkit wraps the low-level NvBlast and provides powerful new features. It has a C++ API with a global framework that manages objects and memory via a user-supplied allocation callback.
Groups and Parallel Processing
NvBlastTk provides an actor group object which holds arbitrary sets of destructible actors. The user may move actors in and out of groups as desired. The purpose of a group is to be a processing unit. While processing, the group spawns tasks to calculate the effect of all accumulated damage applied to its actors, and communicates the results to the user through an event callback. These tasks may be run in parallel and groups may process in parallel, providing opportunities for better performance on multi-core processors.
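The group workflow described above (accumulate damage, split the work into independent tasks, report results through a callback) might be sketched as follows. ToyGroup and its methods are hypothetical names for this example and are not the NvBlastTk classes. To keep the sketch free of threading code, the jobs are run in a simple loop, with the assumption that an application would hand them to its own worker threads instead.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy types; not the NvBlastTk API.
struct ToyActorState { float health; float pendingDamage; };
struct ToyEvent      { std::size_t actorIndex; bool destroyed; };

class ToyGroup {
public:
    explicit ToyGroup(std::function<void(const ToyEvent&)> listener)
        : listener_(std::move(listener)) {}

    std::size_t addActor(float health) {
        actors_.push_back({health, 0.0f});
        return actors_.size() - 1;
    }

    // Damage is only accumulated here; nothing is computed yet.
    void damage(std::size_t actor, float amount) { actors_[actor].pendingDamage += amount; }

    // Collect one job per damaged actor and return the job count.
    std::size_t startProcess() {
        jobs_.clear();
        for (std::size_t i = 0; i < actors_.size(); ++i) {
            if (actors_[i].pendingDamage > 0.0f) jobs_.push_back(i);
        }
        results_.assign(jobs_.size(), ToyEvent{});
        return jobs_.size();
    }

    // Each job touches only its own actor and its own result slot, so distinct
    // jobs do not share mutable data and could run on different threads.
    void processJob(std::size_t job) {
        assert(job < jobs_.size());
        ToyActorState& a = actors_[jobs_[job]];
        a.health -= a.pendingDamage;
        a.pendingDamage = 0.0f;
        results_[job] = ToyEvent{jobs_[job], a.health <= 0.0f};
    }

    // After all jobs finish, deliver the results through the event callback.
    void endProcess() {
        for (const ToyEvent& e : results_) listener_(e);
    }

private:
    std::function<void(const ToyEvent&)> listener_;
    std::vector<ToyActorState> actors_;
    std::vector<std::size_t>   jobs_;
    std::vector<ToyEvent>      results_;
};

// Usage: three actors, two of them damaged, one of those destroyed.
inline int exampleDestroyedCount() {
    int destroyed = 0;
    ToyGroup group([&destroyed](const ToyEvent& e) { if (e.destroyed) ++destroyed; });
    group.addActor(1.0f);
    group.addActor(1.0f);
    group.addActor(1.0f);
    group.damage(0, 0.4f);
    group.damage(0, 0.7f);
    group.damage(2, 0.5f);
    const std::size_t jobCount = group.startProcess();
    for (std::size_t j = 0; j < jobCount; ++j) group.processJob(j);  // stand-in for worker threads
    group.endProcess();
    return destroyed;
}
```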
NvBlastTk introduces a joint representation. Joints may be defined:
- Between support chunks in a single actor.
- Between support chunks of different actors.
- Between support chunks of an actor and the static world.
When actors connected by joints are fractured, the joints will become attached to different actors (or no longer join any actors in some cases). Such changes are communicated to the user through the event system to allow the physical representation to be updated. Again, Blast remains agnostic to the physics representation used by the application. It is up to the user to decide what kind of physical joint is created, and when or if to break it.
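One way to picture the joint bookkeeping described above is a joint that remembers which two chunks it connects, plus a lookup from chunk to owning actor that changes when an actor splits. The sketch below uses hypothetical types (ToyJoint, ToyJointEvent) and a hypothetical kWorld marker for the static world. It only shows how changed attachments could be detected and reported, and it makes no claim about how NvBlastTk implements this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy types; not the NvBlastTk joint API.
constexpr std::uint32_t kWorld = UINT32_MAX;  // stands for "attached to the static world"

struct ToyJoint {
    std::uint32_t chunk[2];  // support chunks joined (kWorld for the static world)
    std::uint32_t actor[2];  // actors that currently own those chunks
};

struct ToyJointEvent {
    std::size_t   joint;
    std::uint32_t oldActor[2];
    std::uint32_t newActor[2];
};

// After a fracture, chunkToActor says which actor now owns each chunk.
// Joints whose attachments changed are updated, and one event is emitted per change.
inline std::vector<ToyJointEvent> updateJoints(std::vector<ToyJoint>& joints,
                                               const std::vector<std::uint32_t>& chunkToActor) {
    std::vector<ToyJointEvent> events;
    for (std::size_t j = 0; j < joints.size(); ++j) {
        ToyJoint& jt = joints[j];
        std::uint32_t updated[2];
        for (int k = 0; k < 2; ++k) {
            if (jt.chunk[k] == kWorld) {
                updated[k] = kWorld;
            } else {
                assert(jt.chunk[k] < chunkToActor.size());
                updated[k] = chunkToActor[jt.chunk[k]];
            }
        }
        if (updated[0] != jt.actor[0] || updated[1] != jt.actor[1]) {
            events.push_back({j, {jt.actor[0], jt.actor[1]}, {updated[0], updated[1]}});
            jt.actor[0] = updated[0];
            jt.actor[1] = updated[1];
        }
    }
    return events;
}
```

On receiving such an event, the application would decide what to do with its own physics joint, for example re-creating it between the new rigid bodies or removing it.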
Joints allow an opportunity to create a variety of different physical behaviors on destructible objects, from door hinges to nails to swinging chandeliers to flexible objects. In our sample we create “internal” joints between all the support chunks in a destructible sheet. When the sheet is fractured into bits, the joints become active and join the bits together into a cloth-like object, a striking effect:
Blast extensions are utility libraries for both NvBlast and NvBlastTk. The source code for Blast extensions is intended to be a reference implementation of useful features. They are meant to be good enough to be used in production code as-is, but some users will want to modify the extensions for their own needs.
Current Blast extensions are:
- ExtPhysX - a physics manager using PhysX which keeps PxActors and PxJoints updated in a user-supplied PxScene. It handles impact damage (through the contact callback), includes a stress solver, and provides a listener that enables multiple clients to keep their state synchronized.
- ExtAuthoring - a set of geometric tools which can split a mesh hierarchically and create a Blast asset, along with collision geometry and chunk graphics meshes in separate files.
- ExtConverterLL - a data format converter for low-level assets and actor families. This simple converter uses user-defined conversion functions.
- ExtImport - provides functions to import an APEX Destructible Asset to create a Blast asset.
- ExtSerialization and ExtSerializationLL - serialization extensions for Tk and the low-level, which use Cap'n Proto to provide robust serialization across different platforms.
- ExtShaders - sample damage shaders to pass to both the low-level and Tk actor damage functions.
The stress solver in ExtPhysX is a powerful feature that (again) has notably higher performance (and more features) than its counterpart in APEX Destruction. With it, one is able to use the internal stresses in a structure, along with external stresses from impacts and user-supplied forces, to determine where an actor should break due to weakness.
Authoring and Backwards Compatibility
The ExtAuthoring extension provides powerful CSG functions for splitting meshes, and tools to determine the connectivity between arbitrary meshes. This allows the user to import fractured meshes from any source and create Blast assets with appropriate bonds between support chunks.
Finding the bond interface between chunks.
In addition, the ExtImport extension will create a Blast asset from an APEX Destructible asset. The APEX asset only contains yes/no connectivity data between chunks, but ExtImport will determine the area and normal of the bond surfaces, quantities stored by Blast which are accessible to damage shader functions.
With the importer one has a chance to reuse APEX assets in Blast, as well as continue to author destructibles in PhysXLab if desired.
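For readers curious what "area and normal of the bond surface" means in practice, the sketch below computes both for a planar polygon using Newell's method. This is a generic geometry technique chosen for illustration. The source does not say how ExtImport or ExtAuthoring compute these quantities, and the types Vec3 and ToyBondSurface are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy types; not the Blast bond description.
struct Vec3 { float x, y, z; };

struct ToyBondSurface {
    Vec3  normal;  // unit normal of the interface polygon
    float area;    // area of the interface polygon
};

// Newell's method: sum per-edge terms to get a vector whose direction is the
// polygon normal and whose length is twice the polygon area.
inline ToyBondSurface bondFromPolygon(const std::vector<Vec3>& polygon) {
    assert(polygon.size() >= 3);
    Vec3 n{0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < polygon.size(); ++i) {
        const Vec3& a = polygon[i];
        const Vec3& b = polygon[(i + 1) % polygon.size()];
        n.x += (a.y - b.y) * (a.z + b.z);
        n.y += (a.z - b.z) * (a.x + b.x);
        n.z += (a.x - b.x) * (a.y + b.y);
    }
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len == 0.0f) {
        return ToyBondSurface{{0.0f, 0.0f, 0.0f}, 0.0f};  // degenerate polygon
    }
    return ToyBondSurface{{n.x / len, n.y / len, n.z / len}, 0.5f * len};
}
```

A damage shader could then scale bond strength by the area, or compare the normal with an impact direction.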
Performance Comparisons with APEX Destruction
In order to create a fair comparison between Blast and APEX Destruction, we set up identical test scenes and used comparable (radial) damage, applied with identical timing and placement with both destruction libraries.
Since APEX Destruction manages physics actor creation and graphics updates within its simulation step, we performed detailed internal timings of these operations and subtracted them from the simulation timing in order to arrive at a more accurate timing of APEX itself. Physics and graphics need to be updated in most realistic usages, so we also compared those timings for Blast and APEX. We did this for two reasons. First, as a confirmation that the scene complexity was similar in both test cases. Second, to ensure that we were accounting for all costs associated with destruction for a fair comparison.
We also wanted to test different scales of destruction, since a design goal of Blast was to better handle large-scale destruction. To this end we used two different assets, one with a low chunk count and the other with a high chunk count. The setup is described in the image below.
Our test setup used ten walls in two rows of five each. We tested both a low-detail wall asset which broke into large chunks (left), and a high-detail asset which broke into a greater number of smaller chunks (right).
The walls in the tests are not joined by extended support (a feature of APEX). Since this feature is (by design) absent from Blast, we turned off extended support to get a valid comparison.
The low-detail asset had support on depth 2 of the chunk hierarchy, while the high-detail asset had support on depth 3. There are 81 depth-2 chunks, and approximately 1,000 depth-3 chunks. Therefore the high-detail support graph was an order of magnitude more complex than the low-detail graph. As you can see from the images above, the extra support kept many of the depth-3 chunks from becoming dynamic and entering the simulation (although there are still significantly more rigid bodies in the high-detail case). But our interest lay mainly with the complexity of the graph operations, as these lie at the heart of the damage calculations.
Our tests used only the low-level Blast SDK. These tests did not include any simultaneous actor damage, and therefore would not be candidates for the multithreaded damage capabilities of the high-level (Tk) Blast library.
Measurements and Results
Each test run applied radial damage to the center of each of the ten walls in succession, damaging one wall every hundred frames. Each frame we measured:
- Time in the destruction library
- Time in PhysX
- Time updating graphics
- The number of awake shapes
- The number of interacting shape pairs
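A per-frame measurement like the one listed above could be collected with a small timing helper. The sketch below is an assumption about how such a harness might look, not the harness used for these tests. FrameSample, elapsedMicroseconds, and measureFrame are hypothetical names, and the three callables stand in for the destruction, physics, and graphics steps of an application.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical record of the five per-frame measurements.
struct FrameSample {
    double destructionUs;
    double physicsUs;
    double graphicsUs;
    int    awakeShapes;
    int    interactingPairs;
};

// Run a callable once and return the elapsed wall-clock time in microseconds.
template <typename Fn>
double elapsedMicroseconds(Fn&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}

// Time the three stages of one frame separately and record the shape counts.
template <typename Destruction, typename Physics, typename Graphics>
FrameSample measureFrame(Destruction&& destruction, Physics&& physics, Graphics&& graphics,
                         int awakeShapes, int interactingPairs) {
    FrameSample s{};
    s.destructionUs    = elapsedMicroseconds(destruction);
    s.physicsUs        = elapsedMicroseconds(physics);
    s.graphicsUs       = elapsedMicroseconds(graphics);
    s.awakeShapes      = awakeShapes;
    s.interactingPairs = interactingPairs;
    assert(s.destructionUs >= 0.0 && s.physicsUs >= 0.0 && s.graphicsUs >= 0.0);
    return s;
}
```

Using a monotonic clock such as std::chrono::steady_clock avoids negative intervals if the system clock is adjusted during a run.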
The results follow:
Using the low chunk-count asset, the results for Blast are shown below:
The corresponding results for APEX Destruction follow:
Using the high chunk-count asset, the results for Blast are shown below:
The corresponding results for APEX Destruction follow:
For the small-scale setup, after the tenth damage application on frame 1,000, both Blast and APEX tests are generating roughly 700 active shapes (corresponding to a comparable number of actors), and roughly 2,000 interacting pairs. The PhysX simulation time is roughly 1.5ms in each.
The rendering setup time is about 250-350µs for Blast after frame 1,000, and about 400µs for APEX Destruction. We can see from the APEX timing that most of its time is spent doing this. Blast, however, is not called between damage frames, giving zero overhead.
On damage frames, however, both Blast and APEX do significant work. Blast consumes about 200µs on those frames, while APEX adds 1.5ms to its simulation step to perform the damage calculations.
For the large-scale setup, after the tenth damage application both Blast and APEX tests are generating over 1,000 active shapes and more than 7,000 interacting pairs. The PhysX simulation time is in the 5-6ms range.
The rendering setup time levels out at about 1.0ms for Blast, and about 1.2ms for APEX. We can see that most of APEX’s time is spent in rendering; however, there is also significant per-frame overhead on the order of a millisecond in APEX.
On damage frames, Blast is consuming about 600µs while APEX adds anywhere from 6 to 9ms to its simulation step.
We conclude that even for small-scale destruction Blast is greatly outperforming APEX, and the performance benefit grows significantly with the complexity of the destructible actors.
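The approximate damage-frame figures quoted above can be turned into speedup ratios with a little arithmetic. The sketch below only restates those rounded numbers (about 200µs versus 1.5ms for the small-scale test, and about 600µs versus 6 to 9ms for the large-scale test), so the resulting ratios are rough estimates rather than measured results.

```cpp
#include <cassert>

// Ratio of APEX damage-frame cost to Blast damage-frame cost (both in microseconds).
constexpr double speedup(double apexMicroseconds, double blastMicroseconds) {
    return apexMicroseconds / blastMicroseconds;
}

// Small-scale test: APEX adds about 1.5 ms, Blast takes about 200 us.
constexpr double kSmallScale = speedup(1500.0, 200.0);

// Large-scale test: APEX adds about 6 to 9 ms, Blast takes about 600 us.
constexpr double kLargeScaleLow  = speedup(6000.0, 600.0);
constexpr double kLargeScaleHigh = speedup(9000.0, 600.0);

static_assert(kSmallScale == 7.5, "small-scale ratio is about 7.5x");
static_assert(kLargeScaleLow == 10.0, "large-scale ratio is at least about 10x");
static_assert(kLargeScaleHigh == 15.0, "large-scale ratio is at most about 15x");

// The ratio is larger for the high chunk-count asset than for the low one.
inline bool benefitGrowsWithScale() {
    return kLargeScaleLow > kSmallScale;
}
```

With these rounded inputs, Blast's damage step is roughly 7.5 times faster in the small-scale test and roughly 10 to 15 times faster in the large-scale test.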
The overhead of physics simulation and rendering setup time is comparable for our Blast and APEX tests, which was to be expected. However, in Blast those operations are completely within the application, giving users the opportunity to optimize and customize them for their own engine.
These tests give us confidence that our efforts to increase performance, scalability, and flexibility have been successful.