This example shows how functions implementing BSDFs can be generated by the "PTX" backend. To illustrate the use of these functions in CUDA, it implements a small physically based renderer that computes direct lighting from an HDR environment map using BSDF evaluation and importance sampling.
New Topics
- Generated BSDF functions
- Changing compiled material arguments at runtime
Detailed Description
- BSDF functions
The generated functions for BSDFs implement
- An initialization function to prepare shared data per hit.
- Evaluation of the BSDF for a given pair of incoming and outgoing directions.
- Importance sampling of an incoming direction given an outgoing direction.
- Probability density function (PDF) computation of generating an incoming for a given outgoing direction.
Each function takes arguments for state, resources, and material arguments that are analogous to the material expression counterpart of the earlier examples and the latter three functions further take a specific input and output parameter data
. The signatures of the functions look like this:
const Resource_data *res_data,
const void *exception_state,
const char *arg_block_data);
const Resource_data *res_data,
const void *exception_state,
const char *arg_block_data);
const Resource_data *res_data,
const void *exception_state,
const char *arg_block_data);
const Resource_data *res_data,
const void *exception_state,
const char *arg_block_data);
The functions can be generated by mi::neuraylib::IMdl_backend::translate_material_df() or mi::neuraylib::ILink_unit::add_material_df(). Both functions have a parameter include_geometry_normal
that can be specified to make the initialization function replace state->normal
by the result of the expression connected to geometry.normal
of the material. The final function names are specified by a base name that will suffixed by _init
, _sample
, _evaluate
, and _pdf
.
BSDF evaluation and PDF computation take a pair of directions and IORs (index of refraction) as input and produce a PDF and, in the case of Bsdf_evaluate_function
, the value of the BSDF. The corresponding structs are
struct Bsdf_evaluate_data {
float3 ior1;
float3 ior2;
float3 k1;
float3 k2;
float3 bsdf;
float pdf;
};
struct Bsdf_pdf_data {
float3 ior1;
float3 ior2;
float3 k1;
float3 k2;
float pdf;
};
The handling of the (color) IOR is generally up to the renderer, in particular this includes the decision if the evaluation should happen from within an object with refracting material or from the outer side. It is possible though, to just pass in one IOR (e.g. ior1
) and make the functions fill the other with the IOR of the material by marking the x
component of it with MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR
(e.g. ior2.x=MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR
).
For importance sampling, one direction and a set of pseudo-random numbers needs to be provided. The output values include the importance sampled direction, a PDF, a weight (the value of the BSDF divided by the PDF), and the type of the BSDF that was eventually sampled. Note that this may include an absorption event, in which case no direction has been generated.
BSDF_EVENT_ABSORB = 0,
BSDF_EVENT_DIFFUSE = 1,
BSDF_EVENT_GLOSSY = 1 << 1,
BSDF_EVENT_SPECULAR = 1 << 2,
BSDF_EVENT_REFLECTION = 1 << 3,
BSDF_EVENT_TRANSMISSION = 1 << 4,
BSDF_EVENT_DIFFUSE_REFLECTION = BSDF_EVENT_DIFFUSE | BSDF_EVENT_REFLECTION,
BSDF_EVENT_DIFFUSE_TRANSMISSION = BSDF_EVENT_DIFFUSE | BSDF_EVENT_TRANSMISSION,
BSDF_EVENT_GLOSSY_REFLECTION = BSDF_EVENT_GLOSSY | BSDF_EVENT_REFLECTION,
BSDF_EVENT_GLOSSY_TRANSMISSION = BSDF_EVENT_GLOSSY | BSDF_EVENT_TRANSMISSION,
BSDF_EVENT_SPECULAR_REFLECTION = BSDF_EVENT_SPECULAR | BSDF_EVENT_REFLECTION,
BSDF_EVENT_SPECULAR_TRANSMISSION = BSDF_EVENT_SPECULAR | BSDF_EVENT_TRANSMISSION,
BSDF_EVENT_FORCE_32_BIT = 0xffffffffU
};
struct Bsdf_sample_data {
float3 ior1;
float3 ior2;
float3 k1;
float3 xi;
float3 k2;
float pdf;
float3 bsdf_over_pdf;
};
It often is the case that for a given shading point both evaluation and importance sampling need to be performed (possibly multiple times). To avoid re-computation of material expressions in each BSDF function call it is essential that the results are cached between multiple calls. This is accomplished by the initialization function Bsdf_init_function
which stores computed results in an array passed by the mi::neuraylib::Shading_state_material::text_results field. The size of that array needs to be communicated to the backend using the "num_texture_results"
option via mi::neuraylib::IMdl_backend::set_option(). If the storage is insufficient (e.g. for a material with a high number of material expressions), non-cached expressions are automatically recomputed when they are needed. The initialization is generally not optional, i.e. even if the text_results
array size is set to zero it may still perform some initialization, in particular it will update state->normal
if requested.
- EDF functions
Analogous to the generation of BSDF functions, emission distribution functions (EDFs) can be generated using the corresponding signatures and data structures. For more information see the example code and run the example application using materials that contain EDFs:
df_cuda ::nvidia::sdk_examples::tutorials::example_edf
df_cuda ::nvidia::sdk_examples::tutorials::example_measured_edf
Global distributions are not supported yet and result in no emission. Hence, the generated EDFs can currently be evaluated only in tangent space.
- Changing arguments of class-compiled materials at runtime
As explained in Instance-compilation and class-compilation, the resulting mi::neuraylib::ITarget_code object contains mi::neuraylib::ITarget_value_layout and mi::neuraylib::ITarget_argument_block objects for each material, when class-compiled materials are used for generating target code. Together with the corresponding mi::neuraylib::ICompiled_material, you can already get some information about the arguments:
The example uses this information to build a material editor GUI with the "Dear ImGui" framework (https://github.com/ocornut/imgui). The GUI controls are linked to the data of the target argument block using the offsets of the arguments. When ImGui reports any changes by the user, the target argument block is updated on the device for the current material.
When presenting material arguments to the user, additional information from parameter annotations may improve the user experience. To find the annotations for an argument of a compiled material, you have to look up the annotation block for a parameter of the corresponding mi::neuraylib::IMaterial_definition with the same name as the argument.
- Note
- You will only find a parameter in a material definition with an exactly matching name for a compiled material argument, when non-struct constants were used as arguments during class-compilation. For other arguments you will get paths like "x.b" as parameter names.
In this example, the anno::hard_range
annotation is used to determine the minimum and maximum values for value sliders, the anno::display_name
annotation is used as a more user-friendly name for the arguments, and the anno::in_group
annotation is used to group the arguments into categories.
Example Source
To compile the source code, you require CUDA, GLFW, and GLEW. For detailed instructions, please refer to the Getting Started section.
Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.cpp
#include <iostream>
#include <string>
#include <vector>
#include <list>
#include <map>
#include <memory>
#define _USE_MATH_DEFINES
#include "example_df_cuda.h"
#define OPENGL_INTEROP
#include "example_cuda_shared.h"
#include "imgui.h"
#include "imgui_impl_glfw_gl3.h"
#define terminate() \
do { \
glfwTerminate(); \
keep_console_open(); \
exit(EXIT_FAILURE); \
} while (0)
#define WINDOW_TITLE "MDL SDK DF Example"
inline float length(
const float3 &d)
{
return sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
}
inline float3 normalize(const float3 &d)
{
const float inv_len = 1.0f /
length(d);
return make_float3(d.x * inv_len, d.y * inv_len, d.z * inv_len);
}
static GLFWwindow *init_opengl()
{
check_success(glfwInit());
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);
GLFWwindow *window = glfwCreateWindow(
1024, 768, WINDOW_TITLE, nullptr, nullptr);
if (!window) {
}
glfwMakeContextCurrent(window);
GLenum res = glewInit();
if (res != GLEW_OK) {
}
glfwSwapInterval(0);
check_success(glGetError() == GL_NO_ERROR);
return window;
}
static void dump_info(GLuint shader, const char* text)
{
glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &length);
if (length > 0) {
GLchar *
log =
new GLchar[length + 1];
glGetShaderInfoLog(shader, length + 1, nullptr, log);
} else {
}
}
static void add_shader(GLenum shader_type,
const std::string& source_code, GLuint program)
{
const GLchar* src_buffers[1] = { source_code.
c_str() };
GLuint shader = glCreateShader(shader_type);
check_success(shader);
glShaderSource(shader, 1, src_buffers, nullptr);
glCompileShader(shader);
GLint success;
glGetShaderiv(shader, GL_COMPILE_STATUS, &success);
if (!success) {
dump_info(shader,"Error compiling the fragment shader: ");
}
glAttachShader(program, shader);
check_success(glGetError() == GL_NO_ERROR);
}
static GLuint create_shader_program()
{
GLint success;
GLuint program = glCreateProgram();
const char *vert =
"#version 330\n"
"in vec3 Position;\n"
"out vec2 TexCoord;\n"
"void main() {\n"
" gl_Position = vec4(Position, 1.0);\n"
" TexCoord = 0.5 * Position.xy + vec2(0.5);\n"
"}\n";
add_shader(GL_VERTEX_SHADER, vert, program);
const char *frag =
"#version 330\n"
"in vec2 TexCoord;\n"
"out vec4 FragColor;\n"
"uniform sampler2D TexSampler;\n"
"void main() {\n"
" FragColor = texture(TexSampler, TexCoord);\n"
"}\n";
add_shader(GL_FRAGMENT_SHADER, frag, program);
glLinkProgram(program);
glGetProgramiv(program, GL_LINK_STATUS, &success);
if (!success) {
dump_info(program, "Error linking the shader program: ");
}
#if !defined(__APPLE__)
glValidateProgram(program);
glGetProgramiv(program, GL_VALIDATE_STATUS, &success);
if (!success) {
dump_info(program, "Error validating the shader program: ");
}
#endif
glUseProgram(program);
check_success(glGetError() == GL_NO_ERROR);
return program;
}
static GLuint create_quad(GLuint program, GLuint* vertex_buffer)
{
static const float3 vertices[6] = {
{ -1.f, -1.f, 0.0f },
{ 1.f, -1.f, 0.0f },
{ -1.f, 1.f, 0.0f },
{ 1.f, -1.f, 0.0f },
{ 1.f, 1.f, 0.0f },
{ -1.f, 1.f, 0.0f }
};
glGenBuffers(1, vertex_buffer);
glBindBuffer(GL_ARRAY_BUFFER, *vertex_buffer);
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);
GLuint vertex_array;
glGenVertexArrays(1, &vertex_array);
glBindVertexArray(vertex_array);
const GLint pos_index = glGetAttribLocation(program, "Position");
glEnableVertexAttribArray(pos_index);
glVertexAttribPointer(
pos_index, 3, GL_FLOAT, GL_FALSE, sizeof(float3), 0);
check_success(glGetError() == GL_NO_ERROR);
return vertex_array;
}
struct Window_context
{
bool mouse_event, key_event;
bool save_image;
int zoom;
int mouse_button;
int mouse_button_action;
int mouse_wheel_delta;
bool moving;
double move_start_x, move_start_y;
double move_dx, move_dy;
int material_index_delta;
bool save_result;
bool exposure_event;
float exposure;
};
static void handle_scroll(GLFWwindow *window, double xoffset, double yoffset)
{
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
if (yoffset > 0.0) {
ctx->mouse_wheel_delta = 1; ctx->mouse_event = true;
} else if (yoffset < 0.0) {
ctx->mouse_wheel_delta = -1; ctx->mouse_event = true;
}
ImGui_ImplGlfwGL3_ScrollCallback(window, xoffset, yoffset);
}
static void handle_key(GLFWwindow *window, int key, int scancode, int action, int mods)
{
if (action == GLFW_PRESS) {
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
switch (key) {
case GLFW_KEY_ESCAPE:
glfwSetWindowShouldClose(window, GLFW_TRUE);
break;
case GLFW_KEY_DOWN:
case GLFW_KEY_RIGHT:
case GLFW_KEY_PAGE_DOWN:
ctx->material_index_delta = 1;
ctx->key_event = true;
break;
case GLFW_KEY_UP:
case GLFW_KEY_LEFT:
case GLFW_KEY_PAGE_UP:
ctx->material_index_delta = -1;
ctx->key_event = true;
break;
case GLFW_KEY_ENTER:
ctx->save_result = true;
break;
case GLFW_KEY_KP_SUBTRACT:
ctx->exposure--;
ctx->exposure_event = true;
break;
case GLFW_KEY_KP_ADD:
ctx->exposure++;
ctx->exposure_event = true;
break;
default:
break;
}
}
ImGui_ImplGlfwGL3_KeyCallback(window, key, scancode, action, mods);
}
static void handle_mouse_button(GLFWwindow *window, int button, int action, int mods)
{
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
ctx->mouse_button = button + 1;
ctx->mouse_button_action = action;
ImGui_ImplGlfwGL3_MouseButtonCallback(window, button, action, mods);
}
static void handle_mouse_pos(GLFWwindow *window, double xpos, double ypos)
{
Window_context *ctx = static_cast<Window_context*>(glfwGetWindowUserPointer(window));
if (ctx->moving)
{
ctx->move_dx += xpos - ctx->move_start_x;
ctx->move_dy += ypos - ctx->move_start_y;
ctx->move_start_x = xpos;
ctx->move_start_y = ypos;
ctx->mouse_event = true;
}
}
static void resize_buffers(
CUdeviceptr *accum_buffer_cuda,
CUgraphicsResource *display_buffer_cuda, int width, int height, GLuint display_buffer)
{
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, display_buffer);
glBufferData(GL_PIXEL_UNPACK_BUFFER, width * height * 4, nullptr, GL_DYNAMIC_COPY);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
check_success(glGetError() == GL_NO_ERROR);
if (*display_buffer_cuda)
check_cuda_success(cuGraphicsUnregisterResource(*display_buffer_cuda));
check_cuda_success(
cuGraphicsGLRegisterBuffer(
display_buffer_cuda, display_buffer, CU_GRAPHICS_REGISTER_FLAGS_WRITE_DISCARD));
if (*accum_buffer_cuda)
check_cuda_success(cuMemFree(*accum_buffer_cuda));
check_cuda_success(cuMemAlloc(accum_buffer_cuda, width * height * sizeof(float3)));
}
static float build_alias_map(
const float *data,
const unsigned int size,
Env_accel *accel)
{
float sum = 0.0f;
for (unsigned int i = 0; i < size; ++i)
sum += data[i];
for (unsigned int i = 0; i < size; ++i)
accel[i].q = (static_cast<float>(size) * data[i] / sum);
unsigned int *partition_table = static_cast<unsigned int *>(
malloc(size *
sizeof(
unsigned int)));
unsigned int s = 0u, large = size;
for (unsigned int i = 0; i < size; ++i)
partition_table[(accel[i].q < 1.0f) ? (s++) : (--large)] = accel[i].alias = i;
for (s = 0; s < large && large < size; ++s)
{
const unsigned int j = partition_table[s], k = partition_table[large];
accel[j].alias = k;
accel[k].q += accel[j].q - 1.0f;
large = (accel[k].q < 1.0f) ? (large + 1u) : large;
}
return sum;
}
static void create_environment(
cudaTextureObject_t *env_tex,
cudaArray_t *env_tex_data,
CUdeviceptr *env_accel,
uint2 *res,
const char *envmap_name)
{
check_success(image->reset_file(envmap_name) == 0);
res->x = rx;
res->y = ry;
char const *image_type = image->get_type();
if (
strcmp(image_type,
"Color") != 0 &&
strcmp(image_type,
"Float32<4>") != 0)
canvas = image_api->
convert(canvas.get(),
"Color");
const cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
check_cuda_success(cudaMallocArray(env_tex_data, &channel_desc, rx, ry));
const float *pixels = static_cast<const float *>(tile->get_data());
check_cuda_success(cudaMemcpyToArray(
*env_tex_data, 0, 0, pixels,
rx * ry * sizeof(float4), cudaMemcpyHostToDevice));
cudaResourceDesc res_desc;
memset(&res_desc, 0,
sizeof(res_desc));
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = *env_tex_data;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0,
sizeof(tex_desc));
tex_desc.addressMode[0] = cudaAddressModeWrap;
tex_desc.addressMode[1] = cudaAddressModeClamp;
tex_desc.addressMode[2] = cudaAddressModeWrap;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
check_cuda_success(cudaCreateTextureObject(env_tex, &res_desc, &tex_desc, nullptr));
Env_accel *env_accel_host =
static_cast<Env_accel *
>(
malloc(rx * ry *
sizeof(Env_accel)));
float *importance_data =
static_cast<float *
>(
malloc(rx * ry *
sizeof(
float)));
float cos_theta0 = 1.0f;
const float step_phi = float(2.0 * M_PI) / float(rx);
const float step_theta = float(M_PI) / float(ry);
for (unsigned int y = 0; y < ry; ++y)
{
const float theta1 = float(y + 1) * step_theta;
const float cos_theta1 =
std::cos(theta1);
const float area = (cos_theta0 - cos_theta1) * step_phi;
cos_theta0 = cos_theta1;
for (unsigned int x = 0; x < rx; ++x) {
const unsigned int idx = y * rx + x;
const unsigned int idx4 = idx * 4;
importance_data[idx] =
}
}
const float inv_env_integral = 1.0f / build_alias_map(importance_data, rx * ry, env_accel_host);
for (unsigned int i = 0; i < rx * ry; ++i) {
const unsigned int idx4 = i * 4;
env_accel_host[i].pdf =
std::max(pixels[idx4],
std::max(pixels[idx4 + 1], pixels[idx4 + 2])) * inv_env_integral;
}
*env_accel = gpu_mem_dup(env_accel_host, rx * ry * sizeof(Env_accel));
}
static void save_result(
const CUdeviceptr accum_buffer,
const unsigned int width,
const unsigned int height,
{
float3 *data = static_cast<float3 *>(tile->get_data());
check_cuda_success(cuMemcpyDtoH(data, accum_buffer, width * height * sizeof(float3)));
}
struct Options {
float gui_scale;
bool opengl;
bool use_class_compilation;
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter;
#endif
bool no_aa;
bool enable_derivatives;
unsigned int res_x, res_y;
unsigned int iterations;
unsigned int samples_per_iteration;
unsigned int mdl_test_type;
unsigned int max_path_length;
float fov;
float exposure;
float3 cam_pos;
float3 light_pos;
float3 light_intensity;
Options()
: gui_scale(1.0f)
, opengl(true)
, use_class_compilation(true)
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
, use_df_interpreter(false)
#endif
, no_aa(false)
, enable_derivatives(false)
, res_x(1024)
, res_y(1024)
, iterations(4096)
, samples_per_iteration(8)
, mdl_test_type(MDL_TEST_MIS)
, max_path_length(4)
, fov(96.0f)
, exposure(0.0f)
, cam_pos(make_float3(0, 0, 3))
, light_pos(make_float3(0, 0, 0))
, light_intensity(make_float3(0, 0, 0))
, hdrfile("nvidia/sdk_examples/resources/environment.hdr")
, outputfile("output.exr")
, material_names()
, mdl_paths()
{}
};
struct Enum_value {
int value;
: name(name), value(value)
{
}
};
struct Enum_type_info {
}
};
class Param_info
{
public:
enum Param_kind
{
PK_UNKNOWN,
PK_FLOAT,
PK_FLOAT2,
PK_FLOAT3,
PK_COLOR,
PK_BOOL,
PK_ENUM,
PK_STRING,
PK_TEXTURE,
PK_LIGHT_PROFILE,
PK_BSDF_MEASUREMENT
};
Param_info(
char const *name,
char const *display_name,
char const *group_name,
Param_kind kind,
char *data_ptr,
const Enum_type_info *enum_info = nullptr)
: m_index(index)
, m_name(name)
, m_display_name(display_name)
, m_group_name(group_name)
, m_kind(kind)
, m_data_ptr(data_ptr)
, m_range_min(-100), m_range_max(100)
, m_enum_info(enum_info)
{
}
template<typename T>
T &data() { return *reinterpret_cast<T *>(m_data_ptr); }
template<typename T>
const T &data() const { return *reinterpret_cast<const T *>(m_data_ptr); }
const char * &display_name() { return m_display_name; }
const char *display_name() const { return m_display_name; }
const char * &group_name() { return m_group_name; }
const char *group_name() const { return m_group_name; }
Param_kind kind() const { return m_kind; }
float &range_min() { return m_range_min; }
float range_min() const { return m_range_min; }
float &range_max() { return m_range_max; }
float range_max() const { return m_range_max; }
const Enum_type_info *enum_info() const { return m_enum_info; }
private:
char const *m_name;
char const *m_display_name;
char const *m_group_name;
Param_kind m_kind;
char *m_data_ptr;
float m_range_min, m_range_max;
const Enum_type_info *m_enum_info;
};
class Material_info
{
public:
Material_info(char const *name)
: m_name(name)
{}
void add_sorted_by_group(
const Param_info &
info) {
bool group_found = false;
if (info.group_name() != nullptr) {
{
const bool same_group =
it->group_name() !=
nullptr &&
strcmp(it->group_name(), info.group_name()) == 0;
if (group_found && !same_group) {
m_params.insert(it, info);
return;
}
if (same_group)
group_found = true;
}
}
m_params.push_back(info);
}
enum_types[name] = enum_info;
}
const Enum_type_info *get_enum_type(
const std::string name) {
Enum_type_map::const_iterator it = enum_types.find(name);
if (it != enum_types.end())
return nullptr;
}
char const *name() const { return m_name; }
private:
char const *m_name;
Enum_type_map enum_types;
};
class Resource_table
{
public:
enum Kind {
RESOURCE_TEXTURE,
RESOURCE_LIGHT_PROFILE,
RESOURCE_BSDF_MEASUREMENT
};
Resource_table(
Kind kind)
: m_max_len(0u)
{
read_resources(target_code, transaction, kind);
}
size_t get_max_length() const { return m_max_len; }
private:
void read_resources(
Kind kind)
{
switch (kind) {
case RESOURCE_TEXTURE:
char const *url = nullptr;
if (char const *img = tex->get_image()) {
url = image->get_filename();
}
if (url == nullptr)
url = s;
if (l > m_max_len)
m_max_len = l;
m_resource_map[s] = (unsigned)i;
m_urls.push_back(url);
}
break;
case RESOURCE_LIGHT_PROFILE:
char const *url = lp->get_filename();
if (url == nullptr)
url = s;
if (l > m_max_len)
m_max_len = l;
m_resource_map[s] = (unsigned)i;
m_urls.push_back(url);
}
break;
case RESOURCE_BSDF_MEASUREMENT:
char const *url = bm->get_filename();
if (url == nullptr)
url = s;
if (l > m_max_len)
m_max_len = l;
m_resource_map[s] = (unsigned)i;
m_urls.push_back(url);
}
break;
}
}
private:
Resource_id_map m_resource_map;
size_t m_max_len;
};
class String_constant_table
{
public:
{
get_all_strings(target_code);
}
unsigned get_id_for_string(const char *name) {
String_map::const_iterator it(m_string_constants_map.find(name));
if (it != m_string_constants_map.end())
return it->second;
unsigned n_id = unsigned(m_string_constants_map.size() + 1);
m_string_constants_map[name] = n_id;
m_strings.reserve((n_id + 63) & ~63);
m_strings.push_back(name);
if (l > m_max_len)
m_max_len = l;
return n_id;
}
size_t get_max_length() const { return m_max_len; }
const char *get_string(unsigned id) {
if (id == 0 || id - 1 >= m_strings.size())
return nullptr;
return m_strings[id - 1].c_str();
}
private:
void get_all_strings(
{
m_max_len = 0;
if (l > m_max_len)
m_max_len = l;
m_string_constants_map[s] = (unsigned)i;
m_strings.push_back(s);
}
}
private:
String_map m_string_constants_map;
size_t m_max_len;
};
static void update_camera(
Kernel_params &kernel_params,
double phi,
double theta,
float base_dist,
int zoom)
{
kernel_params.cam_dir.x = float(-
sin(phi) *
sin(theta));
kernel_params.cam_dir.y = float(-
cos(theta));
kernel_params.cam_dir.z = float(-
cos(phi) *
sin(theta));
kernel_params.cam_right.x = float(
cos(phi));
kernel_params.cam_right.y = 0.0f;
kernel_params.cam_right.z = float(-
sin(phi));
kernel_params.cam_up.x = float(-
sin(phi) *
cos(theta));
kernel_params.cam_up.y = float(
sin(theta));
kernel_params.cam_up.z = float(-
cos(phi) *
cos(theta));
const float dist = float(base_dist *
pow(0.95,
double(zoom)));
kernel_params.cam_pos.x = -kernel_params.cam_dir.x * dist;
kernel_params.cam_pos.y = -kernel_params.cam_dir.y * dist;
kernel_params.cam_pos.z = -kernel_params.cam_dir.z * dist;
}
static bool handle_resource(
Param_info ¶m,
Resource_table const &res_table)
{
bool changed = false;
int id = param.data<int>();
if (ImGui::BeginCombo(param.display_name(), cur_url.c_str())) {
for (
size_t i = 0, n = urls.
size(); i < n; ++i) {
bool is_selected = (cur_url == name);
if (ImGui::Selectable(name.
c_str(), is_selected)) {
param.data<int>() = int(i);
changed = true;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
return changed;
}
static void render_scene(
const Options &options,
{
Window_context window_context;
memset(&window_context, 0,
sizeof(Window_context));
GLuint display_buffer = 0;
GLuint display_tex = 0;
GLuint program = 0;
GLuint quad_vertex_buffer = 0;
GLuint quad_vao = 0;
GLFWwindow *window = nullptr;
int width = -1;
int height = -1;
if (options.opengl) {
window = init_opengl();
glfwSetWindowUserPointer(window, &window_context);
glfwSetKeyCallback(window, handle_key);
glfwSetScrollCallback(window, handle_scroll);
glfwSetCursorPosCallback(window, handle_mouse_pos);
glfwSetMouseButtonCallback(window, handle_mouse_button);
glfwSetCharCallback(window, ImGui_ImplGlfwGL3_CharCallback);
ImGui_ImplGlfwGL3_Init(window, false);
ImGui::GetIO().IniFilename = nullptr;
ImGui::GetStyle().ScaleAllSizes(options.gui_scale);
glGenBuffers(1, &display_buffer);
glGenTextures(1, &display_tex);
check_success(glGetError() == GL_NO_ERROR);
program = create_shader_program();
quad_vao = create_quad(program, &quad_vertex_buffer);
}
CUcontext cuda_context = init_cuda(options.opengl);
CUdeviceptr accum_buffer = 0;
CUgraphicsResource display_buffer_cuda = nullptr;
if (!options.opengl) {
width = options.res_x;
height = options.res_y;
check_cuda_success(cuMemAlloc(&accum_buffer, width * height * sizeof(float3)));
}
Kernel_params kernel_params;
memset(&kernel_params, 0,
sizeof(Kernel_params));
kernel_params.cam_focal = 1.0f / tanf(options.fov / 2 * float(2 * M_PI / 360));
kernel_params.light_pos = options.light_pos;
kernel_params.light_intensity = options.light_intensity;
kernel_params.iteration_start = 0;
kernel_params.iteration_num = options.samples_per_iteration;
kernel_params.mdl_test_type = options.mdl_test_type;
kernel_params.max_path_length = options.max_path_length;
kernel_params.exposure_scale = powf(2.0f, options.exposure);
kernel_params.disable_aa = options.no_aa;
kernel_params.use_derivatives = options.enable_derivatives;
float base_dist =
length(options.cam_pos);
double theta, phi;
{
const float3 inv_dir = normalize(options.cam_pos);
phi =
atan2(inv_dir.x, inv_dir.z);
}
update_camera(kernel_params, phi, theta, base_dist, window_context.zoom);
CUfunction cuda_function;
char const *ptx_name = options.enable_derivatives ?
"example_df_cuda_derivatives.ptx" : "example_df_cuda.ptx";
CUmodule cuda_module = build_linked_kernel(
target_codes,
(get_executable_folder() + ptx_name).c_str(),
"render_sphere_kernel",
&cuda_function);
CUdeviceptr material_buffer = 0;
check_cuda_success(cuMemAlloc(&material_buffer,
material_bundle.
size() *
sizeof(Df_cuda_material)));
check_cuda_success(cuMemcpyHtoD(material_buffer, material_bundle.
data(),
material_bundle.
size() *
sizeof(Df_cuda_material)));
kernel_params.material_buffer = reinterpret_cast<Df_cuda_material*>(material_buffer);
CUdeviceptr env_accel;
cudaArray_t env_tex_data;
create_environment(
&kernel_params.env_tex, &env_tex_data, &env_accel, &kernel_params.env_size, transaction,
image_api, options.hdrfile.c_str());
kernel_params.env_accel = reinterpret_cast<Env_accel *>(env_accel);
if (options.material_names.size() > 1) {
size_t dot_pos = options.outputfile.rfind(".");
if (dot_pos == std::string::npos)
filename_base = options.outputfile;
else {
filename_base = options.outputfile.substr(0, dot_pos);
filename_ext = options.outputfile.substr(dot_pos);
}
next_filename = filename_base + "-0" + filename_ext;
} else
next_filename = options.outputfile;
{
Material_gpu_context material_gpu_context(options.enable_derivatives);
if (!material_gpu_context.prepare_target_code_data(
transaction.
get(), image_api.
get(), target_code.
get()))
kernel_params.tc_data = reinterpret_cast<Target_code_data *>(
material_gpu_context.get_device_target_code_data_list());
kernel_params.arg_block_list = reinterpret_cast<char const **>(
material_gpu_context.get_device_target_argument_block_list());
String_constant_table constant_table(target_code);
Resource_table texture_table(target_code, transaction, Resource_table::RESOURCE_TEXTURE);
Resource_table lp_table(target_code, transaction, Resource_table::RESOURCE_LIGHT_PROFILE);
Resource_table bm_table(
target_code, transaction, Resource_table::RESOURCE_BSDF_MEASUREMENT);
for (size_t i = 0, num_mats = compiled_materials.size(); i < num_mats; ++i) {
size_t arg_block_index = material_gpu_context.get_bsdf_argument_block_index(i);
material_gpu_context.get_argument_block_layout(arg_block_index));
material_gpu_context.get_argument_block(arg_block_index));
char *arg_block_data = arg_block != nullptr ? arg_block->get_data() : nullptr;
if (name == nullptr) continue;
Param_info::Param_kind param_kind = Param_info::PK_UNKNOWN;
const Enum_type_info *enum_type = nullptr;
switch (kind) {
param_kind = Param_info::PK_FLOAT;
break;
param_kind = Param_info::PK_COLOR;
break;
param_kind = Param_info::PK_BOOL;
break;
{
val->get_type());
val_type->get_element_type());
switch (val_type->get_size()) {
case 2: param_kind = Param_info::PK_FLOAT2; break;
case 3: param_kind = Param_info::PK_FLOAT3; break;
}
}
}
break;
{
val->get_type());
const Enum_type_info *info = mat_info.get_enum_type(val_type->get_symbol());
if (info == nullptr) {
for (
mi::Size i = 0, n = val_type->get_size(); i < n; ++i) {
p->add(val_type->get_value_name(i), val_type->get_value_code(i));
}
mat_info.add_enum_type(val_type->get_symbol(), p);
}
param_kind = Param_info::PK_ENUM;
}
break;
param_kind = Param_info::PK_STRING;
break;
param_kind = Param_info::PK_TEXTURE;
break;
param_kind = Param_info::PK_LIGHT_PROFILE;
break;
param_kind = Param_info::PK_BSDF_MEASUREMENT;
break;
default:
continue;
}
mi::Size offset = layout->get_layout(kind2, param_size, state);
check_success(kind == kind2);
Param_info param_info(
j, name, name, nullptr, param_kind, arg_block_data + offset,
enum_type);
anno_list->get_annotation_block(name));
if (anno_block) {
mi::neuraylib::Annotation_wrapper annos(anno_block.get());
annos.get_annotation_index("::anno::hard_range(float,float)");
annos.get_annotation_param_value(anno_index, 0, param_info.range_min());
annos.get_annotation_param_value(anno_index, 1, param_info.range_max());
}
anno_index = annos.get_annotation_index("::anno::display_name(string)");
annos.get_annotation_param_value(anno_index, 0, param_info.display_name());
}
anno_index = annos.get_annotation_index("::anno::in_group(string)");
annos.get_annotation_param_value(anno_index, 0, param_info.group_name());
}
}
mat_info.add_sorted_by_group(param_info);
}
}
while (true)
{
double start_time = 0.0;
if (!options.opengl)
{
kernel_params.resolution.x = width;
kernel_params.resolution.y = height;
kernel_params.accum_buffer = reinterpret_cast<float3 *>(accum_buffer);
if (kernel_params.iteration_start >= options.iterations) {
save_result(
accum_buffer, width, height, next_filename, image_api, mdl_compiler);
if (kernel_params.current_material + 1 >= material_bundle.
size())
break;
kernel_params.iteration_start = 0;
++kernel_params.current_material;
next_filename = filename_base +
"-" +
to_string(kernel_params.current_material)
+ filename_ext;
}
<< "rendering iterations " << kernel_params.iteration_start << " to "
<< kernel_params.iteration_start + kernel_params.iteration_num <<
std::endl;
}
else
{
if (glfwWindowShouldClose(window))
break;
glfwPollEvents();
ImGui_ImplGlfwGL3_NewFrame();
int nwidth, nheight;
glfwGetFramebufferSize(window, &nwidth, &nheight);
if (nwidth != width || nheight != height)
{
width = nwidth;
height = nheight;
resize_buffers(
&accum_buffer, &display_buffer_cuda, width, height, display_buffer);
kernel_params.accum_buffer = reinterpret_cast<float3 *>(accum_buffer);
glViewport(0, 0, width, height);
kernel_params.resolution.x = width;
kernel_params.resolution.y = height;
kernel_params.iteration_start = 0;
}
ImGui::SetNextWindowPos(ImVec2(0, 0), ImGuiCond_FirstUseEver);
ImGui::SetNextWindowSize(
ImVec2(360 * options.gui_scale, 350 * options.gui_scale),
ImGuiCond_FirstUseEver);
ImGui::Begin("Material parameters");
ImGui::SetWindowFontScale(options.gui_scale);
ImGui::PushItemWidth(-200 * options.gui_scale);
if (options.use_class_compilation)
ImGui::Text("CTRL + Click to manually enter numbers");
else
ImGui::Text("Parameter editing requires class compilation.");
Material_info &mat_info = mat_infos[
material_bundle[kernel_params.current_material].compiled_material_index];
ImGui::Text("%s", mat_info.name());
bool changed = false;
const char *group_name = nullptr;
int id = 0;
end = mat_info.params().end(); it !=
end; ++it, ++id)
{
Param_info ¶m = *it;
ImGui::PushID(id);
if ((!param.group_name() != !group_name) ||
(param.group_name() &&
(!group_name ||
strcmp(group_name, param.group_name()) != 0)))
{
ImGui::Separator();
if (param.group_name() != nullptr)
ImGui::Text("%s", param.group_name());
group_name = param.group_name();
}
switch (param.kind()) {
case Param_info::PK_FLOAT:
changed |= ImGui::SliderFloat(
param.display_name(),
¶m.data<float>(),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT2:
changed |= ImGui::SliderFloat2(
param.display_name(),
¶m.data<float>(),
param.range_min(),
param.range_max());
break;
case Param_info::PK_FLOAT3:
changed |= ImGui::SliderFloat3(
param.display_name(),
¶m.data<float>(),
param.range_min(),
param.range_max());
break;
case Param_info::PK_COLOR:
changed |= ImGui::ColorEdit3(
param.display_name(),
¶m.data<float>());
break;
case Param_info::PK_BOOL:
changed |= ImGui::Checkbox(
param.display_name(),
¶m.data<bool>());
break;
case Param_info::PK_ENUM:
{
int value = param.data<int>();
const Enum_type_info *info = param.enum_info();
for (size_t i = 0, n = info->values.size(); i < n; ++i) {
if (info->values[i].value == value) {
curr_value = info->values[i].name;
break;
}
}
if (ImGui::BeginCombo(param.display_name(), curr_value.c_str())) {
for (size_t i = 0, n = info->values.size(); i < n; ++i) {
bool is_selected = (curr_value == name);
if (ImGui::Selectable(
info->values[i].name.c_str(), is_selected)) {
param.data<int>() = info->values[i].value;
changed = true;
}
if (is_selected)
ImGui::SetItemDefaultFocus();
}
ImGui::EndCombo();
}
}
break;
case Param_info::PK_STRING:
{
size_t max_len = constant_table.get_max_length();
max_len = max_len > 63 ? max_len + 1 : 64;
unsigned curr_index = param.data<unsigned>();
const char *opt = constant_table.get_string(curr_index);
if (ImGui::InputText(
param.display_name(),
ImGuiInputTextFlags_EnterReturnsTrue))
{
unsigned id = constant_table.get_id_for_string(buf.
data());
param.data<unsigned>() = id;
changed = true;
}
}
break;
case Param_info::PK_TEXTURE:
changed |= handle_resource(param, texture_table);
break;
case Param_info::PK_LIGHT_PROFILE:
changed |= handle_resource(param, lp_table);
break;
case Param_info::PK_BSDF_MEASUREMENT:
changed |= handle_resource(param, bm_table);
break;
case Param_info::PK_UNKNOWN:
break;
}
ImGui::PopID();
}
if (options.enable_derivatives) {
ImGui::Separator();
bool b = kernel_params.use_derivatives != 0;
if (ImGui::Checkbox("Use derivatives", &b)) {
kernel_params.iteration_start = 0;
kernel_params.use_derivatives = b;
}
}
ImGui::PopItemWidth();
ImGui::End();
if (changed) {
material_gpu_context.update_device_argument_block(
material_bundle[kernel_params.current_material].compiled_material_index);
kernel_params.iteration_start = 0;
}
start_time = glfwGetTime();
Window_context *ctx =
static_cast<Window_context*>(glfwGetWindowUserPointer(window));
if (ctx->save_result && !ImGui::GetIO().WantCaptureKeyboard) {
save_result(
accum_buffer, width, height, options.outputfile, image_api, mdl_compiler);
}
if (ctx->exposure_event && !ImGui::GetIO().WantCaptureKeyboard) {
kernel_params.exposure_scale = powf(2.0f, ctx->exposure);
}
if (ctx->key_event && !ImGui::GetIO().WantCaptureKeyboard) {
kernel_params.iteration_start = 0;
const unsigned num_materials = unsigned(material_bundle.size());
kernel_params.current_material = (kernel_params.current_material +
ctx->material_index_delta + num_materials) % num_materials;
ctx->material_index_delta = 0;
}
if (ctx->mouse_button - 1 == GLFW_MOUSE_BUTTON_LEFT) {
if (ctx->mouse_button_action == GLFW_PRESS &&
!ImGui::GetIO().WantCaptureMouse) {
ctx->moving = true;
glfwGetCursorPos(window, &ctx->move_start_x, &ctx->move_start_y);
}
else
ctx->moving = false;
}
if (ctx->mouse_wheel_delta && !ImGui::GetIO().WantCaptureMouse) {
ctx->zoom += ctx->mouse_wheel_delta;
}
if (ctx->mouse_event && !ImGui::GetIO().WantCaptureMouse) {
kernel_params.iteration_start = 0;
phi -= ctx->move_dx * 0.001 * M_PI;
theta -= ctx->move_dy * 0.001 * M_PI;
ctx->move_dx = ctx->move_dy = 0.0;
update_camera(kernel_params, phi, theta, base_dist, ctx->zoom);
}
ctx->save_result = false;
ctx->key_event = false;
ctx->mouse_event = false;
ctx->exposure_event = false;
ctx->mouse_wheel_delta = 0;
ctx->mouse_button = 0;
check_cuda_success(cuGraphicsMapResources(1, &display_buffer_cuda, 0));
CUdeviceptr p;
size_t size_p;
check_cuda_success(
cuGraphicsResourceGetMappedPointer(&p, &size_p, display_buffer_cuda));
kernel_params.display_buffer = reinterpret_cast<unsigned int *>(p);
}
dim3 threads_per_block(16, 16);
dim3 num_blocks((width + 15) / 16, (height + 15) / 16);
void *params[] = { &kernel_params };
check_cuda_success(cuLaunchKernel(
cuda_function,
num_blocks.x, num_blocks.y, num_blocks.z,
threads_per_block.x, threads_per_block.y, threads_per_block.z,
0, nullptr, params, nullptr));
kernel_params.iteration_start += kernel_params.iteration_num;
check_cuda_success(cuStreamSynchronize(0));
if (options.opengl)
{
check_cuda_success(cuGraphicsUnmapResources(1, &display_buffer_cuda, 0));
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, display_buffer);
glBindTexture(GL_TEXTURE_2D, display_tex);
glTexImage2D(
GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_BGRA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
check_success(glGetError() == GL_NO_ERROR);
glClear(GL_COLOR_BUFFER_BIT);
glBindVertexArray(quad_vao);
glDrawArrays(GL_TRIANGLES, 0, 6);
check_success(glGetError() == GL_NO_ERROR);
ImGui::Render();
glfwSwapBuffers(window);
const double fps =
double(kernel_params.iteration_num) / (glfwGetTime() - start_time);
glfwSetWindowTitle(
" (iterations/s: " +
to_string(fps) +
")").c_str());
}
}
}
check_cuda_success(cudaDestroyTextureObject(kernel_params.env_tex));
check_cuda_success(cudaFreeArray(env_tex_data));
check_cuda_success(cuMemFree(env_accel));
check_cuda_success(cuMemFree(accum_buffer));
check_cuda_success(cuMemFree(material_buffer));
check_cuda_success(cuModuleUnload(cuda_module));
uninit_cuda(cuda_context);
if (options.opengl) {
glDeleteVertexArrays(1, &quad_vao);
glDeleteBuffers(1, &quad_vertex_buffer);
glDeleteProgram(program);
check_success(glGetError() == GL_NO_ERROR);
ImGui_ImplGlfwGL3_Shutdown();
glfwDestroyWindow(window);
glfwTerminate();
}
}
{
}
Df_cuda_material create_cuda_material(
size_t target_code_index,
size_t compiled_material_index,
{
Df_cuda_material mat;
mat.compiled_material_index = static_cast<unsigned int>(compiled_material_index);
mat.argument_block_index = static_cast<unsigned int>(descs[0].argument_block_index);
mat.bsdf.x = static_cast<unsigned int>(target_code_index);
mat.bsdf.y = static_cast<unsigned int>(descs[0].function_index);
mat.edf.x = static_cast<unsigned int>(target_code_index);
mat.edf.y = static_cast<unsigned int>(descs[1].function_index);
mat.emission_intensity.x = static_cast<unsigned int>(target_code_index);
mat.emission_intensity.y = static_cast<unsigned int>(descs[2].function_index);
mat.volume_absorption.x = static_cast<unsigned int>(target_code_index);
mat.volume_absorption.y = static_cast<unsigned int>(descs[3].function_index);
return mat;
}
static void usage(const char *name)
{
<< "usage: " << name << " [options] [<material_name1> ...]\n"
<< "-h print this text\n"
<< "--nogl don't open interactive display\n"
<< "--nocc don't use class-compilation\n"
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
<< "--interpreter use the DF interpreter (if possible)\n"
#endif
<< "--gui_scale <factor> GUI scaling factor (default: 1.0)\n"
<< "--res <res_x> <res_y> resolution (default: 1024x1024)\n"
<< "--hdr <filename> HDR environment map "
"(default: nvidia/sdk_examples/resources/environment.hdr)\n"
<< "-o <outputfile> image file to write result to (default: output.exr).\n"
<< " With multiple materials \"-<material index>\" will be\n"
<< " added in front of the extension\n"
<< "--spp <num> samples per pixel, only active for --nogl (default: 4096)\n"
<< "--spi <num> samples per render call (default: 8)\n"
<< "-t <type> 0: eval, 1: sample, 2: mis, 3: mis + pdf, 4: no env\n"
<< " (default: 2)\n"
<< "-e <exposure> exposure for interactive display (default: 0.0)\n"
<< "-f <fov> the camera field of view in degree (default: 96.0)\n"
<< "-p <x> <y> <z> set the camera position (default 0 0 3).\n"
<< " The camera will always look towards (0, 0, 0).\n"
<< "-l <x> <y> <z> <r> <g> <b> add an isotropic point light with given coordinates and "
"intensity (flux)\n"
<< "--mdl_path <path> MDL search path, can occur multiple times.\n"
<< "--max_path_length <num> maximum path length, default 4 (up to one total internal\n"
<< " reflection), clamped to 2..100\n"
<< "--noaa disable pixel oversampling\n"
<< "-d enable use of derivatives\n"
<< "\n"
<< "Note: material names can end with an '*' as a wildcard\n";
}
int main(int argc, char* argv[])
{
Options options;
for (int i = 1; i < argc; ++i) {
const char *opt = argv[i];
if (opt[0] == '-') {
if (
strcmp(opt,
"--nogl") == 0) {
options.opengl = false;
}
else if (
strcmp(opt,
"--nocc") == 0) {
options.use_class_compilation = false;
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
}
else if (
strcmp(opt,
"--interpreter") == 0) {
options.use_df_interpreter = true;
#endif
}
else if (
strcmp(opt,
"--gui_scale") == 0 && i < argc - 1) {
options.gui_scale =
static_cast<float>(
atof(argv[++i]));
}
else if (
strcmp(opt,
"--res") == 0 && i < argc - 2) {
}
else if (
strcmp(opt,
"--hdr") == 0 && i < argc - 1) {
options.hdrfile = argv[++i];
}
else if (
strcmp(opt,
"-o") == 0 && i < argc - 1) {
options.outputfile = argv[++i];
}
else if (
strcmp(opt,
"--spp") == 0 && i < argc - 1) {
}
else if (
strcmp(opt,
"--spi") == 0 && i < argc - 1) {
options.samples_per_iteration =
std::max(
atoi(argv[++i]), 1);
}
else if (
strcmp(opt,
"-t") == 0 && i < argc - 1) {
const int type =
atoi(argv[++i]);
if (type < 0 || type >= MDL_TEST_COUNT) {
usage(argv[0]);
}
options.mdl_test_type = type;
}
else if (
strcmp(opt,
"-e") == 0 && i < argc - 1) {
options.exposure =
static_cast<float>(
atof(argv[++i]));
}
else if (
strcmp(opt,
"-f") == 0 && i < argc - 1) {
options.fov =
static_cast<float>(
atof(argv[++i]));
}
else if (
strcmp(opt,
"-p") == 0 && i < argc - 3) {
options.cam_pos.x =
static_cast<float>(
atof(argv[++i]));
options.cam_pos.y =
static_cast<float>(
atof(argv[++i]));
options.cam_pos.z =
static_cast<float>(
atof(argv[++i]));
}
else if (
strcmp(opt,
"-l") == 0 && i < argc - 6) {
options.light_pos.x =
static_cast<float>(
atof(argv[++i]));
options.light_pos.y =
static_cast<float>(
atof(argv[++i]));
options.light_pos.z =
static_cast<float>(
atof(argv[++i]));
options.light_intensity.x =
static_cast<float>(
atof(argv[++i]));
options.light_intensity.y =
static_cast<float>(
atof(argv[++i]));
options.light_intensity.z =
static_cast<float>(
atof(argv[++i]));
}
else if (
strcmp(opt,
"--mdl_path") == 0 && i < argc - 1) {
options.mdl_paths.push_back(argv[++i]);
}
else if (
strcmp(opt,
"--max_path_length") == 0 && i < argc - 1) {
}
else if (
strcmp(opt,
"--noaa") == 0) {
options.no_aa = true;
}
else if (
strcmp(opt,
"-d") == 0) {
options.enable_derivatives = true;
} else {
usage(argv[0]);
}
}
else
}
check_success(neuray.is_valid_interface());
for (
std::size_t i = 0; i < options.mdl_paths.size(); ++i)
check_success(mdl_compiler->
add_module_path(options.mdl_paths[i].c_str()) == 0);
if (options.material_names.empty())
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_df");
check_start_success(result);
{
{
Material_compiler mc(
mdl_factory.get(),
16,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
options.use_df_interpreter,
#endif
options.enable_derivatives);
mi::neuraylib::Target_function_description("surface.scattering"));
mi::neuraylib::Target_function_description("surface.emission.emission"));
mi::neuraylib::Target_function_description("surface.emission.intensity"));
mi::neuraylib::Target_function_description("volume.absorption_coefficient"));
for (size_t i = 0; i < options.material_names.size(); ++i) {
if (!starts_with(material_name, "::")) material_name = "::" + material_name;
if (material_name.size() > 1 && material_name.back() == '*') {
mc.get_module_name(material_name)));
for (size_t j = 0, n = module_materials.size(); j < n; ++j) {
material_name = module_materials[j];
if (starts_with(material_name, "mdl::"))
material_name = material_name.substr(3);
if (!starts_with(material_name, pattern))
continue;
check_success(mc.add_material(
material_name,
options.use_class_compilation));
material_bundle.push_back(create_cuda_material(
0, material_bundle.size(), descs));
used_material_names.
push_back(material_name);
}
} else {
check_success(mc.add_material(
material_name,
options.use_class_compilation));
material_bundle.push_back(create_cuda_material(
0, material_bundle.size(), descs));
used_material_names.
push_back(material_name);
}
}
options.material_names = used_material_names;
mc.generate_cuda_ptx());
render_scene(
options,
transaction,
image_api,
mdl_compiler,
target_code,
mc.get_material_defs(),
mc.get_compiled_materials(),
material_bundle);
}
}
mdl_compiler = 0;
check_success(neuray->shutdown() == 0);
neuray = 0;
check_success(unload());
keep_console_open();
return EXIT_SUCCESS;
}
Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.h
#ifndef EXAMPLE_DF_CUDA_H
#define EXAMPLE_DF_CUDA_H
#include <vector_types.h>
#include <texture_types.h>
struct Target_code_data;
enum Mdl_test_type {
MDL_TEST_EVAL = 0,
MDL_TEST_SAMPLE = 1,
MDL_TEST_MIS = 2,
MDL_TEST_MIS_PDF = 3,
MDL_TEST_NO_ENV = 4,
MDL_TEST_COUNT
};
struct Env_accel {
unsigned int alias;
float q;
float pdf;
};
namespace
{
#if defined(__CUDA_ARCH__)
__host__ __device__
#endif
inline uint2 make_invalid()
{
uint2 index_pair;
index_pair.x = ~0;
index_pair.y = ~0;
return index_pair;
}
}
struct Df_cuda_material
{
#if defined(__CUDA_ARCH__)
__host__ __device__
#endif
Df_cuda_material()
: compiled_material_index(0)
, argument_block_index(0)
, bsdf(make_invalid())
, edf(make_invalid())
, emission_intensity(make_invalid())
, volume_absorption(make_invalid())
{
}
unsigned int compiled_material_index;
unsigned int argument_block_index;
uint2 bsdf;
uint2 edf;
uint2 emission_intensity;
uint2 volume_absorption;
};
struct Kernel_params {
uint2 resolution;
float exposure_scale;
unsigned int *display_buffer;
float3 *accum_buffer;
unsigned int iteration_start;
unsigned int iteration_num;
unsigned int mdl_test_type;
unsigned int max_path_length;
unsigned int use_derivatives;
unsigned int disable_aa;
float3 cam_pos;
float3 cam_dir;
float3 cam_right;
float3 cam_up;
float cam_focal;
uint2 env_size;
cudaTextureObject_t env_tex;
Env_accel *env_accel;
float3 light_pos;
float3 light_intensity;
Target_code_data *tc_data;
char const **arg_block_list;
unsigned int current_material;
Df_cuda_material *material_buffer;
};
#endif // EXAMPLE_DF_CUDA_H
Source Code Location: examples/mdl_sdk/shared/texture_support_cuda.h
#ifndef TEXTURE_SUPPORT_CUDA_H
#define TEXTURE_SUPPORT_CUDA_H
#include <cuda.h>
#include <cuda_runtime.h>
#define USE_SMOOTHERSTEP_FILTER
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define M_ONE_OVER_PI 0.318309886183790671538
struct Texture
{
cudaTextureObject_t filtered_object;
cudaTextureObject_t unfiltered_object;
uint3 size;
float3 inv_size;
};
struct Mbsdf
{
unsigned has_data[2];
cudaTextureObject_t eval_data[2];
float max_albedo[2];
float* sample_data[2];
float* albedo_data[2];
uint2 angular_resolution[2];
float2 inv_angular_resolution[2];
unsigned num_channels[2];
};
struct Lightprofile
{
explicit Lightprofile()
: angular_resolution(make_uint2(0, 0))
, theta_phi_start(make_float2(0.0f, 0.0f))
, theta_phi_delta(make_float2(0.0f, 0.0f))
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(0.0f)
, total_power(0.0f)
, eval_data(0)
{
}
uint2 angular_resolution;
float2 theta_phi_start;
float2 theta_phi_delta;
float2 theta_phi_inv_delta;
float candela_multiplier;
float total_power;
cudaTextureObject_t eval_data;
float* cdf_data;
};
struct Texture_handler : Texture_handler_base {
size_t num_textures;
Texture const *textures;
size_t num_mbsdfs;
Mbsdf const *mbsdfs;
size_t num_lightprofiles;
Lightprofile const *lightprofiles;
};
size_t num_textures;
Texture const *textures;
size_t num_mbsdfs;
Mbsdf const *mbsdfs;
size_t num_lightprofiles;
Lightprofile const *lightprofiles;
};
__device__ inline void store_result4(float res[4], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
res[3] = v.w;
}
__device__ inline void store_result4(float res[4], float s)
{
res[0] = res[1] = res[2] = res[3] = s;
}
__device__ inline void store_result4(
float res[4], float v0, float v1, float v2, float v3)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
res[3] = v3;
}
__device__ inline void store_result3(float res[3], float3 const&v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
__device__ inline void store_result3(float res[3], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
__device__ inline void store_result3(float res[3], float s)
{
res[0] = res[1] = res[2] = s;
}
__device__ inline void store_result3(float res[3], float v0, float v1, float v2)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
}
__device__ inline void store_result1(float* res, float3 const& v)
{
*res = 0.212671 * v.x + 0.715160 * v.y + 0.072169 * v.z;
}
__device__ inline void store_result1(float* res, float v0, float v1, float v2)
{
*res = 0.212671 * v0 + 0.715160 * v1 + 0.072169 * v2;
}
__device__ inline void store_result1(float* res, float s)
{
*res = s;
}
#define WRAP_AND_CROP_OR_RETURN_BLACK(val, inv_dim, wrap_mode, crop_vals, store_res_func) \
do { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT && \
(crop_vals)[0] == 0.0f && (crop_vals)[1] == 1.0f ) { \
\
} \
else \
{ \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT ) \
val = val - floorf(val); \
else { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_CLIP && (val < 0.0f || val >= 1.0f) ) { \
store_res_func(result, 0.0f); \
return; \
} \
else if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_MIRRORED_REPEAT ) { \
float floored_val = floorf(val); \
if ( (int(floored_val) & 1) != 0 ) \
val = 1.0f - (val - floored_val); \
else \
val = val - floored_val; \
} \
float inv_hdim = 0.5f * (inv_dim); \
val = fminf(fmaxf(val, inv_hdim), 1.f - inv_hdim); \
} \
val = val * ((crop_vals)[1] - (crop_vals)[0]) + (crop_vals)[0]; \
} \
} while ( 0 )
#ifdef USE_SMOOTHERSTEP_FILTER
#define APPLY_SMOOTHERSTEP_FILTER() \
do { \
u = u * tex.size.x + 0.5f; \
v = v * tex.size.y + 0.5f; \
\
float u_i = floorf(u), v_i = floorf(v); \
float u_f = u - u_i; \
float v_f = v - v_i; \
u_f = u_f * u_f * u_f * (u_f * (u_f * 6.f - 15.f) + 10.f); \
v_f = v_f * v_f * v_f * (v_f * (v_f * 6.f - 15.f) + 10.f); \
u = u_i + u_f; \
v = v_i + v_f; \
\
u = (u - 0.5f) * tex.inv_size.x; \
v = (v - 0.5f) * tex.inv_size.y; \
} while ( 0 )
#else
#define APPLY_SMOOTHERSTEP_FILTER()
#endif
extern "C" __device__ void tex_lookup_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2D<float4>(tex.filtered_object, u, v));
}
extern "C" __device__ void tex_lookup_deriv_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
tct_deriv_float2 const *coord,
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
extern "C" __device__ void tex_lookup_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2D<float4>(tex.filtered_object, u, v));
}
extern "C" __device__ void tex_lookup_deriv_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
tct_deriv_float2 const *coord,
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
extern "C" __device__ void tex_texel_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const coord[2],
int const [2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex2D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y));
}
extern "C" __device__ void tex_lookup_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
Tex_wrap_mode wrap_u,
Tex_wrap_mode wrap_v,
Tex_wrap_mode wrap_w,
float const crop_u[2],
float const crop_v[2],
float const crop_w[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result4);
store_result4(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
extern "C" __device__ void tex_lookup_float3_3d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
Tex_wrap_mode wrap_u,
Tex_wrap_mode wrap_v,
Tex_wrap_mode wrap_w,
float const crop_u[2],
float const crop_v[2],
float const crop_w[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result3);
store_result3(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
extern "C" __device__ void tex_texel_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
const int coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex3D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y,
float(coord[2]) * tex.inv_size.z));
}
extern "C" __device__ void tex_lookup_float4_cube(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
extern "C" __device__ void tex_lookup_float3_cube(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result3(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
extern "C" __device__ void tex_resolution_2d(
int result[2],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const [2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
result[0] = 0;
result[1] = 0;
return;
}
Texture const &tex = self->textures[texture_idx - 1];
result[0] = tex.size.x;
result[1] = tex.size.y;
}
__device__ inline unsigned sample_cdf(
const float* cdf,
unsigned cdf_size,
float xi)
{
unsigned li = 0;
unsigned ri = cdf_size - 1;
unsigned m = (li + ri) / 2;
while (ri > li)
{
if (xi < cdf[m])
ri = m;
else
li = m + 1;
m = (li + ri) / 2;
}
return m;
}
extern "C" __device__ float df_light_profile_evaluate(
Texture_handler_base const *self_base,
unsigned resource_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_lightprofiles)
return 0.0f;
const Lightprofile& lp = self->lightprofiles[resource_idx - 1];
const float u = (theta_phi[0] - lp.theta_phi_start.x) *
lp.theta_phi_inv_delta.x / float(lp.angular_resolution.x - 1);
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
const float v = phi * lp.theta_phi_inv_delta.y / float(lp.angular_resolution.y - 1);
if (u < 0.0f || u > 1.0f || v < 0.0f || v > 1.0f) return 0.0f;
return tex2D<float>(lp.eval_data, u, v) * lp.candela_multiplier;
}
extern "C" __device__ void df_light_profile_sample(
float result[3],
Texture_handler_base const *self_base,
unsigned resource_idx,
float const xi[3])
{
result[0] = -1.0f;
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_lightprofiles)
return;
const Lightprofile& lp = self->lightprofiles[resource_idx - 1];
uint2 res = lp.angular_resolution;
float xi0 = xi[0];
const float* cdf_data_theta = lp.cdf_data;
unsigned idx_theta = sample_cdf(cdf_data_theta, res.x - 1, xi0);
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta;
float xi1 = xi[1];
const float* cdf_data_phi = cdf_data_theta + (res.x - 1)
+ (idx_theta * (res.y - 1));
const unsigned idx_phi = sample_cdf(cdf_data_phi, res.y - 1, xi1);
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi;
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 = cosf(start.x + float(idx_theta) * delta.x);
const float cos_theta_1 = cosf(start.x + float(idx_theta + 1u) * delta.x);
const float cos_theta = (1.0f - xi1) * cos_theta_0 + xi1 * cos_theta_1;
result[0] = acosf(cos_theta);
result[1] = start.y + (float(idx_phi) + xi0) * delta.y;
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1];
result[2] = prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
extern "C" __device__ float df_light_profile_pdf(
Texture_handler_base const *self_base,
unsigned resource_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_lightprofiles)
return 0.0f;
const Lightprofile& lp = self->lightprofiles[resource_idx - 1];
const uint2 res = lp.angular_resolution;
const float* cdf_data_theta = lp.cdf_data;
const float theta = theta_phi[0] - lp.theta_phi_start.x;
const int idx_theta = int(theta * lp.theta_phi_inv_delta.x);
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
const int idx_phi = int(phi * lp.theta_phi_inv_delta.y);
if (idx_theta < 0 || idx_theta > (res.x - 2) || idx_phi < 0 || idx_phi >(res.x - 2))
return 0.0f;
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
}
const float* cdf_data_phi = cdf_data_theta
+ (res.x - 1)
+ (idx_theta * (res.y - 1));
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
}
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 =
cos(start.x +
float(idx_theta) * delta.x);
const float cos_theta_1 =
cos(start.x +
float(idx_theta + 1u) * delta.x);
return prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
extern "C" __device__ void df_bsdf_measurement_resolution(
unsigned result[3],
Texture_handler_base const *self_base,
unsigned resource_idx,
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_mbsdfs)
{
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
Mbsdf const &bm = self->mbsdfs[resource_idx - 1];
const unsigned part_index = static_cast<unsigned>(part);
if (bm.has_data[part_index] == 0)
{
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
result[0] = bm.angular_resolution[part_index].x;
result[1] = bm.angular_resolution[part_index].y;
result[2] = bm.num_channels[part_index];
}
__device__ inline float3 bsdf_compute_uvw(const float theta_phi_in[2],
const float theta_phi_out[2])
{
float u = theta_phi_out[1] - theta_phi_in[1];
if (u < 0.0) u += float(2.0 * M_PI);
if (u > float(1.0 * M_PI)) u = float(2.0 * M_PI) - u;
u *= M_ONE_OVER_PI;
const float v = theta_phi_out[0] * float(2.0 / M_PI);
const float w = theta_phi_in[0] * float(2.0 / M_PI);
return make_float3(u, v, w);
}
template<typename T>
__device__ inline T bsdf_measurement_lookup(const cudaTextureObject_t& eval_volume,
const float theta_phi_in[2],
const float theta_phi_out[2])
{
const float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out);
return tex3D<T>(eval_volume, uvw.x, uvw.y, uvw.z);
}
extern "C" __device__ void df_bsdf_measurement_evaluate(
float result[3],
Texture_handler_base const *self_base,
unsigned resource_idx,
float const theta_phi_in[2],
float const theta_phi_out[2],
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_mbsdfs)
{
store_result3(result, 0.0f);
return;
}
const Mbsdf& bm = self->mbsdfs[resource_idx - 1];
const unsigned part_index = static_cast<unsigned>(part);
if (bm.has_data[part_index] == 0)
{
store_result3(result, 0.0f);
return;
}
if (bm.num_channels[part_index] == 3)
{
const float4 sample = bsdf_measurement_lookup<float4>(
bm.eval_data[part_index], theta_phi_in, theta_phi_out);
store_result3(result, sample.x, sample.y, sample.z);
}
else
{
const float sample = bsdf_measurement_lookup<float>(
bm.eval_data[part_index], theta_phi_in, theta_phi_out);
store_result3(result, sample);
}
}
extern "C" __device__ void df_bsdf_measurement_sample(
float result[3],
Texture_handler_base const *self_base,
unsigned resource_idx,
float const theta_phi_out[2],
float const xi[3],
Mbsdf_part part)
{
result[0] = -1.0f;
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_mbsdfs)
return;
const Mbsdf& bm = self->mbsdfs[resource_idx - 1];
unsigned part_index = static_cast<unsigned>(part);
if (bm.has_data[part_index] == 0)
return;
uint2 res = bm.angular_resolution[part_index];
const float* sample_data = bm.sample_data[part_index];
unsigned idx_theta_in = unsigned(theta_phi_out[0] * M_ONE_OVER_PI * 2.0f * float(res.x));
idx_theta_in =
min(idx_theta_in, res.x - 1);
float xi0 = xi[0];
const float* cdf_theta = sample_data + idx_theta_in * res.x;
unsigned idx_theta_out = sample_cdf(cdf_theta, res.x, xi0);
float prob_theta = cdf_theta[idx_theta_out];
if (idx_theta_out > 0)
{
const float tmp = cdf_theta[idx_theta_out - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta;
float xi1 = xi[1];
const float* cdf_phi = sample_data +
(res.x * res.x) +
(idx_theta_in * res.x + idx_theta_out) * res.y;
const bool flip = (xi1 > 0.5f);
if (flip)
xi1 = 1.0f - xi1;
xi1 *= 2.0f;
unsigned idx_phi_out = sample_cdf(cdf_phi, res.y, xi1);
float prob_phi = cdf_phi[idx_phi_out];
if (idx_phi_out > 0)
{
const float tmp = cdf_phi[idx_phi_out - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi;
const float2 inv_res = bm.inv_angular_resolution[part_index];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_out) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_out + 1u) * s_theta);
const float cos_theta = cos_theta_0 * (1.0f - xi1) + cos_theta_1 * xi1;
result[0] = acosf(cos_theta);
result[1] = (float(idx_phi_out) + xi0) * s_phi;
if (flip)
result[1] = float(2.0 * M_PI) - result[1];
result[1] += (theta_phi_out[1] > 0) ? theta_phi_out[1] : (float(2.0 * M_PI) + theta_phi_out[1]);
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1];
result[2] = prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
extern "C" __device__ float df_bsdf_measurement_pdf(
Texture_handler_base const *self_base,
unsigned resource_idx,
float const theta_phi_in[2],
float const theta_phi_out[2],
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_mbsdfs)
return 0.0f;
const Mbsdf& bm = self->mbsdfs[resource_idx - 1];
unsigned part_index = static_cast<unsigned>(part);
if (bm.has_data[part_index] == 0)
return 0.0f;
const float* sample_data = bm.sample_data[part_index];
uint2 res = bm.angular_resolution[part_index];
float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out);
unsigned idx_theta_in = unsigned(theta_phi_in[0] * M_ONE_OVER_PI * 2.0f * float(res.x));
unsigned idx_theta_out = unsigned(theta_phi_out[0] * M_ONE_OVER_PI * 2.0f * float(res.x));
unsigned idx_phi_out = unsigned(uvw.x * float(res.y));
idx_theta_in =
min(idx_theta_in, res.x - 1);
idx_theta_out =
min(idx_theta_out, res.x - 1);
idx_phi_out =
min(idx_phi_out, res.y - 1);
const float* cdf_theta = sample_data + idx_theta_in * res.x;
float prob_theta = cdf_theta[idx_theta_out];
if (idx_theta_out > 0)
{
const float tmp = cdf_theta[idx_theta_out - 1];
prob_theta -= tmp;
}
const float* cdf_phi = sample_data +
(res.x * res.x) +
(idx_theta_in * res.x + idx_theta_out) * res.y;
float prob_phi = cdf_phi[idx_phi_out];
if (idx_phi_out > 0)
{
const float tmp = cdf_phi[idx_phi_out - 1];
prob_phi -= tmp;
}
float2 inv_res = bm.inv_angular_resolution[part_index];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_out) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_out + 1u) * s_theta);
return prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
__device__ inline void df_bsdf_measurement_albedo(
float result[2],
Texture_handler const *self,
unsigned resource_idx,
float const theta_phi[2],
Mbsdf_part part)
{
const Mbsdf& bm = self->mbsdfs[resource_idx - 1];
const unsigned part_index = static_cast<unsigned>(part);
if (bm.has_data[part_index] == 0)
return;
const uint2 res = bm.angular_resolution[part_index];
unsigned idx_theta = unsigned(theta_phi[0] * float(2.0 / M_PI) * float(res.x));
idx_theta =
min(idx_theta, res.x - 1u);
result[0] = bm.albedo_data[part_index][idx_theta];
result[1] = bm.max_albedo[part_index];
}
extern "C" __device__ void df_bsdf_measurement_albedos(
float result[4],
Texture_handler_base const *self_base,
unsigned resource_idx,
float const theta_phi[2])
{
result[0] = 0.0f;
result[1] = 0.0f;
result[2] = 0.0f;
result[3] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_mbsdfs)
return;
df_bsdf_measurement_albedo(
df_bsdf_measurement_albedo(
}
tex_lookup_float4_2d,
tex_lookup_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos
};
tex_lookup_deriv_float4_2d,
tex_lookup_deriv_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos
};
#endif // TEXTURE_SUPPORT_CUDA_H
Source Code Location: examples/mdl_sdk/share/example_cuda_shared.h
#ifndef EXAMPLE_CUDA_SHARED_H
#define EXAMPLE_CUDA_SHARED_H
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
#define _USE_MATH_DEFINES
#include "example_shared.h"
#include <cuda.h>
#ifdef OPENGL_INTEROP
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <cudaGL.h>
#endif
#include <cuda_runtime.h>
#include <vector_functions.h>
struct Texture
{
explicit Texture(cudaTextureObject_t filtered_object,
cudaTextureObject_t unfiltered_object,
uint3 size)
: filtered_object(filtered_object)
, unfiltered_object(unfiltered_object)
, size(size)
, inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
{}
cudaTextureObject_t filtered_object;
cudaTextureObject_t unfiltered_object;
uint3 size;
float3 inv_size;
};
struct Mbsdf
{
explicit Mbsdf()
{
for (unsigned i = 0; i < 2; ++i) {
has_data[i] = 0u;
eval_data[i] = 0;
sample_data[i] = 0;
albedo_data[i] = 0;
this->max_albedo[i] = 0.0f;
angular_resolution[i] = make_uint2(0u, 0u);
inv_angular_resolution[i] = make_float2(0.0f, 0.0f);
num_channels[i] = 0;
}
}
const uint2& angular_resolution,
unsigned num_channels)
{
unsigned part_idx = static_cast<unsigned>(part);
this->has_data[part_idx] = 1u;
this->angular_resolution[part_idx] = angular_resolution;
this->inv_angular_resolution[part_idx] = make_float2(1.0f / float(angular_resolution.x),
1.0f / float(angular_resolution.y));
this->num_channels[part_idx] = num_channels;
}
unsigned has_data[2];
cudaTextureObject_t eval_data[2];
float max_albedo[2];
float* sample_data[2];
float* albedo_data[2];
uint2 angular_resolution[2];
float2 inv_angular_resolution[2];
unsigned num_channels[2];
};
struct Lightprofile
{
explicit Lightprofile(
uint2 angular_resolution = make_uint2(0, 0),
float2 theta_phi_start = make_float2(0.0f, 0.0f),
float2 theta_phi_delta = make_float2(0.0f, 0.0f),
float candela_multiplier = 0.0f,
float total_power = 0.0f,
cudaTextureObject_t eval_data = 0,
float *cdf_data = nullptr)
: angular_resolution(angular_resolution)
, theta_phi_start(theta_phi_start)
, theta_phi_delta(theta_phi_delta)
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(candela_multiplier)
, total_power(total_power)
, eval_data(eval_data)
, cdf_data(cdf_data)
{
theta_phi_inv_delta.x = theta_phi_delta.x ? (1.f / theta_phi_delta.x) : 0.f;
theta_phi_inv_delta.y = theta_phi_delta.y ? (1.f / theta_phi_delta.y) : 0.f;
}
uint2 angular_resolution;
float2 theta_phi_start;
float2 theta_phi_delta;
float2 theta_phi_inv_delta;
float candela_multiplier;
float total_power;
cudaTextureObject_t eval_data;
float* cdf_data;
};
struct Target_code_data
{
Target_code_data(
size_t num_textures,
CUdeviceptr textures,
size_t num_mbsdfs,
CUdeviceptr mbsdfs,
size_t num_lightprofiles,
CUdeviceptr lightprofiles,
CUdeviceptr ro_data_segment)
: num_textures(num_textures)
, textures(textures)
, num_mbsdfs(num_mbsdfs)
, mbsdfs(mbsdfs)
, num_lightprofiles(num_lightprofiles)
, lightprofiles(lightprofiles)
, ro_data_segment(ro_data_segment)
{}
size_t num_textures;
CUdeviceptr textures;
size_t num_mbsdfs;
CUdeviceptr mbsdfs;
size_t num_lightprofiles;
CUdeviceptr lightprofiles;
CUdeviceptr ro_data_segment;
};
template <typename T>
{
stream << val;
}
#define check_cuda_success(expr) \
do { \
int err = (expr); \
if (err != 0) { \
fprintf(stderr, "CUDA error %d in file %s, line %u: \"%s\".\n", \
err, __FILE__, __LINE__, #expr); \
keep_console_open(); \
cudaDeviceReset(); \
exit(EXIT_FAILURE); \
} \
} while (false)
CUcontext init_cuda(
#ifdef OPENGL_INTEROP
const bool opengl_interop
#endif
)
{
CUdevice cu_device;
CUcontext cu_context;
check_cuda_success(cuInit(0));
#if defined(OPENGL_INTEROP) && !defined(__APPLE__)
if (opengl_interop) {
unsigned int num_cu_devices;
check_cuda_success(cuGLGetDevices(&num_cu_devices, &cu_device, 1, CU_GL_DEVICE_LIST_ALL));
}
else
#endif
check_cuda_success(cuDeviceGet(&cu_device, 0));
check_cuda_success(cuCtxCreate(&cu_context, 0, cu_device));
cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 16 * 1024 * 1024);
return cu_context;
}
void uninit_cuda(CUcontext cuda_context)
{
check_cuda_success(cuCtxDestroy(cuda_context));
}
template<typename T> struct Resource_deleter {
};
template<> struct Resource_deleter<cudaArray_t> {
void operator()(cudaArray_t res) { check_cuda_success(cudaFreeArray(res)); }
};
template<> struct Resource_deleter<cudaMipmappedArray_t> {
void operator()(cudaMipmappedArray_t res) { check_cuda_success(cudaFreeMipmappedArray(res)); }
};
template<> struct Resource_deleter<Texture> {
void operator()(Texture &res) {
check_cuda_success(cudaDestroyTextureObject(res.filtered_object));
check_cuda_success(cudaDestroyTextureObject(res.unfiltered_object));
}
};
template<> struct Resource_deleter<Mbsdf> {
void operator()(Mbsdf &res) {
for (size_t i = 0; i < 2; ++i) {
if (res.has_data[i] != 0u) {
check_cuda_success(cudaDestroyTextureObject(res.eval_data[i]));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.sample_data[i])));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.albedo_data[i])));
}
}
}
};
template<> struct Resource_deleter<Lightprofile> {
void operator()(Lightprofile res) {
if (res.cdf_data)
check_cuda_success(cuMemFree((CUdeviceptr)res.cdf_data));
}
};
template<> struct Resource_deleter<Target_code_data> {
void operator()(Target_code_data &res) {
if (res.textures)
check_cuda_success(cuMemFree(res.textures));
if (res.ro_data_segment)
check_cuda_success(cuMemFree(res.ro_data_segment));
}
};
template<> struct Resource_deleter<CUdeviceptr> {
void operator()(CUdeviceptr res) {
if (res != 0)
check_cuda_success(cuMemFree(res));
}
};
template<typename T, typename D = Resource_deleter<T> >
struct Resource_handle {
Resource_handle(T res) : m_res(res) {}
~Resource_handle() {
D deleter;
deleter(m_res);
}
T &get() { return m_res; }
T const &get() const { return m_res; }
void set(T res) { m_res = res; }
private:
Resource_handle(Resource_handle const &);
Resource_handle &operator=(Resource_handle const &);
private:
T m_res;
};
template<typename T, typename C = std::vector<T>, typename D = Resource_deleter<T> >
struct Resource_container {
Resource_container() : m_cont() {}
~Resource_container() {
D deleter;
typedef typename C::iterator I;
for (I it(m_cont.begin()),
end(m_cont.end()); it !=
end; ++it) {
T &r = *it;
deleter(r);
}
}
C
const &
operator*()
const {
return m_cont; }
C *operator->() { return &m_cont; }
C const *operator->() const { return &m_cont; }
private:
Resource_container(Resource_container const &);
Resource_container &operator=(Resource_container const &);
private:
C m_cont;
};
CUdeviceptr gpu_mem_dup(void const *data, size_t size)
{
CUdeviceptr device_ptr;
check_cuda_success(cuMemAlloc(&device_ptr, size));
check_cuda_success(cuMemcpyHtoD(device_ptr, data, size));
return device_ptr;
}
template <typename T>
CUdeviceptr gpu_mem_dup(Resource_handle<T> const *data, size_t size)
{
return gpu_mem_dup((void *)data->get(), size);
}
template<typename T>
{
return gpu_mem_dup(&data[0], data.
size() *
sizeof(T));
}
template<typename T, typename C>
CUdeviceptr gpu_mem_dup(Resource_container<T,C> const &cont)
{
return gpu_mem_dup(*cont);
}
class Material_gpu_context
{
public:
Material_gpu_context(bool enable_derivatives)
: m_enable_derivatives(enable_derivatives)
, m_device_target_code_data_list(0)
, m_device_target_argument_block_list(0)
{
m_target_argument_block_list->push_back(0);
}
bool prepare_target_code_data(
CUdeviceptr get_device_target_code_data_list();
CUdeviceptr get_device_target_argument_block_list();
CUdeviceptr get_device_target_argument_block(size_t i)
{
if (i + 1 >= m_target_argument_block_list->size())
return 0;
return (*m_target_argument_block_list)[i + 1];
}
size_t get_argument_block_count() const
{
return m_own_arg_blocks.size();
}
size_t get_bsdf_argument_block_index(size_t i) const
{
if (i >= m_bsdf_arg_block_indices.size()) return size_t(~0);
return m_bsdf_arg_block_indices[i];
}
{
if (i >= m_own_arg_blocks.size())
return m_own_arg_blocks[i];
}
{
if (i >= m_arg_block_layouts.size())
return m_arg_block_layouts[i];
}
void update_device_argument_block(size_t i);
private:
bool prepare_texture(
bool prepare_mbsdf(
bool prepare_lightprofile(
bool m_enable_derivatives;
Resource_handle<CUdeviceptr> m_device_target_code_data_list;
Resource_container<Target_code_data> m_target_code_data_list;
Resource_handle<CUdeviceptr> m_device_target_argument_block_list;
Resource_container<CUdeviceptr> m_target_argument_block_list;
Resource_container<Texture> m_all_textures;
Resource_container<Mbsdf> m_all_mbsdfs;
Resource_container<Lightprofile> m_all_lightprofiles;
Resource_container<cudaArray_t> m_all_texture_arrays;
Resource_container<cudaMipmappedArray_t> m_all_texture_mipmapped_arrays;
};
CUdeviceptr Material_gpu_context::get_device_target_code_data_list()
{
if (!m_device_target_code_data_list.get())
m_device_target_code_data_list.set(gpu_mem_dup(m_target_code_data_list));
return m_device_target_code_data_list.get();
}
CUdeviceptr Material_gpu_context::get_device_target_argument_block_list()
{
if (!m_device_target_argument_block_list.get())
m_device_target_argument_block_list.set(gpu_mem_dup(m_target_argument_block_list));
return m_device_target_argument_block_list.get();
}
void Material_gpu_context::copy_canvas_to_cuda_array(
cudaArray_t device_array,
{
check_cuda_success(cudaMemcpyToArray(
device_array,
0,
0,
data,
cudaMemcpyHostToDevice));
}
bool Material_gpu_context::prepare_texture(
{
char const *image_type = image->get_type();
if (image->is_uvtile()) {
return false;
}
return false;
}
if (texture->get_effective_gamma() != 1.0f) {
image_api->
convert(canvas.get(),
"Color"));
gamma_canvas->set_gamma(texture->get_effective_gamma());
canvas = gamma_canvas;
}
else if (
strcmp(image_type,
"Color") != 0 &&
strcmp(image_type,
"Float32<4>") != 0) {
canvas = image_api->
convert(canvas.get(),
"Color");
}
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
cudaResourceDesc res_desc;
memset(&res_desc, 0,
sizeof(res_desc));
if (texture_shape == mi::neuraylib::ITarget_code::Texture_shape_cube ||
texture_shape == mi::neuraylib::ITarget_code::Texture_shape_3d) {
if (texture_shape == mi::neuraylib::ITarget_code::Texture_shape_cube &&
tex_layers != 6) {
std::cerr <<
"Invalid number of layers (" << tex_layers
<<
"), cubemaps must have 6 layers!" <<
std::endl;
return false;
}
cudaExtent extent = make_cudaExtent(tex_width, tex_height, tex_layers);
cudaArray_t device_tex_array;
check_cuda_success(cudaMalloc3DArray(
&device_tex_array, &channel_desc, extent,
texture_shape == mi::neuraylib::ITarget_code::Texture_shape_cube ?
cudaArrayCubemap : 0));
cudaMemcpy3DParms copy_params;
memset(©_params, 0,
sizeof(copy_params));
copy_params.dstArray = device_tex_array;
copy_params.extent = make_cudaExtent(tex_width, tex_height, 1);
copy_params.kind = cudaMemcpyHostToDevice;
for (
mi::Uint32 layer = 0; layer < tex_layers; ++layer) {
float const *data = static_cast<float const *>(tile->get_data());
copy_params.srcPtr = make_cudaPitchedPtr(
const_cast<float *>(data), tex_width * sizeof(float) * 4,
tex_width, tex_height);
copy_params.dstPos = make_cudaPos(0, 0, layer);
check_cuda_success(cudaMemcpy3D(©_params));
}
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
} else if (m_enable_derivatives) {
cudaExtent extent = make_cudaExtent(tex_width, tex_height, 0);
cudaMipmappedArray_t device_tex_miparray;
check_cuda_success(cudaMallocMipmappedArray(
&device_tex_miparray, &channel_desc, extent, num_levels));
for (
mi::Uint32 level = 0; level < num_levels; ++level) {
if (level == 0)
level_canvas = canvas;
else {
}
cudaArray_t device_level_array;
cudaGetMipmappedArrayLevel(&device_level_array, device_tex_miparray, level);
copy_canvas_to_cuda_array(device_level_array, level_canvas.
get());
}
res_desc.resType = cudaResourceTypeMipmappedArray;
res_desc.res.mipmap.mipmap = device_tex_miparray;
m_all_texture_mipmapped_arrays->push_back(device_tex_miparray);
} else {
cudaArray_t device_tex_array;
check_cuda_success(cudaMallocArray(
&device_tex_array, &channel_desc, tex_width, tex_height));
copy_canvas_to_cuda_array(device_tex_array, canvas.get());
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
}
cudaTextureAddressMode addr_mode =
texture_shape == mi::neuraylib::ITarget_code::Texture_shape_cube
? cudaAddressModeClamp
: cudaAddressModeWrap;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0,
sizeof(tex_desc));
tex_desc.addressMode[0] = addr_mode;
tex_desc.addressMode[1] = addr_mode;
tex_desc.addressMode[2] = addr_mode;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
if (res_desc.resType == cudaResourceTypeMipmappedArray) {
tex_desc.mipmapFilterMode = cudaFilterModeLinear;
tex_desc.maxAnisotropy = 16;
tex_desc.minMipmapLevelClamp = 0.f;
tex_desc.maxMipmapLevelClamp = 1000.f;
}
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
cudaTextureObject_t tex_obj_unfilt = 0;
if (texture_shape != mi::neuraylib::ITarget_code::Texture_shape_cube) {
tex_desc.addressMode[0] = cudaAddressModeBorder;
tex_desc.addressMode[1] = cudaAddressModeBorder;
tex_desc.addressMode[2] = cudaAddressModeBorder;
tex_desc.filterMode = cudaFilterModePoint;
check_cuda_success(cudaCreateTextureObject(
&tex_obj_unfilt, &res_desc, &tex_desc, nullptr));
}
tex_obj,
tex_obj_unfilt,
make_uint3(tex_width, tex_height, tex_layers)));
m_all_textures->push_back(textures.
back());
return true;
}
namespace
{
{
switch (part)
{
break;
break;
}
if (!dataset)
return true;
uint2 res;
mbsdf_cuda_representation.Add(part, res, num_channels);
const unsigned int cdf_theta_size = res.x * res.x;
const unsigned sample_data_size = cdf_theta_size + cdf_theta_size * res.y;
float* sample_data = new float[sample_data_size];
float* albedo_data = new float[res.x];
float* sample_data_theta = sample_data;
float* sample_data_phi = sample_data + cdf_theta_size;
const float s_theta = (float) (M_PI * 0.5) / float(res.x);
const float s_phi = (float) (M_PI) / float(res.y);
float max_albedo = 0.0f;
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
float sum_theta = 0.0f;
float sintheta0_sqd = 0.0f;
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const float sintheta1 = sinf(float(t_out + 1) * s_theta);
const float sintheta1_sqd = sintheta1 * sintheta1;
const float mu = (sintheta1_sqd - sintheta0_sqd) * s_phi * 0.5f;
sintheta0_sqd = sintheta1_sqd;
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
float sum_phi = 0.0f;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
float value = 0.0f;
if (num_channels == 3)
{
value =
fmax(fmaxf(src_data[3 * idx + 0], src_data[3 * idx + 1]),
fmaxf(src_data[3 * idx + 2], 0.0f))
+
fmax(fmaxf(src_data[3 * idx2 + 0], src_data[3 * idx2 + 1]),
fmaxf(src_data[3 * idx2 + 2], 0.0f));
}
else
{
value = fmaxf(src_data[idx], 0.0f) + fmaxf(src_data[idx2], 0.0f);
}
sum_phi += value * mu;
sample_data_phi[idx] = sum_phi;
}
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
sample_data_phi[idx] = sample_data_phi[idx] / sum_phi;
}
sum_theta += sum_phi;
sample_data_theta[t_in * res.x + t_out] = sum_theta;
}
if (sum_theta > max_albedo)
max_albedo = sum_theta;
albedo_data[t_in] = sum_theta;
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int idx = t_in * res.x + t_out;
sample_data_theta[idx] = sample_data_theta[idx] / sum_theta;
}
}
CUdeviceptr sample_obj = 0;
check_cuda_success(cuMemAlloc(&sample_obj, sample_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(sample_obj, sample_data, sample_data_size * sizeof(float)));
delete[] sample_data;
CUdeviceptr albedo_obj = 0;
check_cuda_success(cuMemAlloc(&albedo_obj, res.x * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(albedo_obj, albedo_data, res.x * sizeof(float)));
delete[] albedo_data;
mbsdf_cuda_representation.sample_data[part] = reinterpret_cast<float*>(sample_obj);
mbsdf_cuda_representation.albedo_data[part] = reinterpret_cast<float*>(albedo_obj);
mbsdf_cuda_representation.max_albedo[part] = max_albedo;
unsigned lookup_channels = (num_channels == 3) ? 4 : 1;
float* lookup_data = new float[lookup_channels * res.y * res.x * res.x];
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
if (num_channels == 3)
{
lookup_data[4*idx+0] = (src_data[3*idx+0] + src_data[3*idx2+0]) * 0.5f;
lookup_data[4*idx+1] = (src_data[3*idx+1] + src_data[3*idx2+1]) * 0.5f;
lookup_data[4*idx+2] = (src_data[3*idx+2] + src_data[3*idx2+2]) * 0.5f;
lookup_data[4*idx+3] = 1.0f;
}
else
{
lookup_data[idx] = (src_data[idx] + src_data[idx2]) * 0.5f;
}
}
}
}
cudaArray_t device_mbsdf_data;
cudaChannelFormatDesc channel_desc = (num_channels == 3
? cudaCreateChannelDesc<float4>()
: cudaCreateChannelDesc<float>());
cudaExtent extent = make_cudaExtent(res.y, res.x, res.x);
check_cuda_success(cudaMalloc3DArray(&device_mbsdf_data, &channel_desc, extent, 0));
cudaMemcpy3DParms copy_params;
memset(©_params, 0,
sizeof(copy_params));
copy_params.srcPtr = make_cudaPitchedPtr(
(void*)(lookup_data),
res.y * lookup_channels * sizeof(float),
res.y,
res.x);
copy_params.dstArray = device_mbsdf_data;
copy_params.extent = extent;
copy_params.kind = cudaMemcpyHostToDevice;
check_cuda_success(cudaMemcpy3D(©_params));
delete[] lookup_data;
cudaResourceDesc texRes;
memset(&texRes, 0,
sizeof(cudaResourceDesc));
texRes.resType = cudaResourceTypeArray;
texRes.res.array.array = device_mbsdf_data;
cudaTextureDesc texDescr;
memset(&texDescr, 0,
sizeof(cudaTextureDesc));
texDescr.normalizedCoords = 1;
texDescr.filterMode = cudaFilterModeLinear;
texDescr.addressMode[0] = cudaAddressModeClamp;
texDescr.addressMode[1] = cudaAddressModeClamp;
texDescr.addressMode[2] = cudaAddressModeClamp;
texDescr.readMode = cudaReadModeElementType;
cudaTextureObject_t eval_tex_obj;
check_cuda_success(cudaCreateTextureObject(&eval_tex_obj, &texRes, &texDescr, nullptr));
mbsdf_cuda_representation.eval_data[part] = eval_tex_obj;
return true;
}
}
bool Material_gpu_context::prepare_mbsdf(
{
Mbsdf mbsdf_cuda;
return false;
return false;
m_all_mbsdfs->push_back(mbsdfs.
back());
return true;
}
bool Material_gpu_context::prepare_lightprofile(
{
uint2 res = make_uint2(lprof_nr->get_resolution_theta(), lprof_nr->get_resolution_phi());
float2 start = make_float2(lprof_nr->get_theta(0), lprof_nr->get_phi(0));
float2 delta = make_float2(lprof_nr->get_theta(1) - start.x, lprof_nr->get_phi(1) - start.y);
const float* data = lprof_nr->
get_data();
size_t cdf_data_size = (res.x - 1) + (res.x - 1) * (res.y - 1);
float* cdf_data = new float[cdf_data_size];
float debug_total_erea = 0.0f;
float sum_theta = 0.0f;
float total_power = 0.0f;
float cos_theta0 = cosf(start.x);
for (unsigned int t = 0; t < res.x - 1; ++t)
{
const float cos_theta1 = cosf(start.x + float(t + 1) * delta.x);
const float mu = cos_theta0 - cos_theta1;
cos_theta0 = cos_theta1;
float* cdf_data_phi = cdf_data + (res.x - 1) + t * (res.y - 1);
float sum_phi = 0.0f;
for (unsigned int p = 0; p < res.y - 1; ++p)
{
float value = data[p * res.x + t]
+ data[p * res.x + t + 1]
+ data[(p + 1) * res.x + t]
+ data[(p + 1) * res.x + t + 1];
sum_phi += value * mu;
cdf_data_phi[p] = sum_phi;
debug_total_erea += mu;
}
for (unsigned int p = 0; p < res.y - 2; ++p)
cdf_data_phi[p] = sum_phi ? (cdf_data_phi[p] / sum_phi) : 0.0f;
cdf_data_phi[res.y - 2] = 1.0f;
sum_theta += sum_phi;
cdf_data[t] = sum_theta;
}
total_power = sum_theta * 0.25f * delta.y;
for (unsigned int t = 0; t < res.x - 2; ++t)
cdf_data[t] = sum_theta ? (cdf_data[t] / sum_theta) : cdf_data[t];
cdf_data[res.x - 2] = 1.0f;
CUdeviceptr cdf_data_obj = 0;
check_cuda_success(cuMemAlloc(&cdf_data_obj, cdf_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(cdf_data_obj, cdf_data, cdf_data_size * sizeof(float)));
delete[] cdf_data;
cudaArray_t device_lightprofile_data;
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float>();
check_cuda_success(cudaMallocArray(&device_lightprofile_data, &channel_desc, res.x, res.y));
check_cuda_success(cudaMemcpyToArray(device_lightprofile_data, 0, 0, data,
res.x * res.y * sizeof(float), cudaMemcpyHostToDevice));
cudaResourceDesc res_desc;
memset(&res_desc, 0,
sizeof(res_desc));
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_lightprofile_data;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0,
sizeof(tex_desc));
tex_desc.addressMode[0] = cudaAddressModeClamp;
tex_desc.addressMode[1] = cudaAddressModeClamp;
tex_desc.addressMode[2] = cudaAddressModeClamp;
tex_desc.borderColor[0] = 1.0f;
tex_desc.borderColor[1] = 1.0f;
tex_desc.borderColor[2] = 1.0f;
tex_desc.borderColor[3] = 1.0f;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
double multiplier = lprof_nr->get_candela_multiplier();
Lightprofile lprof(
res,
start,
delta,
float(multiplier),
float(total_power * multiplier),
tex_obj,
reinterpret_cast<float*>(cdf_data_obj));
m_all_lightprofiles->push_back(lightprofiles.
back());
return true;
}
bool Material_gpu_context::prepare_target_code_data(
{
check_success(m_device_target_code_data_list.get() == 0);
CUdeviceptr device_ro_data = 0;
device_ro_data = gpu_mem_dup(
}
CUdeviceptr device_textures = 0;
if (num_textures > 1) {
for (
mi::Size i = 1; i < num_textures; ++i) {
if (!prepare_texture(
transaction, image_api, target_code, i, textures))
return false;
}
device_textures = gpu_mem_dup(textures);
}
CUdeviceptr device_mbsdfs = 0;
if (num_mbsdfs > 1)
{
for (
mi::Size i = 1; i < num_mbsdfs; ++i)
{
if (!prepare_mbsdf(
transaction, target_code, i, mbsdfs))
return false;
}
device_mbsdfs = gpu_mem_dup(mbsdfs);
}
CUdeviceptr device_lightprofiles = 0;
if (num_lightprofiles > 1)
{
for (
mi::Size i = 1; i < num_lightprofiles; ++i)
{
if (!prepare_lightprofile(
transaction, target_code, i, lightprofiles))
return false;
}
device_lightprofiles = gpu_mem_dup(lightprofiles);
}
(*m_target_code_data_list).push_back(
Target_code_data(num_textures, device_textures,
num_mbsdfs, device_mbsdfs,
num_lightprofiles, device_lightprofiles,
device_ro_data));
CUdeviceptr dev_block = gpu_mem_dup(arg_block->get_data(), arg_block->get_size());
m_target_argument_block_list->push_back(dev_block);
m_arg_block_layouts.push_back(
}
if (kind != mi::neuraylib::ITarget_code::FK_DF_INIT ||
df_kind != mi::neuraylib::ITarget_code::DK_BSDF)
continue;
m_bsdf_arg_block_indices.push_back(
}
return true;
}
void Material_gpu_context::update_device_argument_block(size_t i)
{
CUdeviceptr device_ptr = get_device_target_argument_block(i);
if (device_ptr == 0) return;
check_cuda_success(cuMemcpyHtoD(
device_ptr, arg_block->get_data(), arg_block->get_size()));
}
class Material_compiler {
public:
Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives);
bool add_material_subexpr(
const char* path,
const char* fname,
bool class_compilation=false);
bool add_material_df(
const char* path,
const char* base_fname,
bool class_compilation=false);
bool add_material(
mi::neuraylib::Target_function_description* function_descriptions,
bool class_compilation = false);
&get_material_defs()
{
return m_material_defs;
}
{
return m_compiled_materials;
}
private:
bool class_compilation);
private:
};
Material_compiler::Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives)
, m_be_cuda_ptx(mdl_compiler->get_backend(mi::neuraylib::IMdl_compiler::MB_CUDA_PTX))
, m_context(mdl_factory->create_execution_context())
, m_link_unit()
{
check_success(m_be_cuda_ptx->set_option("num_texture_spaces", "1") == 0);
if (enable_derivatives) {
check_success(m_be_cuda_ptx->set_option("texture_runtime_with_derivs", "on") == 0);
}
check_success(m_be_cuda_ptx->set_option("tex_lookup_call_mode", "direct_call") == 0);
check_success(m_be_cuda_ptx->set_option(
"num_texture_results",
to_string(num_texture_results).c_str()) == 0);
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
if (use_df_interpreter) {
check_success(m_be_cuda_ptx->set_option("enable_df_interpreter", "on") == 0);
}
#endif
}
{
size_t p = material_name.
rfind(
"::");
return material_name.
substr(0, p);
}
{
size_t p = material_name.
rfind(
"::");
if (p == std::string::npos)
return material_name;
return material_name.
substr(p + 2, material_name.
size() - p);
}
{
check_success(m_mdl_compiler->load_module(m_transaction.get(), module_name.
c_str()) >= 0);
const char *prefix = (module_name.
find(
"::") == 0) ?
"mdl" :
"mdl::";
mi::Size num_materials = module->get_material_count();
for (
mi::Size i = 0; i < num_materials; ++i) {
material_names[i] = module->get_material(i);
}
return material_names;
}
{
std::string module_name = get_module_name(material_name);
check_success(m_mdl_compiler->load_module(m_transaction.get(), module_name.
c_str(), m_context.get()) >= 0);
print_messages(m_context.get());
const char *prefix = (material_name.
find(
"::") == 0) ?
"mdl" :
"mdl::";
material_db_name.
c_str()));
check_success(material_definition);
m_material_defs.push_back(material_definition);
material_definition->create_material_instance(0, &result));
check_success(result == 0);
material_instance->retain();
return material_instance.get();
}
bool class_compilation)
{
check_success(print_messages(m_context.get()));
m_compiled_materials.push_back(compiled_material);
compiled_material->retain();
return compiled_material.get();
}
{
m_be_cuda_ptx->translate_link_unit(m_link_unit.get(), m_context.get()));
check_success(print_messages(m_context.get()));
check_success(code_cuda_ptx);
#ifdef DUMP_PTX
#endif
return code_cuda_ptx;
}
bool Material_compiler::add_material_subexpr(
const char* path,
const char* fname,
bool class_compilation)
{
create_material_instance(material_name.
c_str()));
compile_material_instance(material_instance.get(), class_compilation));
m_link_unit->add_material_expression(compiled_material.get(), path, fname, m_context.get());
return print_messages(m_context.get());
}
bool Material_compiler::add_material_df(
const char* path,
const char* base_fname,
bool class_compilation)
{
create_material_instance(material_name.
c_str()));
compile_material_instance(material_instance.get(), class_compilation));
m_link_unit->add_material_df(compiled_material.get(), path, base_fname, m_context.get());
return print_messages(m_context.get());
}
bool Material_compiler::add_material(
mi::neuraylib::Target_function_description* function_descriptions,
bool class_compilation)
{
create_material_instance(material_name.
c_str()));
compile_material_instance(material_instance.get(), class_compilation));
m_link_unit->add_material(
compiled_material.get(), function_descriptions, description_count,
m_context.get());
bool res = print_messages(m_context.get());
if (res)
for (size_t i = 0; i < description_count; ++i)
function_descriptions[i].argument_block_index++;
return res;
}
void print_array_u32(
{
str += ".visible .const .align 4 .u32 " + name + "[";
if (count == 0) {
str += "1] = { 0 };\n";
} else {
str +=
to_string(count) +
"] = { " + content +
" };\n";
}
}
void print_array_func(
{
str += ".visible .const .align 8 .u64 " + name + "[";
if (count == 0) {
str += "1] = { dummy_func };\n";
} else {
str +=
to_string(count) +
"] = { " + content +
" };\n";
}
}
{
".version 4.0\n"
".target sm_20\n"
".address_size 64\n";
src += ".func dummy_func() { ret; }\n";
unsigned f_count = 0;
for (
size_t tc_index = 0, num = target_codes.
size(); tc_index < num; ++tc_index)
{
target_codes[tc_index];
tc_offsets += ", ";
func_index < func_count; ++func_index)
{
{
tc_indices += ", ";
function_names += ", ";
ab_indices += ", ";
}
f_count++;
func_index, mi::neuraylib::ITarget_code::SL_PTX);
src += '\n';
}
}
src +=
std::string(
".visible .const .align 4 .u32 mdl_target_code_count = ")
print_array_u32(
src,
std::string(
"mdl_target_code_offsets"),
unsigned(target_codes.
size()), tc_offsets);
src +=
std::string(
".visible .const .align 4 .u32 mdl_functions_count = ")
print_array_func(src,
std::string(
"mdl_functions"), f_count, function_names);
print_array_u32(src,
std::string(
"mdl_arg_block_indices"), f_count, ab_indices);
print_array_u32(src,
std::string(
"mdl_target_code_indices"), f_count, tc_indices);
return src;
}
CUmodule build_linked_kernel(
const char *ptx_file,
const char *kernel_function_name,
CUfunction *out_kernel_function)
{
std::string ptx_func_array_src = generate_func_array_ptx(target_codes);
#ifdef DUMP_PTX
std::cout <<
"Dumping CUDA PTX code for the \"mdl_expr_functions\" array:\n\n"
#endif
CUlinkState cuda_link_state;
CUmodule cuda_module;
void *linked_cubin;
size_t linked_cubin_size;
char error_log[8192], info_log[8192];
CUjit_option options[4];
void *optionVals[4];
options[0] = CU_JIT_INFO_LOG_BUFFER;
optionVals[0] = info_log;
options[1] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
optionVals[1] = reinterpret_cast<void *>(uintptr_t(sizeof(info_log)));
options[2] = CU_JIT_ERROR_LOG_BUFFER;
optionVals[2] = error_log;
options[3] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
optionVals[3] = reinterpret_cast<void *>(uintptr_t(sizeof(error_log)));
check_cuda_success(cuLinkCreate(4, options, optionVals, &cuda_link_state));
CUresult link_result = CUDA_SUCCESS;
do {
for (
size_t i = 0, num_target_codes = target_codes.
size(); i < num_target_codes; ++i) {
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(target_codes[i]->get_code()),
target_codes[i]->get_code_size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
}
if (link_result != CUDA_SUCCESS) break;
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(ptx_func_array_src.
c_str()),
ptx_func_array_src.
size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
link_result = cuLinkAddFile(
cuda_link_state, CU_JIT_INPUT_PTX,
ptx_file, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
link_result = cuLinkComplete(cuda_link_state, &linked_cubin, &linked_cubin_size);
} while (false);
if (link_result != CUDA_SUCCESS) {
check_cuda_success(link_result);
}
check_cuda_success(cuModuleLoadData(&cuda_module, linked_cubin));
check_cuda_success(cuModuleGetFunction(
out_kernel_function, cuda_module, kernel_function_name));
int regs = 0;
check_cuda_success(
cuFuncGetAttribute(®s, CU_FUNC_ATTRIBUTE_NUM_REGS, *out_kernel_function));
int lmem = 0;
check_cuda_success(
cuFuncGetAttribute(&lmem, CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES, *out_kernel_function));
check_cuda_success(cuLinkDestroy(cuda_link_state));
return cuda_module;
}
#endif // EXAMPLE_CUDA_SHARED_H
Source Code Location: examples/mdl_sdk/df_cuda/example_df_cuda.cu
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#include <curand_kernel.h>
#define _USE_MATH_DEFINES
#include "example_df_cuda.h"
#include "texture_support_cuda.h"
#if defined(MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR
using namespace mi::neuraylib;
#elif defined(MDL_CORE_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MDL_CORE_BSDF_USE_MATERIAL_IOR
using namespace mi::mdl;
#endif
#ifdef ENABLE_DERIVATIVES
typedef Texture_handler_deriv Tex_handler;
#define TEX_VTABLE tex_deriv_vtable
#else
typedef Texture_handler Tex_handler;
#define TEX_VTABLE tex_vtable
#endif
struct Target_code_data
{
size_t num_textures;
Texture *textures;
size_t num_mbsdfs;
Mbsdf *mbsdfs;
size_t num_lightprofiles;
Lightprofile *lightprofiles;
char const *ro_data_segment;
};
union Mdl_function_ptr
{
Mat_expr_func *expression;
Bsdf_init_func *bsdf_init;
Bsdf_sample_func *bsdf_sample;
Bsdf_evaluate_func *bsdf_evaluate;
Bsdf_pdf_func *bsdf_pdf;
Edf_init_func *edf_init;
Edf_sample_func *edf_sample;
Edf_evaluate_func *edf_evaluate;
Edf_pdf_func *edf_pdf;
};
extern __constant__ unsigned int mdl_target_code_offsets[];
extern __constant__ unsigned int mdl_functions_count;
extern __constant__ Mdl_function_ptr mdl_functions[];
extern __constant__ unsigned int mdl_arg_block_indices[];
__constant__ const float4 identity[3] = {
{1.0f, 0.0f, 0.0f, 0.0f},
{0.0f, 1.0f, 0.0f, 0.0f},
{0.0f, 0.0f, 1.0f, 0.0f}
};
typedef uint3 Mdl_function_index;
__device__ inline Mdl_function_index get_mdl_function_index(const uint2& index_pair)
{
return make_uint3(
index_pair.x,
index_pair.y,
mdl_target_code_offsets[index_pair.x] + index_pair.y);
}
struct Mdl_resource_handler
{
__device__ Mdl_resource_handler()
{
m_tex_handler.vtable = &TEX_VTABLE;
data.shared_data = NULL;
data.texture_handler = reinterpret_cast<Texture_handler_base *>(&m_tex_handler);
}
__device__ inline void set_target_code_index(
const Kernel_params& params, const Mdl_function_index& index)
{
m_tex_handler.num_textures = params.tc_data[index.x].num_textures;
m_tex_handler.textures = params.tc_data[index.x].textures;
m_tex_handler.num_mbsdfs = params.tc_data[index.x].num_mbsdfs;
m_tex_handler.mbsdfs = params.tc_data[index.x].mbsdfs;
m_tex_handler.num_lightprofiles = params.tc_data[index.x].num_lightprofiles;
m_tex_handler.lightprofiles = params.tc_data[index.x].lightprofiles;
}
private:
Tex_handler m_tex_handler;
};
__device__ inline bool is_valid(const Mdl_function_index& index)
{
return index.y != 0xFFFFFFFFu;
}
__device__ inline const char* get_arg_block(
const Kernel_params& params,
const Mdl_function_index& index)
{
return params.arg_block_list[mdl_arg_block_indices[index.z]];
}
__device__ inline void prepare_state(
const Kernel_params& params,
const Mdl_function_index& index,
Mdl_state& state,
{
state.ro_data_segment = params.tc_data[index.x].ro_data_segment;
state.normal = normal;
}
__device__ inline Mat_expr_func* as_expression(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].expression;
}
__device__ inline Bsdf_init_func* as_bsdf_init(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].bsdf_init;
}
__device__ inline Bsdf_sample_func* as_bsdf_sample(const Mdl_function_index& index)
{
return mdl_functions[index.z + 1].bsdf_sample;
}
__device__ inline Bsdf_evaluate_func* as_bsdf_evaluate(const Mdl_function_index& index)
{
return mdl_functions[index.z + 2].bsdf_evaluate;
}
__device__ inline Bsdf_pdf_func* as_bsdf_pdf(const Mdl_function_index& index)
{
return mdl_functions[index.z + 3].bsdf_pdf;
}
__device__ inline Edf_init_func* as_edf_init(const Mdl_function_index& index)
{
return mdl_functions[index.z + 0].edf_init;
}
__device__ inline Edf_sample_func* as_edf_sample(const Mdl_function_index& index)
{
return mdl_functions[index.z + 1].edf_sample;
}
__device__ inline Edf_evaluate_func* as_edf_evaluate(const Mdl_function_index& index)
{
return mdl_functions[index.z + 2].edf_evaluate;
}
__device__ inline Edf_pdf_func* as_edf_pdf(const Mdl_function_index& index)
{
return mdl_functions[index.z + 3].edf_pdf;
}
__device__
inline float3
operator+(
const float3& a,
const float3& b)
{
return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
}
__device__
inline float3
operator-(
const float3& a,
const float3& b)
{
return make_float3(a.x - b.x, a.y - b.y, a.z - b.z);
}
__device__
inline float3
operator*(
const float3& a,
const float3& b)
{
return make_float3(a.x * b.x, a.y * b.y, a.z * b.z);
}
__device__
inline float3
operator*(
const float3& a,
const float s)
{
return make_float3(a.x * s, a.y * s, a.z * s);
}
__device__
inline float3
operator/(
const float3& a,
const float s)
{
return make_float3(a.x / s, a.y / s, a.z / s);
}
__device__
inline void operator+=(float3& a,
const float3& b)
{
a.x += b.x; a.y += b.y; a.z += b.z;
}
__device__
inline void operator-=(float3& a,
const float3& b)
{
a.x -= b.x; a.y -= b.y; a.z -= b.z;
}
__device__
inline void operator*=(float3& a,
const float3& b)
{
a.x *= b.x; a.y *= b.y; a.z *= b.z;
}
__device__
inline void operator*=(float3& a,
const float& s)
{
a.x *= s; a.y *= s; a.z *= s;
}
__device__ inline float squared_length(const float3 &d)
{
return d.x * d.x + d.y * d.y + d.z * d.z;
}
__device__ inline float3 normalize(const float3 &d)
{
const float inv_len = 1.0f / sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
return make_float3(d.x * inv_len, d.y * inv_len, d.z * inv_len);
}
__device__
inline float dot(
const float3 &u,
const float3 &v)
{
return u.x * v.x + u.y * v.y + u.z * v.z;
}
__device__
inline float3
cross(
const float3 &u,
const float3 &v)
{
return make_float3(
u.y * v.z - u.z * v.y,
u.z * v.x - u.x * v.z,
u.x * v.y - u.y * v.x);
}
typedef curandStatePhilox4_32_10_t Rand_state;
__device__ inline float2 environment_coords(const float3 &dir)
{
const float u = atan2f(dir.z, dir.x) * (float)(0.5 / M_PI) + 0.5f;
const float v = acosf(
fmax(fminf(-dir.y, 1.0f), -1.0f)) * (float)(1.0 / M_PI);
return make_float2(u, v);
}
__device__ inline float3 environment_sample(
float3 &dir,
float &pdf,
const float3 &xi,
const Kernel_params ¶ms)
{
const unsigned int size = params.env_size.x * params.env_size.y;
const unsigned int idx =
min((
unsigned int)(xi.x * (
float)size), size - 1);
unsigned int env_idx;
float xi_y = xi.y;
if (xi_y < params.env_accel[idx].q) {
env_idx = idx ;
xi_y /= params.env_accel[idx].q;
} else {
env_idx = params.env_accel[idx].alias;
xi_y = (xi_y - params.env_accel[idx].q) / (1.0f - params.env_accel[idx].q);
}
const unsigned int py = env_idx / params.env_size.x;
const unsigned int px = env_idx % params.env_size.x;
pdf = params.env_accel[env_idx].pdf;
const float u = (float)(px + xi_y) / (float)params.env_size.x;
const float phi = u * (float)(2.0 * M_PI) - (float)M_PI;
float sin_phi, cos_phi;
sincosf(phi, &sin_phi, &cos_phi);
const float step_theta = (float)M_PI / (float)params.env_size.y;
const float theta0 = (float)(py) * step_theta;
const float cos_theta = cosf(theta0) * (1.0f - xi.z) + cosf(theta0 + step_theta) * xi.z;
const float theta = acosf(cos_theta);
const float sin_theta = sinf(theta);
dir = make_float3(cos_phi * sin_theta, -cos_theta, sin_phi * sin_theta);
const float v = theta * (float)(1.0 / M_PI);
const float4 t = tex2D<float4>(params.env_tex, u, v);
return make_float3(t.x, t.y, t.z) / pdf;
}
__device__ inline float3 environment_eval(
float &pdf,
const float3 &dir,
const Kernel_params ¶ms)
{
const float2 uv = environment_coords(dir);
const unsigned int x =
min((
unsigned int)(uv.x * (
float)params.env_size.x), params.env_size.x - 1);
const unsigned int y =
min((
unsigned int)(uv.y * (
float)params.env_size.y), params.env_size.y - 1);
pdf = params.env_accel[y * params.env_size.x + x].pdf;
const float4 t = tex2D<float4>(params.env_tex, uv.x, uv.y);
return make_float3(t.x, t.y, t.z);
}
__device__ inline float intersect_sphere(
const float3 &pos,
const float3 &dir,
const float radius)
{
const float b = 2.0f *
dot(dir, pos);
const float c =
dot(pos, pos) - radius * radius;
float tmp = b * b - 4.0f * c;
if (tmp < 0.0f)
return -1.0f;
tmp = sqrtf(tmp);
const float t0 = (((b < 0.0f) ? -tmp : tmp) - b) * 0.5f;
const float t1 = c / t0;
const float m = fminf(t0, t1);
return m > 0.0f ? m : fmaxf(t0, t1);
}
struct Ray_state {
float3 contribution;
float3 weight;
float3 pos, pos_rx, pos_ry;
float3 dir, dir_rx, dir_ry;
bool inside;
int intersection;
};
__device__ inline bool trace_sphere(
Rand_state &rand_state,
Ray_state &ray_state,
const Kernel_params ¶ms)
{
const float t = intersect_sphere(ray_state.pos, ray_state.dir, 1.0f);
if (t < 0.0f) {
if (ray_state.intersection == 0 && params.mdl_test_type != MDL_TEST_NO_ENV) {
const float2 uv = environment_coords(ray_state.dir);
const float4 texval = tex2D<float4>(params.env_tex, uv.x, uv.y);
ray_state.contribution += make_float3(texval.x, texval.y, texval.z);
}
return false;
}
ray_state.pos += ray_state.dir * t;
const float3 normal = normalize(ray_state.pos);
const float phi = atan2f(normal.x, normal.z);
const float theta = acosf(normal.y);
const float3 uvw = make_float3(
(phi * (float)(0.5 / M_PI) + 0.5f) * 2.0f,
1.0f - theta * (float)(1.0 / M_PI),
0.0f);
float sp, cp;
sincosf(phi, &sp, &cp);
const float st = sinf(theta);
float3 tangent_u = make_float3(cp * st, 0.0f, -sp * st) * (float)M_PI;
float3 tangent_v = make_float3(sp * normal.y, -st, cp * normal.y) * (float)(-M_PI);
#ifdef ENABLE_DERIVATIVES
{ uvw, { 0.0f, 0.0f, 0.0f }, { 0.0f, 0.0f, 0.0f } } };
if (params.use_derivatives && ray_state.intersection == 0)
{
const float d =
dot(normal, ray_state.pos);
const float tx = (d -
dot(normal, ray_state.pos_rx)) /
dot(normal, ray_state.dir_rx);
const float ty = (d -
dot(normal, ray_state.pos_ry)) /
dot(normal, ray_state.dir_ry);
ray_state.pos_rx += ray_state.dir_rx * tx;
ray_state.pos_ry += ray_state.dir_ry * ty;
float4 A;
float2 B_x, B_y;
if (fabsf(normal.x) > fabsf(normal.y) && fabsf(normal.x) > fabsf(normal.z)) {
B_x = make_float2(
ray_state.pos_rx.y - ray_state.pos.y,
ray_state.pos_rx.z - ray_state.pos.z);
B_y = make_float2(
ray_state.pos_ry.y - ray_state.pos.y,
ray_state.pos_ry.z - ray_state.pos.z);
A = make_float4(
tangent_u.y, tangent_u.z, tangent_v.y, tangent_v.z);
} else if (fabsf(normal.y) > fabsf(normal.z)) {
B_x = make_float2(
ray_state.pos_rx.x - ray_state.pos.x,
ray_state.pos_rx.z - ray_state.pos.z);
B_y = make_float2(
ray_state.pos_ry.x - ray_state.pos.x,
ray_state.pos_ry.z - ray_state.pos.z);
A = make_float4(
tangent_u.x, tangent_u.z, tangent_v.x, tangent_v.z);
} else {
B_x = make_float2(
ray_state.pos_rx.x - ray_state.pos.x,
ray_state.pos_rx.y - ray_state.pos.y);
B_y = make_float2(
ray_state.pos_ry.x - ray_state.pos.x,
ray_state.pos_ry.y - ray_state.pos.y);
A = make_float4(
tangent_u.x, tangent_u.y, tangent_v.x, tangent_v.y);
}
const float det = A.x * A.w - A.y * A.z;
if (fabsf(det) > 1e-10f) {
const float inv_det = 1.0f / det;
texture_coords[0].dx.x = inv_det * (A.w * B_x.x - A.z * B_x.y);
texture_coords[0].dx.y = inv_det * (A.x * B_x.y - A.y * B_x.x);
texture_coords[0].dy.x = inv_det * (A.w * B_y.x - A.z * B_y.y);
texture_coords[0].dy.y = inv_det * (A.x * B_y.y - A.y * B_y.x);
}
}
#else
#endif
tangent_u = normalize(tangent_u);
tangent_v = normalize(tangent_v);
float4 texture_results[16];
Df_cuda_material material = params.material_buffer[params.current_material];
Mdl_resource_handler mdl_resources;
Mdl_state state = {
normal,
normal,
ray_state.pos,
0.0f,
texture_coords,
&tangent_u,
&tangent_v,
texture_results,
NULL,
identity,
identity,
0
};
Mdl_function_index func_idx;
if (ray_state.intersection > 0)
{
func_idx = get_mdl_function_index(material.volume_absorption);
if (is_valid(func_idx)) {
mdl_resources.set_target_code_index(params, func_idx);
const char* arg_block = get_arg_block(params, func_idx);
prepare_state(params, func_idx, state, normal);
float3 abs_coeff;
as_expression(func_idx)(
&abs_coeff, &state, &mdl_resources.data, NULL, arg_block);
ray_state.weight.x *= abs_coeff.x > 0.0f ? expf(-abs_coeff.x * t) : 1.0f;
ray_state.weight.y *= abs_coeff.y > 0.0f ? expf(-abs_coeff.y * t) : 1.0f;
ray_state.weight.z *= abs_coeff.z > 0.0f ? expf(-abs_coeff.z * t) : 1.0f;
}
}
func_idx = get_mdl_function_index(material.edf);
if (is_valid(func_idx))
{
mdl_resources.set_target_code_index(params, func_idx);
const char* arg_block = get_arg_block(params, func_idx);
prepare_state(params, func_idx, state, normal);
as_edf_init(func_idx)(&state, &mdl_resources.data, NULL, arg_block);
eval_data.
k1 = make_float3(-ray_state.dir.x, -ray_state.dir.y, -ray_state.dir.z);
as_edf_evaluate(func_idx)(&eval_data, &state, &mdl_resources.data, NULL, arg_block);
float3 emission_intensity = make_float3(0.0, 0.0, 0.0);
func_idx = get_mdl_function_index(material.emission_intensity);
if (is_valid(func_idx))
{
mdl_resources.set_target_code_index(params, func_idx);
arg_block = get_arg_block(params, func_idx);
prepare_state(params, func_idx, state, normal);
as_expression(func_idx)(
&emission_intensity, &state, &mdl_resources.data, NULL, arg_block);
}
ray_state.contribution += emission_intensity * eval_data.
edf;
}
func_idx = get_mdl_function_index(material.bsdf);
if (is_valid(func_idx))
{
mdl_resources.set_target_code_index(params, func_idx);
const char* arg_block = get_arg_block(params, func_idx);
prepare_state(params, func_idx, state, normal);
as_bsdf_init(func_idx)(&state, &mdl_resources.data, NULL, arg_block);
union
{
};
if (ray_state.inside)
{
sample_data.
ior1.x = BSDF_USE_MATERIAL_IOR;
sample_data.ior2 = make_float3(1.0f, 1.0f, 1.0f);
}
else
{
sample_data.ior1 = make_float3(1.0f, 1.0f, 1.0f);
sample_data.ior2.x = BSDF_USE_MATERIAL_IOR;
}
sample_data.k1 = make_float3(-ray_state.dir.x, -ray_state.dir.y, -ray_state.dir.z);
if (params.light_intensity.x > 0.0f ||
params.light_intensity.y > 0.0f ||
params.light_intensity.z > 0.0f)
{
float3 to_light = params.light_pos - ray_state.pos;
const float check_sign = squared_length(params.light_pos) < 1.0f ? -1.0f : 1.0f;
if (
dot(to_light, normal) * check_sign > 0.0f)
{
const float inv_squared_dist = 1.0f / squared_length(to_light);
eval_data.k2 = to_light * sqrtf(inv_squared_dist);
const float3 f = params.light_intensity * inv_squared_dist * (float) (0.25 / M_PI);
as_bsdf_evaluate(func_idx)(
&eval_data, &state, &mdl_resources.data, NULL, arg_block);
ray_state.contribution += ray_state.weight * f * eval_data.bsdf;
}
}
if (params.mdl_test_type != MDL_TEST_SAMPLE && params.mdl_test_type != MDL_TEST_NO_ENV)
{
const float xi0 = curand_uniform(&rand_state);
const float xi1 = curand_uniform(&rand_state);
const float xi2 = curand_uniform(&rand_state);
float3 light_dir;
float pdf;
const float3 f = environment_sample(light_dir, pdf, make_float3(xi0, xi1, xi2), params);
const float cos_theta =
dot(light_dir, normal);
if (cos_theta > 0.0f && pdf > 0.0f)
{
eval_data.k2 = light_dir;
as_bsdf_evaluate(func_idx)(
&eval_data, &state, &mdl_resources.data, NULL, arg_block);
const float mis_weight =
(params.mdl_test_type == MDL_TEST_EVAL) ? 1.0f : pdf / (pdf + eval_data.
pdf);
ray_state.contribution += ray_state.weight * f * eval_data.bsdf * mis_weight;
}
}
{
sample_data.xi.x = curand_uniform(&rand_state);
sample_data.xi.y = curand_uniform(&rand_state);
sample_data.xi.z = curand_uniform(&rand_state);
as_bsdf_sample(func_idx)(&sample_data, &state, &mdl_resources.data, NULL, arg_block);
if (sample_data.event_type == BSDF_EVENT_ABSORB)
return false;
ray_state.dir = sample_data.k2;
ray_state.weight *= sample_data.bsdf_over_pdf;
const bool transmission = (sample_data.event_type & BSDF_EVENT_TRANSMISSION) != 0;
if (transmission)
ray_state.inside = !ray_state.inside;
if (ray_state.inside)
{
ray_state.pos -= normal * 0.001f;
return true;
}
else if (params.mdl_test_type != MDL_TEST_NO_ENV &&
params.mdl_test_type != MDL_TEST_EVAL)
{
float pdf;
const float3 f = environment_eval(pdf, sample_data.k2, params);
float bsdf_pdf;
if (params.mdl_test_type == MDL_TEST_MIS_PDF)
{
const float3 k2 = sample_data.k2;
pdf_data.k2 = k2;
as_bsdf_pdf(func_idx)(&pdf_data, &state, &mdl_resources.data, NULL, arg_block);
bsdf_pdf = pdf_data.pdf;
}
else
bsdf_pdf = sample_data.pdf;
const bool is_specular =
(sample_data.event_type & BSDF_EVENT_SPECULAR) != 0;
if (is_specular || bsdf_pdf > 0.0f)
{
const float mis_weight = is_specular ||
(params.mdl_test_type == MDL_TEST_SAMPLE) ? 1.0f :
bsdf_pdf / (pdf + bsdf_pdf);
ray_state.contribution += ray_state.weight * f * mis_weight;
}
}
}
}
return false;
}
__device__ inline float3 render_sphere(
Rand_state &rand_state,
const Kernel_params ¶ms,
const unsigned x,
const unsigned y)
{
const float inv_res_x = 1.0f / (float)params.resolution.x;
const float inv_res_y = 1.0f / (float)params.resolution.y;
const float dx = params.disable_aa ? 0.5f : curand_uniform(&rand_state);
const float dy = params.disable_aa ? 0.5f : curand_uniform(&rand_state);
const float2 screen_pos = make_float2(
((float)x + dx) * inv_res_x,
((float)y + dy) * inv_res_y);
const float r = (2.0f * screen_pos.x - 1.0f);
const float r_rx = (2.0f * (screen_pos.x + inv_res_x) - 1.0f);
const float u = (2.0f * screen_pos.y - 1.0f);
const float u_ry = (2.0f * (screen_pos.y + inv_res_y) - 1.0f);
const float aspect = (float)params.resolution.y / (float)params.resolution.x;
Ray_state ray_state;
ray_state.contribution = make_float3(0.0f, 0.0f, 0.0f);
ray_state.weight = make_float3(1.0f, 1.0f, 1.0f);
ray_state.pos = ray_state.pos_rx = ray_state.pos_ry = params.cam_pos;
ray_state.dir = normalize(
params.cam_dir * params.cam_focal + params.cam_right * r + params.cam_up * aspect * u);
ray_state.dir_rx = normalize(
params.cam_dir * params.cam_focal + params.cam_right * r_rx + params.cam_up * aspect * u);
ray_state.dir_ry = normalize(
params.cam_dir * params.cam_focal + params.cam_right * r + params.cam_up * aspect * u_ry);
ray_state.inside = false;
const unsigned int max_num_intersections = params.max_path_length - 1;
for (ray_state.intersection = 0; ray_state.intersection < max_num_intersections;
++ray_state.intersection)
if (!trace_sphere(rand_state, ray_state, params))
break;
return
isfinite(ray_state.contribution.z) ? ray_state.contribution : make_float3(0.0f, 0.0f, 0.0f);
}
__device__ inline unsigned int display(float3 val, const float tonemap_scale)
{
val *= tonemap_scale;
const float burn_out = 0.1f;
val.x *= (1.0f + val.x * burn_out) / (1.0f + val.x);
val.y *= (1.0f + val.y * burn_out) / (1.0f + val.y);
val.z *= (1.0f + val.z * burn_out) / (1.0f + val.z);
const unsigned int r =
(unsigned int)(255.0 * fminf(powf(fmaxf(val.x, 0.0f), (float)(1.0 / 2.2)), 1.0f));
const unsigned int g =
(unsigned int)(255.0 * fminf(powf(fmaxf(val.y, 0.0f), (float)(1.0 / 2.2)), 1.0f));
const unsigned int b =
(unsigned int)(255.0 * fminf(powf(fmaxf(val.z, 0.0f), (float)(1.0 / 2.2)), 1.0f));
return 0xff000000 | (r << 16) | (g << 8) | b;
}
extern "C" __global__ void render_sphere_kernel(
const Kernel_params kernel_params)
{
const unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
const unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= kernel_params.resolution.x || y >= kernel_params.resolution.y)
return;
const unsigned int idx = y * kernel_params.resolution.x + x;
Rand_state rand_state;
const unsigned int num_dim = kernel_params.disable_aa ? 6 : 8;
curand_init(idx, 0, kernel_params.iteration_start * num_dim, &rand_state);
float3 value = make_float3(0.0f, 0.0f, 0.0f);
for (unsigned int s = 0; s < kernel_params.iteration_num; ++s)
{
value += render_sphere(
rand_state,
kernel_params,
x, y);
}
value *= 1.0f / (float)kernel_params.iteration_num;
if (kernel_params.iteration_start == 0)
kernel_params.accum_buffer[idx] = value;
else {
kernel_params.accum_buffer[idx] = kernel_params.accum_buffer[idx] +
(value - kernel_params.accum_buffer[idx]) *
((float)kernel_params.iteration_num /
(float)(kernel_params.iteration_start + kernel_params.iteration_num));
}
if (kernel_params.display_buffer)
kernel_params.display_buffer[idx] =
display(kernel_params.accum_buffer[idx], kernel_params.exposure_scale);
}