WTF is a Scene Graph?

September 17, 2017

Not too long ago I started to dabble in low-level graphics programming again. I set a goal of creating a basic scene that can be defined in a similar way to production-grade game engines like Unity and friends. I felt comfortable with basic OpenGL at the time, however the guides that I followed never told me how to create a hierarchical scene.

A screenshot of the Unity game engine editor.

In addition to this many examples made assumptions about the 3D environment that cannot be made when using a scene graph. I will be glossing over implementation specific details related to the graphics pipeline in this article and focus on scene graph itself, you can view the full source on GitHub if you want to see how things fit together.

What is a Scene Graph?

A scene graph is a tree data structure with nodes. Each node contains transformation matrices that define their position in 3D space; these transformation matrices are generated using local position, rotation, and scale values and incorporating parent transformations to create a world transformation. To keep things brief, this article does not go over OpenGL or theory of how coordinate systems work; you can get a more detailed explanation of these things at https://learnopengl.com/ if you’re not already familiar.

A tree with 3 layers. The first layer with the root node is labeled "Scene Root", with the second layer being connected to it and labeled "Car", and the final layer having 5 connected nodes labeled "Driver", "Wheel 1 (Rotating)", "Wheel 2 (Rotating)", "Wheel 3 (Rotating)", and "Wheel 4 (Rotating)".

This visualization shows a very basic scenegraph where there is a single car with spinning wheels. Each node in the tree has a local position, rotation, and scale. Each node’s position is relative to its parent so if the car moves (via any transform), the wheels and driver stay in the same position relative to the car. We’re going to go over a basic scene graph implementation that allows us to represent this kind of hierarchy.

Traversing the Tree

Each time a frame is being rendered, the updateWorldTransform method is called on the root node. This method will calculate the local and world transformation matrices if needed. First and foremost the local transform needs to be created using matrix translate, rotate, and scale functions from GLM. Once the local transform is acquired it can be multiplied against its parent’s world transform using the cross product, this being the actual location in world space where the node is located.

void Node::updateWorldTransform() {
    if (dirty) {
        auto parent = this->parent.lock();
        this->updateLocalTransform();
        if (parent.get() != nullptr) {
            worldTransform = parent.get()->worldTransform * localTransform;
        } else {
            worldTransform = glm::mat4() * localTransform;
        }

        dirty = false;
        for (auto node : children) {
            node.get()->markDirty();
            node.get()->updateWorldTransform();
        }
    } else {
        for (auto node : children) {
            node.get()->updateWorldTransform();
        }
    }
}

// TODO: Use quaternions to save on matrix multiplications.
void Node::updateLocalTransform() {
    if (dirty) {
        auto transform = glm::mat4();
        transform = glm::translate(transform, position);
        transform = glm::rotate(transform, rotation.y, glm::vec3(0.0, 1.0, 0.0));
        transform = glm::rotate(transform, rotation.x, glm::vec3(1.0, 0.0, 0.0));
        transform = glm::rotate(transform, rotation.z, glm::vec3(0.0, 0.0, 1.0));
        transform = glm::scale(transform, scale);
        localTransform = transform;
    }
}

You’ll see that children nodes are referenced and the updateWorldTransform method is called recursively on them. This is done because any changes to the current node’s world transform needs to change the child’s transform as children are positioned relative to their parents. If this code was omitted and we used the car example, the wheels would stay put if the car chassis moved forward.

You may also note that there is a check against dirty. This is an optional optimization that the walker uses so it knows whether a transformation needs to be computed or not. The class uses setters to automatically set the dirty flag when certain values are changed so this happens transparently. The tree is still traversed in-case any children are marked dirty as the traversal starts at the root. There are probably faster ways to do this when working with larger scene graphs.

Adding and Removing Nodes

Adding nodes is as simple as passing a shared_ptr into a local children vector. Once added the dirty flag is set so the walker creates world transforms for the children as soon as possible.

Removing nodes is as simple as removing the parent’s shared pointer to the node and letting RAII take care of the rest. The children have a weak pointer to the parent to prevent cyclic references (and cross references between nodes should use weak_ptr).

Making the Camera Work

Generating the transforms for the camera mostly the same sans a few additions. A view and projection matrix are created alongside the local and world transforms. Generating the view matrix from the world transform is tricky when using lookAt to generate the matrix. The lookAt function requires an up vector and one needs to be calculated as the camera can have any arbitrary rotation and we can’t just make it point up (FPS style) like a lot of other guides do. Lots of this code is based off of work from A Camera Implementation in C.

PerspectiveCamera::PerspectiveCamera(std::string name, glm::vec3 position,
        glm::vec3 rotation, glm::vec3 scale, float fov, float aspect,
        float near, float far) {
    this->name = name;
    this->position = position;
    this->rotation = rotation;
    this->scale = scale;
    this->projectionMatrix = glm::perspective(fov, aspect, near, far);
}

void PerspectiveCamera::updateWorldTransform() {
    if (dirty) {
        auto parent = this->parent.lock();
        this->updateLocalTransform();
        glm::mat4 parentWorldTransform;
        if (parent.get() != nullptr) {
            parentWorldTransform = parent.get()->worldTransform;
            worldTransform = parent.get()->worldTransform * localTransform;
        } else {
            worldTransform = glm::mat4() * localTransform;
        }

        // Get a normalized rotation vector (the forward vector) that points
        // in the direction the camera is facing that's 1 unit in length.
        // Also generate the up and right vectors from the forward vector.
        const auto decomposed = this->getDecomposedTransform();
        const auto eulerRotation = glm::eulerAngles(decomposed.rotation);
        forward = -glm::normalize(glm::vec3(
            -sin(eulerRotation.y),
            sin(eulerRotation.x) * cos(eulerRotation.y),
            cos(eulerRotation.x) * cos(eulerRotation.y)
        ));
        up = glm::cross(
            glm::cross(forward, glm::vec3(0.0f, 1.0f, 0.0f)),
            forward
        );
        right = glm::cross(forward, up);

        // Create new right and up vectors based on the roll.
        right = glm::normalize(
            right * cosf(rotation.z * M_PI) +
            up * sinf(rotation.z * M_PI)
        );
        up = glm::cross(forward, right);
        up.y = -up.y;

        // Adding the rotation results in a point 1 unit in front of the
        // camera which is then passed to the lookAt function to create a
        // view matrix. The projection matrix is pre-baked for performance.
        target = decomposed.translation + forward;
        viewMatrix = glm::lookAt(decomposed.translation, target, up);
        viewProjectionMatrix = projectionMatrix * viewMatrix;

        dirty = false;
        for (auto node : children) {
            node.get()->markDirty();
            node.get()->updateWorldTransform();
        }
    } else {
        for (auto node : children) {
            node.get()->updateWorldTransform();
        }
    }
}

Forward, up, and right vectors are initially created using world transform data; however since the up and right vectors are created using the forward vector, the roll is not accounted for as a 3 dimensional vector has no concept of roll. To get around this a new right vector is created by rotating it using sin and cos, then the up vector can be recalculated by crossing the right and forward vectors since we know all the vectors should be perpendicular.

A 3D graphic showing that +Y is up, +Z is forward, and +X is right.

A target vector is also used to generate the view matrix which is a point 1 unit in front of the camera. This position can be calculated by adding the forward vector to the world position (all values in forward sum up to 1). With that we have the information required to generate the view matrix with lookAt. The projection matrix is created in the constructor and the view-projection matrix is generated by multiplying the view and projection matrices. The view-projection matrix is cached so it doesn’t need to be recalculated every frame.

In order to make objects appear from the camera’s point of view, their world transform needs to be multiplied with the camera’s view-projection matrix before drawing. This can be cached as well but is not done in this example to keep things simple.

scenegraph.get()->updateWorldTransform();

glEnable(GL_DEPTH_TEST);
glClearColor(0.3, 0.6, 0.8, 1.0);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

// Draw the parent cube.
auto modelViewProjectionMatrix = camera.get()->viewProjectionMatrix * parentThing.get()->worldTransform;
basicShader.setUniformMat4("transform", glm::value_ptr(modelViewProjectionMatrix));
basicShader.use();
textureTest.bind();
glBindVertexArray(vao);
glDrawArrays(GL_TRIANGLES, 0, 36);
// Draw the child cube.
modelViewProjectionMatrix = camera.get()->viewProjectionMatrix * childThing.get()->worldTransform;
basicShader.setUniformMat4("transform", glm::value_ptr(modelViewProjectionMatrix));
basicShader.use();
textureTest.bind();
glBindVertexArray(vao);
glDrawArrays(GL_TRIANGLES, 0, 36);

Now our final matrices are pushed to the vertex shader and some spinning cubes appear on-screen from the camera’s perspective.

A screenshot of a program rendering two dirt cubes on a blue background.

What’s Left?

This implementation does not handle reparenting gracefully like most scene graphs do as local coordinates are not changed when the parent transform changes. Rendering is also managed outside the walker and is in the main loop to keep the code simple; in a production project this would be managed elsewhere. There is also much potential for optimization when it comes to caching values and generating the view matrix.

Although very bare-bones, that’s a working scenegraph in action! Feel free to play around with the code or repurpose it as it’s licensed under the liberal CC0 license. You can get it on GitHub.

[1] https://github.com/amdrel/scenegraph-demo
[2] https://learnopengl.com/
[3] https://tuttlem.github.io/2013/12/30/a-camera-implementation-in-c.html