- IndirectInstancer#uploadInstances: 46% of render thread to 26%
- Inline #enqueueCopy to avoid allocating LongConsumers
- Do not even bother to track individual changed indices, instead rely
on just the changedPage set
- Convert ShulkerBoxVisual to use InstanceTree
- Add "pruning" helper visitors
- Remove ModelPartConverter
- Remove TextureMapper and related code from VertexWriter
- Add ModelTree
- Add LoweringVisitor to traverse a MeshTree and emit ModelTree nodes
and Models
- Provide some default visitor creation methods
- Abstract ModelCache -> ResourceReloadCache
- Abstract ModelHolder -> ResourceReloadHolder
- Add ModelTreeCache to hide lookup cost if it gets extreme
- Use InstanceTrees in BellVisual and MinecartVisual
- Use JOML Matrix4fStack instead of PoseStack
- Directly transform Matrix4f instead of using PoseStack to compute initial pose
- Track if the transforms for an InstanceTree have changed
- Pass a boolean down the tree and || it with our changed flag to force
updates
- Expose the force flag to visuals so they can hint to us if their root
transforms never change
- Add 2 wrapper methods to make the distinction more clear
- Store a pose matrix in each InstanceTree, equivalent to its instance's pose matrix if the instance exists
- Directly transform the current InstanceTree's pose matrix instead of transforming a PoseStack and copying its matrix to the instance, eliminating the need to push and pop stack entries
- Remove InstanceTree.rotation
- Add more InstanceTree methods to allow full inspection of children
- Rename TransformedInstance -> PosedInstance
- Add TransformedInstance, which does not have a normal matrix
- Use mat3(transpose(inverse(i.pose))) in the vertex shader, but for
most cases that will be overkill
- Use arrrays in the trees
- MeshTree tracks parallel arrays of children and keys
- InstanceTree tracks which mesh tree it came from for lookup purposes
- Remove walker object
- Make RecyclingPoseStack use add/removeLast instead of push/pop
- Add MeshTree and InstanceTree
- Deprecate ModelPartConverter for removal
- Refactor ChestVisual to use InstanceTree
- Combine double chest light in ChestVisual
- Cherry-pick misc cleanups from last-frame-visibility
- Smarter multibind logic
- Make offsets in IndirectBuffers dependent on BufferBindings
- Organize buffer bindings based on where they're used to allow each
pass to bind exactly which buffers it needs
- Use DSA for the depth pyramid
- Pass the map of util programs to IndirectPrograms rather than
unpacking them individually
- Actually delete all the indirect utils
- Fix mip levels being half the size they should be
- Use the next lowest po2 from the main render target size for mip 0
- Map from dst texel to src texel rather than naively multiply by 2
- Clamp the estimated mip level in the cull shader
- Use texel fetches in the cull shader (not sure if necessary?)
- Implement hi-z occlusion culling
- Generate depth pyramid just before issuing cull dispatches
- Currently use raw texel fetches but this may be causing loss
- Add _flw_cullData to frame uniforms
- Only update the page table when an allocation is resized
- Only upload the page table after it's uploaded
- Combine various setters for InstancePager.Allocation and
IndirectInstancer
- Free pages when an allocation is deleted
- Fix bit logic on the GPU
- Manually manage the size of the storage and pageTable buffers
- Make object2Page and page2Object static
- Fix instance writing loop
- Fix page table always having full pages
- Fix allocations not shrinking
- Goal: avoid needing to re-upload everything when instance count for
one instancer changes
- Solution: store instances in pages of 32
- Allocate pages in a GPU arena
- Store one uint per page to indicate which model the instances in the
page belong to, and how many instances are actually stored in the page
- Instancers eagerly allocate and free pages as their instance count
changes
- Instancers will not necessarily store instances contiguously anymore,
but that's okay because any given cull workgroup will only reference a
single page
- Culling threads *will* write instances contiguously however, and so we
still need to keep track of a base instance per instancer, and the
target buffer logic does not change
- Implement backend priority system
- Give indirect priority 1000 and instancing 500
- Generate the sorted list of backends on demand in case one changes
priority at runtime
- Use way fewer memory barriers
- I didn't realize that GL_SHADER_STORAGE_BARRIER_BIT was global instead
of operating only on the currently bound buffers. Oh, well
- Move apply program binding to IndirectDrawManager
- Fix embedded instances flickering when first loading a world. Need to
actually bind the matrix buffer for the cull shader. Not sure how it
worked at all before
- Minor styling/cleanup
- Optimize embeddings on indirect backend by uploading all matrices in
an SSBO
- Allocate matrices in an arena
- Flatten IndirectCullingGroups to only be parameterized by
InstanceType, so now all instances from all embeddings get culled in
the same dispatch
- Sort indirect draws by whether they're embedded before anything else
- Include an "embedded" boolean in the MultiDraw record to decide which
shader to use
- Include "matrixIndex" field in model descriptor and indirect draw
structs
- Use matrixIndex == 0 to indicate that a matrix is the identity to
avoid unnecessary work in the cull shader
- Add helper to write a mat3 as 3 vec4s
- De-uberify the light shader
- Remove lightSources index
- Include LightShader in PipelineProgramKey and parameterize the
pipeline fragment shader by it
- Profiling suggests that specializing the shaders uses significantly
less GPU time, and we may want to do this for actual user-authored
material shaders (and cutout?) as well
- Sort LightShader highest in the material comparator
- Implement a materialEquals method so IndirectCullingGroup can bucket
draws on more that just material reference equality
- Do not store any particular draw program in IndirectCullingGroup
- Manually unroll all loops in light_lut with the help of macros
- Pretty significant perf gains on my 5600G
- I tried assembling a bitmask of the blocks we actually want to fetch
and branching in each _FLW_LIGHT_FETCH in an attempt to reduce the
bandwidth required but that turned out much slower. Perhaps there's
still some middle-ground to be found for axis-aligned normals
- Re-order the 8-arrays in _flw_lightForDirection to be xzy to be
consistent with everything else and improve the memory access pattern
- Merge flywheel-backend config into an object within the base flywheel
config
- On forge, push a path in the toml
- On fabric, serialize a nested json object
- Still expose the BackendConfig via FlwBackendXplat, but have the impl
set a static field in the xplat impl
- Revert debug shulker box changes in previous commit
- Guard 3 different flw_light impls via #define
- Guard the inner face correction behind another #define
- Add LightSmoothness enum to decide which flw_light impl to use
- Make LightSmoothness configurable via a new BackendConfig
- Add command to switch LightSmoothness on the fly
- Note: currently requires a resource reload so we don't need to compile
4x as many shaders
- Calculate AO in flw_light
- Pack block light, sky light, and valid block count into a single uint
when reading the 3x3x3 light volume for a fragment. This saves a
significant amount of memory and integer additions in the shader, but
does require slightly more alu ops overall for the packing/unpacking
- Add pragma optionNV (unroll all) for significant perf boost, may want
to manually unroll to be cross-platform
- Remove redundant flw_light
- Upload a bitset for each section indicating if blocks are solid
- When interpolating light, count the number of transparent blocks and
divide to avoid the incorrect "AO" effect near the ground
- Expose an ivec3 flw_renderOrigin in the shader api
- Internally add flw_renderOrigin in flw_light(*)
- flw_lightFetch expects an actual world position still
- SimpleQuadMesh holds a reference to its backing MemoryBlock so the
cleaner doesn't drop it
- Fixes issue where meshes suddenly start rendering garbage
- Add material shader component to specify smooth lighting behavior
- Allows much easier composition of smooth lighting/material shader
effects, and potentially gives backends the option to specialize
shaders on the complexity of shader lighting
- Pack fog, cutout, and light into a single uint
- Extend InstancerProvider to allow visuals to bias the render order of
their instancers
- Keep the old InstancerProvider#instancer method with a bias of 0
- Add an explanation of render order in InstancerProvider
* Start on general API formalization
* More API improvements
- Add Engine#onLightUpdate; remove LightUpdateHolder and backend/ClientChunkCacheMixin
- Add Effect#level
- Add VisualizationHelper#queueAdd and #queueRemove for Effects
- Fix PartialModel not assigning bakedModel field when populating on init
- Fix PartialModel.ALL using weak keys instead of weak values
- Make Simple*Visualizer and corresponding inner Builder classes final
- Restore FlatLit#light overload that accepts block and sky light values separately
- Add AbstractBlockEntityVisual#relight overloads that accept Iterator and Iterable
- Reorganize classes in impl.vizualization
* TaskExecutor simplification
- Move TaskExecutor#sync* methods to TaskExecutorImpl
- Move Flag and RaisePlan to impl
- Remove TaskExecutor#scheduleForMainThread and #isMainThread methods
- Remove SyncedPlan
- Add Engine#setupRender
- Remove TaskExecutor parameters from Engine#render* methods
- Convert Engine$CrumblingBlock into an interface
- Unmark RenderContext as NonExtendable to allow fulfilling the purpose described in the doc of VisualizationManager#renderDispatcher
* Remove registry freeze callbacks
- Lazily initialize MaterialShaderIndices
- Rename MaterialShaders#*Shader to #*Source
- Move BackendImplemented to api.backend package
- Ensure section set returned by SectionTracker is Unmodifiable to avoid copy in LightUpdatedVisualStorage
- Do not recompute section set in ShaderLightVisualStorage if not dirty
- Fix BlockEntityStorage not clearing posLookup on recreation or invalidation
- Fix Storage.invalidate not clearing everything
- Inline TopLevelEmbeddedEnvironment and NestedEmbeddedEnvironment into AbstractEmbeddedEnvironment and rename to EmbeddedEnvironment
- Move some classes between packages
- Remove unused fields in EmbeddingUniforms
- Remove suffix on field names in BufferBindings
- Rename enqueueLightUpdateSection methods to onLightUpdate
- Rename SectionCollectorImpl to SectionTracker
- Rename classes, methods, fields, and parameters and edit javadoc and comments to match previously done renames, new renames, and other existing classes
- Optimize collecting light section edges
- Kinda an absurd amount of code, but I'm not sure how to parameterize
by an axis without having capturing lambdas
- Around 3-4x faster
- Only push light sections to the engine when the set of sections
requested by visuals changes
- Clean up light storage plan and comment code
- Remove LIGHT_VOLUME debug mode as it's no longer used
- Not attached to the name
- Add SmoothLitVisual opt in interface, allowing any visuals to
contribute light sections to the arena
- Remove lightChunks from VisualEmbedding, it has been usurped
- Pass total collected light sections from BEs, Es, and effects to the
engine interface. It seemed the most proper way to hand off
information from the impl to the backend
- Add SmoothLitVisualStorage to maintain the set of collected sections,
though at the moment it is very naive and simply unions everything
upon request, which is also naively done every frame