- Hidden state now tracks the Instance object to keep the handle small
- Make the recreate supplier an explicit record to allow comparisons
- Add setVisible method to Instance
- Use state machine interface in InstanceHandleImpl
- 3 states: deleted, visible, hidden
- Visible is directly implemented by AbstractInstancer
- Hidden stores the instancer supplier to recreate an instancer
- Eagerly load ALL shaders in ShaderSources, resolving imports there
- Compile and cache programs on-demand
- Move gl state try blocks to EngineImpl
- EngineImpl catches shader exceptions and triggers a fallback
- Fix crumbling on indirect
- Directly use the baseInstance as instance index without indirection
- #define base instance and draw id variables to simplify usage
- Fix null pointer looking up culling group
- Add method to map an instancer's local instance index to a global
index in the page file
- Remove ModelHolder and ModelCache
- Remove lib/util.FlwUtil
- Remove lib/util.Pair and replace usages with com.mojang.datafixers.util.Pair
- Remove lib/util.Unit and replace usages with net.minecraft.util.Unit
- Make ResourceReloadHolder and ResourceReloadCache final and move to util
- Clean up code in backend/glsl
- Move LightSmoothnessArgument to impl
- Remove LoweringVisitor
- Move functionality of four main static methods in LoweringVisitor to new ModelTrees class
- Return ModelTree directly
- Accept Material instead of TextureAtlasSprite for efficiency, so visuals don't need to look up the sprite to get the ModelTree
- Use ResourceReloadCache for MeshTree.CACHE
- Implement instance hiding by deleting/stealing
- Work around instancer persistence by storing a recreation supplier in
the instance handle
- Rework instancer ctors to just take an InstancerKey
- Parameterize InstanceHandle by I extends Instance so the steal method
and the supplier can be safely assigned
- IndirectInstancer#uploadInstances: 46% of render thread to 26%
- Inline #enqueueCopy to avoid allocating LongConsumers
- Do not even bother to track individual changed indices, instead rely
on just the changedPage set
- Convert ShulkerBoxVisual to use InstanceTree
- Add "pruning" helper visitors
- Remove ModelPartConverter
- Remove TextureMapper and related code from VertexWriter
- Add ModelTree
- Add LoweringVisitor to traverse a MeshTree and emit ModelTree nodes
and Models
- Provide some default visitor creation methods
- Abstract ModelCache -> ResourceReloadCache
- Abstract ModelHolder -> ResourceReloadHolder
- Add ModelTreeCache to hide lookup cost if it gets extreme
- Use InstanceTrees in BellVisual and MinecartVisual
- Use JOML Matrix4fStack instead of PoseStack
- Directly transform Matrix4f instead of using PoseStack to compute initial pose
- Track if the transforms for an InstanceTree have changed
- Pass a boolean down the tree and || it with our changed flag to force
updates
- Expose the force flag to visuals so they can hint to us if their root
transforms never change
- Add 2 wrapper methods to make the distinction more clear
- Store a pose matrix in each InstanceTree, equivalent to its instance's pose matrix if the instance exists
- Directly transform the current InstanceTree's pose matrix instead of transforming a PoseStack and copying its matrix to the instance, eliminating the need to push and pop stack entries
- Remove InstanceTree.rotation
- Add more InstanceTree methods to allow full inspection of children
- Rename TransformedInstance -> PosedInstance
- Add TransformedInstance, which does not have a normal matrix
- Use mat3(transpose(inverse(i.pose))) in the vertex shader, but for
most cases that will be overkill
- Use arrrays in the trees
- MeshTree tracks parallel arrays of children and keys
- InstanceTree tracks which mesh tree it came from for lookup purposes
- Remove walker object
- Make RecyclingPoseStack use add/removeLast instead of push/pop
- Add MeshTree and InstanceTree
- Deprecate ModelPartConverter for removal
- Refactor ChestVisual to use InstanceTree
- Combine double chest light in ChestVisual
- Cherry-pick misc cleanups from last-frame-visibility
- Smarter multibind logic
- Make offsets in IndirectBuffers dependent on BufferBindings
- Organize buffer bindings based on where they're used to allow each
pass to bind exactly which buffers it needs
- Use DSA for the depth pyramid
- Pass the map of util programs to IndirectPrograms rather than
unpacking them individually
- Actually delete all the indirect utils
- Fix mip levels being half the size they should be
- Use the next lowest po2 from the main render target size for mip 0
- Map from dst texel to src texel rather than naively multiply by 2
- Clamp the estimated mip level in the cull shader
- Use texel fetches in the cull shader (not sure if necessary?)
- Implement hi-z occlusion culling
- Generate depth pyramid just before issuing cull dispatches
- Currently use raw texel fetches but this may be causing loss
- Add _flw_cullData to frame uniforms
- Only update the page table when an allocation is resized
- Only upload the page table after it's uploaded
- Combine various setters for InstancePager.Allocation and
IndirectInstancer
- Free pages when an allocation is deleted
- Fix bit logic on the GPU
- Manually manage the size of the storage and pageTable buffers
- Make object2Page and page2Object static
- Fix instance writing loop
- Fix page table always having full pages
- Fix allocations not shrinking
- Goal: avoid needing to re-upload everything when instance count for
one instancer changes
- Solution: store instances in pages of 32
- Allocate pages in a GPU arena
- Store one uint per page to indicate which model the instances in the
page belong to, and how many instances are actually stored in the page
- Instancers eagerly allocate and free pages as their instance count
changes
- Instancers will not necessarily store instances contiguously anymore,
but that's okay because any given cull workgroup will only reference a
single page
- Culling threads *will* write instances contiguously however, and so we
still need to keep track of a base instance per instancer, and the
target buffer logic does not change
- Bump arch loom and gradle versions
- Do not set default refmap name
- Enable legacy mixin ap in loom
- Individually add sourcesets to looms refmap stuffs
* fix loom version getting
very cursed but it works :)
* remove commented out stuff in settings.gradle.kts, unnecessary plugins in buildSrc, and configure buildSrc to download sources/javadoc
- Implement backend priority system
- Give indirect priority 1000 and instancing 500
- Generate the sorted list of backends on demand in case one changes
priority at runtime
- Use way fewer memory barriers
- I didn't realize that GL_SHADER_STORAGE_BARRIER_BIT was global instead
of operating only on the currently bound buffers. Oh, well
- Move apply program binding to IndirectDrawManager
- Fix embedded instances flickering when first loading a world. Need to
actually bind the matrix buffer for the cull shader. Not sure how it
worked at all before
- Minor styling/cleanup
- Optimize embeddings on indirect backend by uploading all matrices in
an SSBO
- Allocate matrices in an arena
- Flatten IndirectCullingGroups to only be parameterized by
InstanceType, so now all instances from all embeddings get culled in
the same dispatch
- Sort indirect draws by whether they're embedded before anything else
- Include an "embedded" boolean in the MultiDraw record to decide which
shader to use
- Include "matrixIndex" field in model descriptor and indirect draw
structs
- Use matrixIndex == 0 to indicate that a matrix is the identity to
avoid unnecessary work in the cull shader
- Add helper to write a mat3 as 3 vec4s
- De-uberify the light shader
- Remove lightSources index
- Include LightShader in PipelineProgramKey and parameterize the
pipeline fragment shader by it
- Profiling suggests that specializing the shaders uses significantly
less GPU time, and we may want to do this for actual user-authored
material shaders (and cutout?) as well
- Sort LightShader highest in the material comparator
- Implement a materialEquals method so IndirectCullingGroup can bucket
draws on more that just material reference equality
- Do not store any particular draw program in IndirectCullingGroup
- Manually unroll all loops in light_lut with the help of macros
- Pretty significant perf gains on my 5600G
- I tried assembling a bitmask of the blocks we actually want to fetch
and branching in each _FLW_LIGHT_FETCH in an attempt to reduce the
bandwidth required but that turned out much slower. Perhaps there's
still some middle-ground to be found for axis-aligned normals
- Re-order the 8-arrays in _flw_lightForDirection to be xzy to be
consistent with everything else and improve the memory access pattern
- Merge flywheel-backend config into an object within the base flywheel
config
- On forge, push a path in the toml
- On fabric, serialize a nested json object
- Still expose the BackendConfig via FlwBackendXplat, but have the impl
set a static field in the xplat impl
- Revert debug shulker box changes in previous commit
- Guard 3 different flw_light impls via #define
- Guard the inner face correction behind another #define
- Add LightSmoothness enum to decide which flw_light impl to use
- Make LightSmoothness configurable via a new BackendConfig
- Add command to switch LightSmoothness on the fly
- Note: currently requires a resource reload so we don't need to compile
4x as many shaders
- Calculate AO in flw_light
- Pack block light, sky light, and valid block count into a single uint
when reading the 3x3x3 light volume for a fragment. This saves a
significant amount of memory and integer additions in the shader, but
does require slightly more alu ops overall for the packing/unpacking
- Add pragma optionNV (unroll all) for significant perf boost, may want
to manually unroll to be cross-platform
- Remove redundant flw_light
- Upload a bitset for each section indicating if blocks are solid
- When interpolating light, count the number of transparent blocks and
divide to avoid the incorrect "AO" effect near the ground