- Combine pages only when they're at most half full, and not empty
- This guarantees that we'll fully empty a page, allowing us to free the memory for use by other instancers
- Track mergeable pages via a separate bitset
- Try to shuffle instances over into pages with space
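
The half-full rule above can be sketched roughly as follows. This is only an illustration, not the actual instancer code: `PAGE_SIZE` follows the 32-instance pages described elsewhere in this log, and `PageMergeSketch`, `setCount`, and `mergeFirstPair` are hypothetical names. Only per-page counts are modelled, not the instances themselves.

```java
import java.util.BitSet;

// Hypothetical sketch: tracks instance counts per page and which pages may be merged.
final class PageMergeSketch {
    static final int PAGE_SIZE = 32;

    final int[] counts;
    // Bit i is set when page i is non-empty and at most half full.
    final BitSet mergeable = new BitSet();

    PageMergeSketch(int pageCount) {
        counts = new int[pageCount];
    }

    void setCount(int page, int count) {
        counts[page] = count;
        mergeable.set(page, count > 0 && count <= PAGE_SIZE / 2);
    }

    // Moves all instances of one mergeable page into another mergeable page.
    // Returns the index of the page that was emptied, or -1 if fewer than two pages qualify.
    int mergeFirstPair() {
        int dst = mergeable.nextSetBit(0);
        if (dst < 0) return -1;
        int src = mergeable.nextSetBit(dst + 1);
        if (src < 0) return -1;
        // Both counts are <= PAGE_SIZE / 2, so the sum always fits in a single page.
        int total = counts[dst] + counts[src];
        setCount(src, 0);
        setCount(dst, total);
        return src;
    }
}
```

Because both pages hold at most 16 instances, the merge can never overflow the destination, which is what guarantees the source page ends up completely empty and can be freed.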
- Clear out now-unused logic from ObjectStorage
- Some cleanup and more comments in IndirectInstancer
- Make AbstractInstancer much more slim and move logic to BaseInstancer
- Extend paging concept to the indirect instancer
- Extend ObjectStorage to support more interesting layouts
- Instance creation on indirect is now entirely lock free, and deletions
no longer require re-uploading the entire instancer
- MaterialEncoder would trigger an indexing of CutoutShaders.OFF, though
PipelineCompiler would explicitly not index OFF
- This caused a crash on instancing when MaterialEncoder would delete
all pipeline shaders while instancing was trying to upload the packed
ubershader uniform
* Backport changes from 1.21.1
* fix
* Fix building
* fix compile error
* fix
* fix build for real
* address reviews
* Fix sodium compat
* address requested changes
* mark rubidium as incompatible
* add missed call
* Should have worn steel toe boots
- Add "stub" sourceset to each subproject
- Directly pass vararg sourcesets to methods in PlatformExtension to
avoid automatically shipping jars with the api stubs
- We may have to include stubs in setupLoomMod, but I don't think so
- A lot of this can be stripped back out if we don't need stub sources
for the forge/fabric subprojects
* Guarded stubs
- Add Sodium 0.6 and Iris API stubs to stubs source set and remove Gradle dependencies on local Sodium jar, Iris, and Oculus
- Ensure usage of APIs that may not exist at runtime is in private classes and access is always guarded
- Change ShadersModHandler
- Rename to ShadersModHelper
- Convert methods to check for Iris' and Optifine's presence into static final fields
- Move implementation to impl source set in form of IrisCompat and OptifineCompat classes
- Rename CompatMods to CompatMod and add public field to access mod ID
- Set BlockEntityType's Sodium predicate to null after it is removed
- Update repository links
- Remove local libs repository
---------
Co-authored-by: Jozufozu <jozsefaug@gmail.com>
Co-authored-by: PepperCode1 <44146161+PepperCode1@users.noreply.github.com>
- Use vanilla light directions for diffuse lighting
- Copy mc's glsl code for it, but assume directions are normalized
- Add command/config to toggle use of light directions vs chunk accurate
diffuse
- Always use shade in getItemMaterial
- Do not reload resource packs when updating light smoothness config,
we don't need to anymore with lazy compilation
- Decide not to render entities directly in the renderEntity method
- Prevents allocating large lists every frame to filter entities from
the client level
- Add pick glint material and system time uniform
- Move _FlwCullData to beginning of uniform block to ensure alignment
- Add helper to convert item rendertype into flywheel material
- Make UberShaderComponent#build NotNull
- Move index update and key creation logic to PipelineCompiler
- Always update index when a resource location is requested to fix
MaterialEncoder misses
- Indices trigger pipeline compiler deletion when updated
- Make everything in the compiler chain's results not null
- Throw errors immediately when encountered
- Log error messages when falling back
- Do not eagerly grab utility programs in IndirectDrawManager so we can
actually catch errors and fall back
- Remove CompilerStats
- Remove fog shader registry
- Remove Registry and RegistryImpl
- Make shader indices mutable
- Track fog uber component in a static field in PipelineCompiler
- When a new fog source is added, delete the pipeline compilation
harness and recreate the fog uber component
- Inline SourceLoader
- Strip out almost all source registries
- Fog will be dealt with in a follow-up commit
- Remove most static #init methods
- Remove old ubershader indices from shaders
- Hidden state now tracks the Instance object to keep the handle small
- Make the recreate supplier an explicit record to allow comparisons
- Add setVisible method to Instance
- Use state machine interface in InstanceHandleImpl
- 3 states: deleted, visible, hidden
- Visible is directly implemented by AbstractInstancer
- Hidden stores the instancer supplier to recreate an instancer
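
A minimal sketch of the three-state handle described above, assuming a plain interface with one implementation per state. All type and method names here (`HandleStateSketch`, `State`, `setDeleted`) are hypothetical; in the real code the visible state is implemented by AbstractInstancer and the hidden state also tracks the Instance object, neither of which is modelled here.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a deleted/visible/hidden state machine.
final class HandleStateSketch {
    interface State {
        State setVisible(boolean visible);
        State setDeleted();
    }

    // Terminal state: every transition stays here.
    enum Deleted implements State {
        INSTANCE;

        public State setVisible(boolean visible) { return this; }
        public State setDeleted() { return this; }
    }

    // Stand-in for the instancer, which implements the visible state directly.
    static final class Visible implements State {
        public State setVisible(boolean visible) {
            // Hiding remembers how to get back to a visible state.
            return visible ? this : new Hidden(() -> this);
        }
        public State setDeleted() { return Deleted.INSTANCE; }
    }

    // Holds the supplier used to recreate (here: simply look up) the visible state.
    static final class Hidden implements State {
        final Supplier<Visible> recreate;

        Hidden(Supplier<Visible> recreate) { this.recreate = recreate; }

        public State setVisible(boolean visible) {
            return visible ? recreate.get() : this;
        }
        public State setDeleted() { return Deleted.INSTANCE; }
    }
}
```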
- Eagerly load ALL shaders in ShaderSources, resolving imports there
- Compile and cache programs on-demand
- Move gl state try blocks to EngineImpl
- EngineImpl catches shader exceptions and triggers a fallback
- Fix crumbling on indirect
- Directly use the baseInstance as instance index without indirection
- #define base instance and draw id variables to simplify usage
- Fix null pointer looking up culling group
- Add method to map an instancer's local instance index to a global
index in the page file
- Remove ModelHolder and ModelCache
- Remove lib/util.FlwUtil
- Remove lib/util.Pair and replace usages with com.mojang.datafixers.util.Pair
- Remove lib/util.Unit and replace usages with net.minecraft.util.Unit
- Make ResourceReloadHolder and ResourceReloadCache final and move to util
- Clean up code in backend/glsl
- Move LightSmoothnessArgument to impl
- Remove LoweringVisitor
- Move functionality of four main static methods in LoweringVisitor to new ModelTrees class
- Return ModelTree directly
- Accept Material instead of TextureAtlasSprite for efficiency, so visuals don't need to look up the sprite to get the ModelTree
- Use ResourceReloadCache for MeshTree.CACHE
- Implement instance hiding by deleting/stealing
- Work around instancer persistence by storing a recreation supplier in
the instance handle
- Rework instancer ctors to just take an InstancerKey
- Parameterize InstanceHandle by I extends Instance so the steal method
and the supplier can be safely assigned
- IndirectInstancer#uploadInstances: 46% of render thread to 26%
- Inline #enqueueCopy to avoid allocating LongConsumers
- Do not even bother to track individual changed indices, instead rely
on just the changedPage set
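
For illustration, tracking dirtiness per page rather than per index might look like the sketch below. It assumes the 32-instance pages described elsewhere in this log; `ChangedPagesSketch`, `notifyDirty`, and `uploadChanged` are hypothetical names, and the actual upload is replaced by a counter.

```java
import java.util.BitSet;

// Hypothetical sketch: one dirty bit per page instead of one per instance.
final class ChangedPagesSketch {
    static final int LOG2_PAGE_SIZE = 5; // 32 instances per page

    final BitSet changedPages = new BitSet();

    void notifyDirty(int instanceIndex) {
        changedPages.set(instanceIndex >> LOG2_PAGE_SIZE);
    }

    // Visits each changed page once and returns how many pages would be re-uploaded.
    int uploadChanged() {
        int uploaded = 0;
        for (int page = changedPages.nextSetBit(0); page >= 0; page = changedPages.nextSetBit(page + 1)) {
            // A real implementation would write the whole page to the GPU buffer here.
            uploaded++;
        }
        changedPages.clear();
        return uploaded;
    }
}
```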
- Convert ShulkerBoxVisual to use InstanceTree
- Add "pruning" helper visitors
- Remove ModelPartConverter
- Remove TextureMapper and related code from VertexWriter
- Add ModelTree
- Add LoweringVisitor to traverse a MeshTree and emit ModelTree nodes
and Models
- Provide some default visitor creation methods
- Abstract ModelCache -> ResourceReloadCache
- Abstract ModelHolder -> ResourceReloadHolder
- Add ModelTreeCache to hide lookup cost if it gets extreme
- Use InstanceTrees in BellVisual and MinecartVisual
- Use JOML Matrix4fStack instead of PoseStack
- Directly transform Matrix4f instead of using PoseStack to compute initial pose
- Track if the transforms for an InstanceTree have changed
- Pass a boolean down the tree and || it with our changed flag to force
updates
- Expose the force flag to visuals so they can hint to us if their root
transforms never change
- Add 2 wrapper methods to make the distinction more clear
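
The force-flag propagation above could be sketched like this. It is a simplified model under stated assumptions: `TreeNodeSketch` and its fields are hypothetical, the pose math is replaced by an update counter, and the two wrapper names are illustrative rather than confirmed API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a node recomputes its pose if it changed or an ancestor did.
final class TreeNodeSketch {
    final List<TreeNodeSketch> children = new ArrayList<>();
    boolean changed;
    int updates; // counts how often this node recomputed its pose

    void update(boolean force) {
        // OR the inherited flag with our own changed flag.
        boolean dirty = force || changed;
        if (dirty) {
            updates++;
            changed = false;
        }
        for (TreeNodeSketch child : children) {
            child.update(dirty);
        }
    }

    // For visuals whose root transform may differ on every call.
    void updateInstances() {
        update(true);
    }

    // For visuals hinting that their root transform never changes.
    void updateInstancesStatic() {
        update(false);
    }
}
```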
- Store a pose matrix in each InstanceTree, equivalent to its instance's pose matrix if the instance exists
- Directly transform the current InstanceTree's pose matrix instead of transforming a PoseStack and copying its matrix to the instance, eliminating the need to push and pop stack entries
- Remove InstanceTree.rotation
- Add more InstanceTree methods to allow full inspection of children
- Rename TransformedInstance -> PosedInstance
- Add TransformedInstance, which does not have a normal matrix
- Use mat3(transpose(inverse(i.pose))) in the vertex shader, but for
most cases that will be overkill
- Use arrays in the trees
- MeshTree tracks parallel arrays of children and keys
- InstanceTree tracks which mesh tree it came from for lookup purposes
- Remove walker object
- Make RecyclingPoseStack use add/removeLast instead of push/pop
- Add MeshTree and InstanceTree
- Deprecate ModelPartConverter for removal
- Refactor ChestVisual to use InstanceTree
- Combine double chest light in ChestVisual
- Cherry-pick misc cleanups from last-frame-visibility
- Smarter multibind logic
- Make offsets in IndirectBuffers dependent on BufferBindings
- Organize buffer bindings based on where they're used to allow each
pass to bind exactly which buffers it needs
- Use DSA for the depth pyramid
- Pass the map of util programs to IndirectPrograms rather than
unpacking them individually
- Actually delete all the indirect utils
- Fix mip levels being half the size they should be
- Use the next lowest po2 from the main render target size for mip 0
- Map from dst texel to src texel rather than naively multiply by 2
- Clamp the estimated mip level in the cull shader
- Use texel fetches in the cull shader (not sure if necessary?)
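
A rough sketch of the sizing logic, assuming "next lowest po2" means the largest power of two that does not exceed the render target dimension. The helper names are hypothetical, and the proportional dst-to-src mapping is only one plausible reading of the fix, not necessarily what the shader does.

```java
// Hypothetical helpers for depth pyramid sizing.
final class DepthPyramidSketch {
    // Largest power of two <= screenSize (assumes screenSize > 0).
    static int mip0Size(int screenSize) {
        return Integer.highestOneBit(screenSize);
    }

    // Each level halves the previous one, never dropping below one texel.
    static int mipSize(int mip0Size, int level) {
        return Math.max(1, mip0Size >> level);
    }

    // Maps a destination texel to a source texel proportionally, which stays correct
    // even when the source is not exactly twice the destination size.
    static int srcTexel(int dstTexel, int dstSize, int srcSize) {
        return (int) ((long) dstTexel * srcSize / dstSize);
    }
}
```

With a 1920-wide render target, mip 0 would be 1024 wide, so multiplying a mip 0 coordinate by 2 would read past the source; the proportional mapping keeps every read inside it.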
- Implement hi-z occlusion culling
- Generate depth pyramid just before issuing cull dispatches
- Currently use raw texel fetches but this may be causing loss
- Add _flw_cullData to frame uniforms
- Only update the page table when an allocation is resized
- Only upload the page table after it's updated
- Combine various setters for InstancePager.Allocation and
IndirectInstancer
- Free pages when an allocation is deleted
- Fix bit logic on the GPU
- Manually manage the size of the storage and pageTable buffers
- Make object2Page and page2Object static
- Fix instance writing loop
- Fix page table always having full pages
- Fix allocations not shrinking
- Goal: avoid needing to re-upload everything when instance count for
one instancer changes
- Solution: store instances in pages of 32
- Allocate pages in a GPU arena
- Store one uint per page to indicate which model the instances in the
page belong to, and how many instances are actually stored in the page
- Instancers eagerly allocate and free pages as their instance count
changes
- Instancers will not necessarily store instances contiguously anymore,
but that's okay because any given cull workgroup will only reference a
single page
- Culling threads *will* write instances contiguously however, and so we
still need to keep track of a base instance per instancer, and the
target buffer logic does not change
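
As an illustration of the "one uint per page" idea, the sketch below packs a model index and an instance count into a single int. The bit layout is an assumption made for this example (6 low bits for a count of 0..32, the rest for the model index), and `PageEntrySketch` is a hypothetical name; the real layout may differ.

```java
// Hypothetical sketch of a packed page table entry.
final class PageEntrySketch {
    static final int PAGE_SIZE = 32;
    static final int COUNT_BITS = 6; // a count of 0..32 needs 6 bits
    static final int COUNT_MASK = (1 << COUNT_BITS) - 1;

    static int pack(int modelIndex, int count) {
        return (modelIndex << COUNT_BITS) | (count & COUNT_MASK);
    }

    static int count(int entry) {
        return entry & COUNT_MASK;
    }

    static int modelIndex(int entry) {
        // Unsigned shift so large model indices do not sign-extend.
        return entry >>> COUNT_BITS;
    }

    // Number of pages an instancer needs for the given instance count.
    static int pagesNeeded(int instanceCount) {
        return (instanceCount + PAGE_SIZE - 1) / PAGE_SIZE;
    }
}
```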
- Implement backend priority system
- Give indirect priority 1000 and instancing 500
- Generate the sorted list of backends on demand in case one changes
priority at runtime
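
Sorting on demand, as described above, might look like the following sketch. `BackendPrioritySketch` and its methods are hypothetical and deliberately not the real Backend API; priorities are modelled as an `IntSupplier` so that a backend can change its priority at runtime.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical sketch: backends are re-sorted every time the list is requested.
final class BackendPrioritySketch {
    private static final class Entry {
        final String name;
        final IntSupplier priority;

        Entry(String name, IntSupplier priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void register(String name, IntSupplier priority) {
        entries.add(new Entry(name, priority));
    }

    // Highest priority first; nothing is cached, so runtime changes are picked up.
    List<String> sortedNames() {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingInt((Entry e) -> e.priority.getAsInt()).reversed());
        List<String> names = new ArrayList<>();
        for (Entry e : sorted) {
            names.add(e.name);
        }
        return names;
    }
}
```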
- Use way fewer memory barriers
- I didn't realize that GL_SHADER_STORAGE_BARRIER_BIT was global instead
of operating only on the currently bound buffers. Oh, well
- Move apply program binding to IndirectDrawManager
- Fix embedded instances flickering when first loading a world. Need to
actually bind the matrix buffer for the cull shader. Not sure how it
worked at all before
- Minor styling/cleanup
- Optimize embeddings on indirect backend by uploading all matrices in
an SSBO
- Allocate matrices in an arena
- Flatten IndirectCullingGroups to only be parameterized by
InstanceType, so now all instances from all embeddings get culled in
the same dispatch
- Sort indirect draws by whether they're embedded before anything else
- Include an "embedded" boolean in the MultiDraw record to decide which
shader to use
- Include "matrixIndex" field in model descriptor and indirect draw
structs
- Use matrixIndex == 0 to indicate that a matrix is the identity to
avoid unnecessary work in the cull shader
- Add helper to write a mat3 as 3 vec4s
- De-uberify the light shader
- Remove lightSources index
- Include LightShader in PipelineProgramKey and parameterize the
pipeline fragment shader by it
- Profiling suggests that specializing the shaders uses significantly
less GPU time, and we may want to do this for actual user-authored
material shaders (and cutout?) as well
- Sort LightShader highest in the material comparator
- Implement a materialEquals method so IndirectCullingGroup can bucket
draws on more than just material reference equality
- Do not store any particular draw program in IndirectCullingGroup
- Manually unroll all loops in light_lut with the help of macros
- Pretty significant perf gains on my 5600G
- I tried assembling a bitmask of the blocks we actually want to fetch
and branching in each _FLW_LIGHT_FETCH in an attempt to reduce the
bandwidth required but that turned out much slower. Perhaps there's
still some middle-ground to be found for axis-aligned normals
- Re-order the 8-arrays in _flw_lightForDirection to be xzy to be
consistent with everything else and improve the memory access pattern
- Merge flywheel-backend config into an object within the base flywheel
config
- On forge, push a path in the toml
- On fabric, serialize a nested json object
- Still expose the BackendConfig via FlwBackendXplat, but have the impl
set a static field in the xplat impl
- Revert debug shulker box changes in previous commit
- Guard 3 different flw_light impls via #define
- Guard the inner face correction behind another #define
- Add LightSmoothness enum to decide which flw_light impl to use
- Make LightSmoothness configurable via a new BackendConfig
- Add command to switch LightSmoothness on the fly
- Note: currently requires a resource reload so we don't need to compile
4x as many shaders
- Calculate AO in flw_light
- Pack block light, sky light, and valid block count into a single uint
when reading the 3x3x3 light volume for a fragment. This saves a
significant amount of memory and integer additions in the shader, but
does require slightly more alu ops overall for the packing/unpacking
- Add pragma optionNV (unroll all) for significant perf boost, may want
to manually unroll to be cross-platform
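
To show why packing saves additions, here is a CPU-side sketch in Java rather than GLSL. The field widths are an assumption chosen for this example (10 bits each for block and sky light, the remaining bits for the valid count), and `PackedLightSketch` is a hypothetical name. With this layout a single integer add accumulates all three fields, at the cost of extra shifts and masks when unpacking.

```java
// Hypothetical sketch of packing light samples so one add accumulates three sums.
final class PackedLightSketch {
    // 10 bits hold sums over 27 samples of 0..15 (at most 405) without overflow.
    static final int FIELD_BITS = 10;
    static final int FIELD_MASK = (1 << FIELD_BITS) - 1;

    // Packs one valid sample; an invalid sample is represented as 0 and adds nothing.
    static int pack(int block, int sky) {
        return block | (sky << FIELD_BITS) | (1 << (2 * FIELD_BITS));
    }

    static int block(int packed) {
        return packed & FIELD_MASK;
    }

    static int sky(int packed) {
        return (packed >>> FIELD_BITS) & FIELD_MASK;
    }

    static int validCount(int packed) {
        return packed >>> (2 * FIELD_BITS);
    }
}
```

Summing `pack(15, 3)`, `pack(5, 7)`, and `0` for an invalid sample yields a block sum of 20, a sky sum of 10, and a valid count of 2, all from two additions.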
- Remove redundant flw_light
- Upload a bitset for each section indicating if blocks are solid
- When interpolating light, count the number of transparent blocks and
divide to avoid the incorrect "AO" effect near the ground
- Expose an ivec3 flw_renderOrigin in the shader api
- Internally add flw_renderOrigin in flw_light(*)
- flw_lightFetch expects an actual world position still
- SimpleQuadMesh holds a reference to its backing MemoryBlock so the
cleaner doesn't drop it
- Fixes issue where meshes suddenly start rendering garbage
- Add material shader component to specify smooth lighting behavior
- Allows much easier composition of smooth lighting/material shader
effects, and potentially gives backends the option to specialize
shaders on the complexity of shader lighting
- Pack fog, cutout, and light into a single uint