- Smarter multibind logic
- Make offsets in IndirectBuffers dependent on BufferBindings
- Organize buffer bindings based on where they're used to allow each
pass to bind exactly which buffers it needs
- Add stub dispatchCullPassTwo to IndirectCullingGroup
- Add pass two buffers to IndirectBuffers
- Actually compile and run visibility read shader
- Clear the visbuffer and readbuffer each frame
- Track culling group page counts between frames
- Fix texture binding issues between visbuffer and depth pyramid
- Add early and late cull shaders
- Compile early and late shaders separately
- Move util shader list to a static field
- Fix mip levels being half the size they should be
- Use the next lowest po2 from the main render target size for mip 0
- Map from dst texel to src texel rather than naively multiply by 2
- Clamp the estimated mip level in the cull shader
- Use texel fetches in the cull shader (not sure if necessary?)
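
A minimal sketch of the mip 0 sizing and mip clamp described above. It assumes "next lowest po2" means the largest power of two that does not exceed the render target dimension; the class and method names are hypothetical and not taken from the codebase.

```java
final class DepthPyramidSizing {
    // Largest power of two <= size. Whether an exact power of two should
    // instead drop one more level is an assumption left open here.
    static int floorPowerOfTwo(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        return Integer.highestOneBit(size);
    }

    // Number of mip levels from a power-of-two mip 0 down to a single texel.
    static int mipLevels(int mip0Size) {
        return 32 - Integer.numberOfLeadingZeros(mip0Size);
    }

    // Clamp an estimated mip level into the range the pyramid actually has.
    static int clampMip(int estimated, int levels) {
        return Math.max(0, Math.min(estimated, levels - 1));
    }
}
```

For a 1920x1080 target this gives a 1024x1024 mip 0 with 11 levels.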
- Implement hi-z occlusion culling
- Generate depth pyramid just before issuing cull dispatches
- Currently use raw texel fetches but this may be causing loss
- Add _flw_cullData to frame uniforms
- Only update the page table when an allocation is resized
- Only upload the page table after it's updated
- Combine various setters for InstancePager.Allocation and
IndirectInstancer
- Free pages when an allocation is deleted
- Fix bit logic on the GPU
- Manually manage the size of the storage and pageTable buffers
- Make object2Page and page2Object static
- Fix instance writing loop
- Fix page table always having full pages
- Fix allocations not shrinking
- Goal: avoid needing to re-upload everything when instance count for
one instancer changes
- Solution: store instances in pages of 32
- Allocate pages in a GPU arena
- Store one uint per page to indicate which model the instances in the
page belong to, and how many instances are actually stored in the page
- Instancers eagerly allocate and free pages as their instance count
changes
- Instancers will not necessarily store instances contiguously anymore,
but that's okay because any given cull workgroup will only reference a
single page
- Culling threads *will* write instances contiguously however, and so we
still need to keep track of a base instance per instancer, and the
target buffer logic does not change
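
A rough sketch of the one-uint-per-page descriptor described above. The bit layout (low 6 bits for the valid instance count, the remaining 26 bits for the model index) is an assumption for illustration only, as are the class and method names; the real layout may differ.

```java
final class PageDescriptorSketch {
    static final int PAGE_SIZE = 32;

    // Assumed layout: 6 bits are enough to hold a count in 0..32.
    static final int COUNT_BITS = 6;
    static final int COUNT_MASK = (1 << COUNT_BITS) - 1;
    static final int MAX_MODEL_INDEX = (1 << (32 - COUNT_BITS)) - 1;

    // Pack the owning model and the number of valid instances into one int,
    // which is uploaded and read on the GPU as a uint.
    static int pack(int modelIndex, int validCount) {
        if (validCount < 0 || validCount > PAGE_SIZE) {
            throw new IllegalArgumentException("validCount out of range");
        }
        if (modelIndex < 0 || modelIndex > MAX_MODEL_INDEX) {
            throw new IllegalArgumentException("modelIndex out of range");
        }
        return (modelIndex << COUNT_BITS) | validCount;
    }

    static int modelIndex(int packed) {
        return packed >>> COUNT_BITS; // unsigned shift, the value is a uint
    }

    static int validCount(int packed) {
        return packed & COUNT_MASK;
    }

    // Pages an instancer must hold for a given instance count.
    static int pagesNeeded(int instanceCount) {
        return (instanceCount + PAGE_SIZE - 1) / PAGE_SIZE;
    }
}
```

With this shape, growing an instancer from 32 to 33 instances only allocates one more page and rewrites one descriptor instead of re-uploading everything.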
- Implement backend priority system
- Give indirect priority 1000 and instancing 500
- Generate the sorted list of backends on demand in case one changes
priority at runtime
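
A small sketch of sorting backends on demand, as described above. The registry class and its methods are hypothetical; only the names "indirect" and "instancing" and the priorities 1000 and 500 come from this log.

```java
import java.util.ArrayList;
import java.util.List;

final class BackendPrioritySketch {
    static final class Entry {
        final String name;
        int priority; // mutable, so it may change at runtime

        Entry(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    Entry register(String name, int priority) {
        Entry entry = new Entry(name, priority);
        entries.add(entry);
        return entry;
    }

    // Build the sorted list on every call rather than caching it, so a
    // priority change is picked up the next time someone asks.
    List<String> byPriority() {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort((a, b) -> Integer.compare(b.priority, a.priority));
        List<String> names = new ArrayList<>();
        for (Entry entry : sorted) {
            names.add(entry.name);
        }
        return names;
    }
}
```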
- Use way fewer memory barriers
- I didn't realize that GL_SHADER_STORAGE_BARRIER_BIT was global instead
of operating only on the currently bound buffers. Oh, well
- Move apply program binding to IndirectDrawManager
- Fix embedded instances flickering when first loading a world. Need to
actually bind the matrix buffer for the cull shader. Not sure how it
worked at all before
- Minor styling/cleanup
- Optimize embeddings on indirect backend by uploading all matrices in
an SSBO
- Allocate matrices in an arena
- Flatten IndirectCullingGroups to only be parameterized by
InstanceType, so now all instances from all embeddings get culled in
the same dispatch
- Sort indirect draws by whether they're embedded before anything else
- Include an "embedded" boolean in the MultiDraw record to decide which
shader to use
- Include "matrixIndex" field in model descriptor and indirect draw
structs
- Use matrixIndex == 0 to indicate that a matrix is the identity to
avoid unnecessary work in the cull shader
- Add helper to write a mat3 as 3 vec4s
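
A sketch of what a "mat3 as 3 vec4s" writer can look like. It uses a plain java.nio.ByteBuffer as a stand-in for whatever memory abstraction the helper really targets, and assumes a column-major float[9] input; both are assumptions, as are the names. The padding exists because GLSL buffer layouts align each mat3 column to 16 bytes.

```java
import java.nio.ByteBuffer;

final class Mat3WriterSketch {
    // Three columns, each padded to four floats.
    static final int BYTES = 3 * 4 * Float.BYTES;

    // Writes a column-major 3x3 matrix as three vec4s starting at offset.
    // The fourth component of each column is padding and is written as 0.
    static void writeMat3AsVec4s(ByteBuffer dst, int offset, float[] m) {
        if (m.length != 9) {
            throw new IllegalArgumentException("expected 9 floats");
        }
        for (int col = 0; col < 3; col++) {
            int base = offset + col * 4 * Float.BYTES;
            dst.putFloat(base, m[col * 3]);
            dst.putFloat(base + 4, m[col * 3 + 1]);
            dst.putFloat(base + 8, m[col * 3 + 2]);
            dst.putFloat(base + 12, 0f);
        }
    }
}
```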
- De-uberify the light shader
- Remove lightSources index
- Include LightShader in PipelineProgramKey and parameterize the
pipeline fragment shader by it
- Profiling suggests that specializing the shaders uses significantly
less GPU time, and we may want to do this for actual user-authored
material shaders (and cutout?) as well
- Sort LightShader highest in the material comparator
- Implement a materialEquals method so IndirectCullingGroup can bucket
draws on more than just material reference equality
- Do not store any particular draw program in IndirectCullingGroup
- Manually unroll all loops in light_lut with the help of macros
- Pretty significant perf gains on my 5600G
- I tried assembling a bitmask of the blocks we actually want to fetch
and branching in each _FLW_LIGHT_FETCH in an attempt to reduce the
bandwidth required but that turned out much slower. Perhaps there's
still some middle-ground to be found for axis-aligned normals
- Re-order the 8-arrays in _flw_lightForDirection to be xzy to be
consistent with everything else and improve the memory access pattern
- Merge flywheel-backend config into an object within the base flywheel
config
- On forge, push a path in the toml
- On fabric, serialize a nested json object
- Still expose the BackendConfig via FlwBackendXplat, but have the impl
set a static field in the xplat impl
- Revert debug shulker box changes in previous commit
- Guard 3 different flw_light impls via #define
- Guard the inner face correction behind another #define
- Add LightSmoothness enum to decide which flw_light impl to use
- Make LightSmoothness configurable via a new BackendConfig
- Add command to switch LightSmoothness on the fly
- Note: currently requires a resource reload so we don't need to compile
4x as many shaders
- Calculate AO in flw_light
- Pack block light, sky light, and valid block count into a single uint
when reading the 3x3x3 light volume for a fragment. This saves a
significant amount of memory and integer additions in the shader, but
does require slightly more alu ops overall for the packing/unpacking
- Add pragma optionNV (unroll all) for significant perf boost, may want
to manually unroll to be cross-platform
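
One way the single-uint packing can save additions, sketched in Java for illustration (the real code is GLSL). The 10-bit field layout is an assumption, not the layout used in the shader: each field is wide enough that several packed samples can be summed with one integer add without carrying into the next field.

```java
final class PackedLightSketch {
    // Assumed layout: block light in bits 0..9, sky light in bits 10..19,
    // valid block count in bits 20..29. Light values are 0..15, so up to
    // 27 samples can be accumulated before a field could overflow.
    static final int FIELD_BITS = 10;
    static final int FIELD_MASK = (1 << FIELD_BITS) - 1;

    static int pack(int block, int sky, boolean valid) {
        return block | (sky << FIELD_BITS) | ((valid ? 1 : 0) << (2 * FIELD_BITS));
    }

    static int block(int packed) {
        return packed & FIELD_MASK;
    }

    static int sky(int packed) {
        return (packed >>> FIELD_BITS) & FIELD_MASK;
    }

    static int validCount(int packed) {
        return (packed >>> (2 * FIELD_BITS)) & FIELD_MASK;
    }
}
```

Adding two packed values sums all three fields at once; the unpacking shifts and masks are the extra ALU cost mentioned above.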
- Remove redundant flw_light
- Upload a bitset for each section indicating if blocks are solid
- When interpolating light, count the number of transparent blocks and
divide to avoid the incorrect "AO" effect near the ground
- Expose an ivec3 flw_renderOrigin in the shader api
- Internally add flw_renderOrigin in flw_light(*)
- flw_lightFetch expects an actual world position still
- SimpleQuadMesh holds a reference to its backing MemoryBlock so the
cleaner doesn't drop it
- Fixes issue where meshes suddenly start rendering garbage
- Add material shader component to specify smooth lighting behavior
- Allows much easier composition of smooth lighting/material shader
effects, and potentially gives backends the option to specialize
shaders on the complexity of shader lighting
- Pack fog, cutout, and light into a single uint
- Extend InstancerProvider to allow visuals to bias the render order of
their instancers
- Keep the old InstancerProvider#instancer method with a bias of 0
- Add an explanation of render order in InstancerProvider
* Start on general API formalization
* More API improvements
- Add Engine#onLightUpdate; remove LightUpdateHolder and backend/ClientChunkCacheMixin
- Add Effect#level
- Add VisualizationHelper#queueAdd and #queueRemove for Effects
- Fix PartialModel not assigning bakedModel field when populating on init
- Fix PartialModel.ALL using weak keys instead of weak values
- Make Simple*Visualizer and corresponding inner Builder classes final
- Restore FlatLit#light overload that accepts block and sky light values separately
- Add AbstractBlockEntityVisual#relight overloads that accept Iterator and Iterable
- Reorganize classes in impl.visualization
* TaskExecutor simplification
- Move TaskExecutor#sync* methods to TaskExecutorImpl
- Move Flag and RaisePlan to impl
- Remove TaskExecutor#scheduleForMainThread and #isMainThread methods
- Remove SyncedPlan
- Add Engine#setupRender
- Remove TaskExecutor parameters from Engine#render* methods
- Convert Engine$CrumblingBlock into an interface
- Unmark RenderContext as NonExtendable to allow fulfilling the purpose described in the doc of VisualizationManager#renderDispatcher
* Remove registry freeze callbacks
- Lazily initialize MaterialShaderIndices
- Rename MaterialShaders#*Shader to #*Source
- Move BackendImplemented to api.backend package
- Ensure section set returned by SectionTracker is Unmodifiable to avoid copy in LightUpdatedVisualStorage
- Do not recompute section set in ShaderLightVisualStorage if not dirty
- Fix BlockEntityStorage not clearing posLookup on recreation or invalidation
- Fix Storage.invalidate not clearing everything
- Inline TopLevelEmbeddedEnvironment and NestedEmbeddedEnvironment into AbstractEmbeddedEnvironment and rename to EmbeddedEnvironment
- Move some classes between packages
- Remove unused fields in EmbeddingUniforms
- Remove suffix on field names in BufferBindings
- Rename enqueueLightUpdateSection methods to onLightUpdate
- Rename SectionCollectorImpl to SectionTracker
- Rename classes, methods, fields, and parameters and edit javadoc and comments to match previously done renames, new renames, and other existing classes
- Optimize collecting light section edges
- Kinda an absurd amount of code, but I'm not sure how to parameterize
by an axis without having capturing lambdas
- Around 3-4x faster
- Only push light sections to the engine when the set of sections
requested by visuals changes
- Clean up light storage plan and comment code
- Remove LIGHT_VOLUME debug mode as it's no longer used
- Not attached to the name
- Add SmoothLitVisual opt in interface, allowing any visuals to
contribute light sections to the arena
- Remove lightChunks from VisualEmbedding, it has been usurped
- Pass total collected light sections from BEs, Es, and effects to the
engine interface. It seemed the most proper way to hand off
information from the impl to the backend
- Add SmoothLitVisualStorage to maintain the set of collected sections,
though at the moment it is very naive and simply unions everything
upon request, which is also naively done every frame
- Expose light in the shader api
- flw_light - for builtin smooth lighting, faster than can be
implemented by materials alone
- flw_lightFetch - for materials that want to go crazy, access to raw
data
- Sideport light lut stuffs to instancing engine
- Move actual lookup logic to light_lut.glsl, and have backend mains
provide functions to index the backing storages for sanity's sake
- Standardize naming of lut and sections
- Pull in pepper's loom fix, so I can build :lwe:
- Allow specifying the internal format of texture buffers so light can
be a simple uint array
- Pass light updates to LightStorage so that we don't have to re-upload
every tracked section every frame
- Slightly optimize light section writing, still room for improvement
- Remove dead code in LightStorage
- Avoid adding all sections every frame
- Remove sections when they are no longer needed
- Rebuild the lut when sections are removed
- Properly detect missing sections by writing 1-based indices to the lut
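
A sketch of the 1-based index trick, assuming a flat int array as a simplified stand-in for the real lut (names are hypothetical). Storing index + 1 lets a zero-filled entry mean "missing" with no separate sentinel fill.

```java
final class SectionLutSketch {
    static final int MISSING = 0;

    // A new int[] is zero-filled, so every slot starts out as missing.
    private final int[] lut;

    SectionLutSketch(int slots) {
        lut = new int[slots];
    }

    // Store the arena index shifted up by one so that 0 stays reserved.
    void put(int slot, int arenaIndex) {
        lut[slot] = arenaIndex + 1;
    }

    void remove(int slot) {
        lut[slot] = MISSING;
    }

    boolean isPresent(int slot) {
        return lut[slot] != MISSING;
    }

    // Returns the 0-based arena index, or -1 when the section is missing.
    int lookup(int slot) {
        return lut[slot] - 1;
    }
}
```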
- "Functional" arena based lighting for indirect
- Strip out most of the reference counting stuffs for embeddings
- Naively re-buffer all tracked light sections every frame
- Pass partial tick to visualizers and Effect#visualize
- Pass partial tick to LitVisual#updateLight
- Remove Visual#init
- Rename LitVisual#initLightSectionNotifier to setLightSectionNotifier
- Add static utility methods to FlatLit
- Remove relight methods from AbstractVisual and add specialized relight methods to AbstractBlockEntityVisual and AbstractEntityVisual to match how vanilla retrieves lightmaps
- Rename AtomicBitset to AtomicBitSet
- Visualizers return a list of visuals instead of just one
- Simple*Visualizer's builders still only allow one visual per object,
I'm not sure how best to expose adding multiple in a way that allows
other mods to extend existing visualizers