- Fix bit logic on the GPU
- Manually manage the size of the storage and pageTable buffers
- Make object2Page and page2Object static
- Fix instance writing loop
- Fix page table always having full pages
- Fix allocations not shrinking
- Goal: avoid needing to re-upload everything when instance count for
one instancer changes
- Solution: store instances in pages of 32
- Allocate pages in a GPU arena
- Store one uint per page to indicate which model the instances in the
page belong to, and how many instances are actually stored in the page
- Instancers eagerly allocate and free pages as their instance count
changes
- Instancers will not necessarily store instances contiguously anymore,
but that's okay because any given cull workgroup will only reference a
single page
- Culling threads *will* write instances contiguously however, and so we
still need to keep track of a base instance per instancer, and the
target buffer logic does not change
- Implement backend priority system
- Give indirect priority 1000 and instancing 500
- Generate the sorted list of backends on demand in case one changes
priority at runtime
- Use way fewer memory barriers
- I didn't realize that GL_SHADER_STORAGE_BARRIER_BIT was global instead
of operating only on the currently bound buffers. Oh, well
- Move apply program binding to IndirectDrawManager
- Fix embedded instances flickering when first loading a world. Need to
actually bind the matrix buffer for the cull shader. Not sure how it
worked at all before
- Minor styling/cleanup
- Optimize embeddings on indirect backend by uploading all matrices in
an SSBO
- Allocate matrices in an arena
- Flatten IndirectCullingGroups to only be parameterized by
InstanceType, so now all instances from all embeddings get culled in
the same dispatch
- Sort indirect draws by whether they're embedded before anything else
- Include an "embedded" boolean in the MultiDraw record to decide which
shader to use
- Include "matrixIndex" field in model descriptor and indirect draw
structs
- Use matrixIndex == 0 to indicate that a matrix is the identity to
avoid unnecessary work in the cull shader
- Add helper to write a mat3 as 3 vec4s
- De-uberify the light shader
- Remove lightSources index
- Include LightShader in PipelineProgramKey and parameterize the
pipeline fragment shader by it
- Profiling suggests that specializing the shaders uses significantly
less GPU time, and we may want to do this for actual user-authored
material shaders (and cutout?) as well
- Sort LightShader highest in the material comparator
- Implement a materialEquals method so IndirectCullingGroup can bucket
draws on more that just material reference equality
- Do not store any particular draw program in IndirectCullingGroup
- Manually unroll all loops in light_lut with the help of macros
- Pretty significant perf gains on my 5600G
- I tried assembling a bitmask of the blocks we actually want to fetch
and branching in each _FLW_LIGHT_FETCH in an attempt to reduce the
bandwidth required but that turned out much slower. Perhaps there's
still some middle-ground to be found for axis-aligned normals
- Re-order the 8-arrays in _flw_lightForDirection to be xzy to be
consistent with everything else and improve the memory access pattern
- Merge flywheel-backend config into an object within the base flywheel
config
- On forge, push a path in the toml
- On fabric, serialize a nested json object
- Still expose the BackendConfig via FlwBackendXplat, but have the impl
set a static field in the xplat impl
- Revert debug shulker box changes in previous commit
- Guard 3 different flw_light impls via #define
- Guard the inner face correction behind another #define
- Add LightSmoothness enum to decide which flw_light impl to use
- Make LightSmoothness configurable via a new BackendConfig
- Add command to switch LightSmoothness on the fly
- Note: currently requires a resource reload so we don't need to compile
4x as many shaders
- Calculate AO in flw_light
- Pack block light, sky light, and valid block count into a single uint
when reading the 3x3x3 light volume for a fragment. This saves a
significant amount of memory and integer additions in the shader, but
does require slightly more alu ops overall for the packing/unpacking
- Add pragma optionNV (unroll all) for significant perf boost, may want
to manually unroll to be cross-platform
- Remove redundant flw_light
- Upload a bitset for each section indicating if blocks are solid
- When interpolating light, count the number of transparent blocks and
divide to avoid the incorrect "AO" effect near the ground
- Expose an ivec3 flw_renderOrigin in the shader api
- Internally add flw_renderOrigin in flw_light(*)
- flw_lightFetch expects an actual world position still
- SimpleQuadMesh holds a reference to its backing MemoryBlock so the
cleaner doesn't drop it
- Fixes issue where meshes suddenly start rendering garbage
- Add material shader component to specify smooth lighting behavior
- Allows much easier composition of smooth lighting/material shader
effects, and potentially gives backends the option to specialize
shaders on the complexity of shader lighting
- Pack fog, cutout, and light into a single uint
- Extend InstancerProvider to allow visuals to bias the render order of
their instancers
- Keep the old InstancerProvider#instancer method with a bias of 0
- Add an explanation of render order in InstancerProvider
* Start on general API formalization
* More API improvements
- Add Engine#onLightUpdate; remove LightUpdateHolder and backend/ClientChunkCacheMixin
- Add Effect#level
- Add VisualizationHelper#queueAdd and #queueRemove for Effects
- Fix PartialModel not assigning bakedModel field when populating on init
- Fix PartialModel.ALL using weak keys instead of weak values
- Make Simple*Visualizer and corresponding inner Builder classes final
- Restore FlatLit#light overload that accepts block and sky light values separately
- Add AbstractBlockEntityVisual#relight overloads that accept Iterator and Iterable
- Reorganize classes in impl.vizualization
* TaskExecutor simplification
- Move TaskExecutor#sync* methods to TaskExecutorImpl
- Move Flag and RaisePlan to impl
- Remove TaskExecutor#scheduleForMainThread and #isMainThread methods
- Remove SyncedPlan
- Add Engine#setupRender
- Remove TaskExecutor parameters from Engine#render* methods
- Convert Engine$CrumblingBlock into an interface
- Unmark RenderContext as NonExtendable to allow fulfilling the purpose described in the doc of VisualizationManager#renderDispatcher
* Remove registry freeze callbacks
- Lazily initialize MaterialShaderIndices
- Rename MaterialShaders#*Shader to #*Source
- Move BackendImplemented to api.backend package
- Ensure section set returned by SectionTracker is Unmodifiable to avoid copy in LightUpdatedVisualStorage
- Do not recompute section set in ShaderLightVisualStorage if not dirty
- Fix BlockEntityStorage not clearing posLookup on recreation or invalidation
- Fix Storage.invalidate not clearing everything
- Inline TopLevelEmbeddedEnvironment and NestedEmbeddedEnvironment into AbstractEmbeddedEnvironment and rename to EmbeddedEnvironment
- Move some classes between packages
- Remove unused fields in EmbeddingUniforms
- Remove suffix on field names in BufferBindings
- Rename enqueueLightUpdateSection methods to onLightUpdate
- Rename SectionCollectorImpl to SectionTracker
- Rename classes, methods, fields, and parameters and edit javadoc and comments to match previously done renames, new renames, and other existing classes
- Optimize collecting light section edges
- Kinda an absurd amount of code, but I'm not sure how to parameterize
by an axis without having capturing lambdas
- Around 3-4x faster
- Only push light sections to the engine when the set of sections
requested by visuals changes
- Clean up light storage plan and comment code
- Remove LIGHT_VOLUME debug mode as it's no longer used
- Not attached to the name
- Add SmoothLitVisual opt in interface, allowing any visuals to
contribute light sections to the arena
- Remove lightChunks from VisualEmbedding, it has been usurped
- Pass total collected light sections from BEs, Es, and effects to the
engine interface. It seemed the most proper way to hand off
information from the impl to the backend
- Add SmoothLitVisualStorage to maintain the set of collected sections,
though at the moment it is very naive and simply unions everything
upon request, which is also naively done every frame
- Expose light in the shader api
- flw_light - for builtin smooth lighting, faster than can be
implemented by materials alone
- flw_lightFetch - for materials that want to go crazy, access to raw
data
- Sideport light lut stuffs to instancing engine
- Move actual lookup logic to light_lut.glsl, and have backend mains
provide functions to index the backing storages for sanity's sake
- Standardize naming of lut and sections
- Pull in pepper's loom fix, so I can build :lwe:
- Allow specifying the internal format of texture buffers so light can
be a simple uint array
- Pass light updates to LightStorage so that we don't have to re-upload
every tracked section every frame
- Slightly optimize light section writing, still room for improvement
- Remove dead code in LightStorage
- Avoid adding all sections every frame
- Remove sections when they are no longer needed
- Rebuild the lut when sections are removed
- Properly detect missing sections by writing 1-based indices to the lut
- "Functional" arena based lighting for indirect
- Strip out most of the reference counting stuffs for embeddings
- Naively re-buffer all tracked light sections every frame
- Pass partial tick to visualizers and Effect#visualize
- Pass partial tick to LitVisual#updateLight
- Remove Visual#init
- Rename LitVisual#initLightSectionNotifier to setLightSectionNotifier
- Add static utility methods to FlatLit
- Remove relight methods from AbstractVisual and add specialized relight methods to AbstractBlockEntityVisual and AbstractEntityVisual to match how vanilla retrieves lightmaps
- Rename AtomicBitset to AtomicBitSet
- Visualizers return a list of visuals instead of just one
- Simple*Visualizer's builders still only allow one visual per object,
I'm not sure how best to expose adding multiple in a way that allows
other mods to extend existing visualizers
Meshes are now always sorted by chunk layer first, then in order of how the BakedModel returned quads. This should exactly match vanilla's chunk buffering and avoid any rendering issues.
- Move registry freezing to right before start of initial resource reload
- Also warn if Fabric config JSON is not an object
- Move Flywheel.java to API
- Remove Flywheel.LOGGER and others; add impl-specific and backend-specific loggers
- Remove unused mixins
- Organize imports