- Use arrrays in the trees
- MeshTree tracks parallel arrays of children and keys
- InstanceTree tracks which mesh tree it came from for lookup purposes
- Remove walker object
- Make RecyclingPoseStack use add/removeLast instead of push/pop
- Add MeshTree and InstanceTree
- Deprecate ModelPartConverter for removal
- Refactor ChestVisual to use InstanceTree
- Combine double chest light in ChestVisual
- Cherry-pick misc cleanups from last-frame-visibility
- Smarter multibind logic
- Make offsets in IndirectBuffers dependent on BufferBindings
- Organize buffer bindings based on where they're used to allow each
pass to bind exactly which buffers it needs
- Use DSA for the depth pyramid
- Pass the map of util programs to IndirectPrograms rather than
unpacking them individually
- Actually delete all the indirect utils
- Fix mip levels being half the size they should be
- Use the next lowest po2 from the main render target size for mip 0
- Map from dst texel to src texel rather than naively multiply by 2
- Clamp the estimated mip level in the cull shader
- Use texel fetches in the cull shader (not sure if necessary?)
- Implement hi-z occlusion culling
- Generate depth pyramid just before issuing cull dispatches
- Currently use raw texel fetches but this may be causing loss
- Add _flw_cullData to frame uniforms
- Only update the page table when an allocation is resized
- Only upload the page table after it's uploaded
- Combine various setters for InstancePager.Allocation and
IndirectInstancer
- Free pages when an allocation is deleted
- Fix bit logic on the GPU
- Manually manage the size of the storage and pageTable buffers
- Make object2Page and page2Object static
- Fix instance writing loop
- Fix page table always having full pages
- Fix allocations not shrinking
- Goal: avoid needing to re-upload everything when instance count for
one instancer changes
- Solution: store instances in pages of 32
- Allocate pages in a GPU arena
- Store one uint per page to indicate which model the instances in the
page belong to, and how many instances are actually stored in the page
- Instancers eagerly allocate and free pages as their instance count
changes
- Instancers will not necessarily store instances contiguously anymore,
but that's okay because any given cull workgroup will only reference a
single page
- Culling threads *will* write instances contiguously however, and so we
still need to keep track of a base instance per instancer, and the
target buffer logic does not change
- Optimize read visibility by having each invocation read a 2x2 area and
coalescing atomicOrs when all 4 texels are equal
- Also use the fancy remap function for better texture cache locality
- Invert image size on CPU to avoid divisions on GPU
- Increase depth reduce group size to 16x16
- Early-out in uploadInstances based on changed cardinality
- Much faster to calculate cardinality than it is to clear an
AtomicBitSet, so the check is worth it
- Upload scatter list directly in the staging buffer if there's room
- Fix crash by resetting the indirect dispatch buffer each frame
- Use DSA + immutable storage for depth pyramid and visibility buffer
- In pass two, check against the thread count written out in pass one to
early return
- Require a draw barrier after each apply dispatch
- Use a storage array for the last frame visibility buffer
- Smarter multibind logic
- Make offsets in IndirectBuffers dependent on BufferBindings
- Organize buffer bindings based on where they're used to allow each
pass to bind exactly which buffers it needs
- Add stub dispatchCullPassTwo to IndirectCullingGroup
- Add pass two buffers to IndirectBuffers
- Actually compile and run visibility read shader
- Clear the visbuffer and readbuffer each frame
- Track culling group page counts between frames
- Fix texture binding issues between visbuffer and depth pyramid
- Add early and late cull shaders
- Compile early and late shaders separately
- Move util shader list to a static field
- Fix mip levels being half the size they should be
- Use the next lowest po2 from the main render target size for mip 0
- Map from dst texel to src texel rather than naively multiply by 2
- Clamp the estimated mip level in the cull shader
- Use texel fetches in the cull shader (not sure if necessary?)
- Implement hi-z occlusion culling
- Generate depth pyramid just before issuing cull dispatches
- Currently use raw texel fetches but this may be causing loss
- Add _flw_cullData to frame uniforms
- Only update the page table when an allocation is resized
- Only upload the page table after it's uploaded
- Combine various setters for InstancePager.Allocation and
IndirectInstancer
- Free pages when an allocation is deleted
- Fix bit logic on the GPU
- Manually manage the size of the storage and pageTable buffers
- Make object2Page and page2Object static
- Fix instance writing loop
- Fix page table always having full pages
- Fix allocations not shrinking
- Goal: avoid needing to re-upload everything when instance count for
one instancer changes
- Solution: store instances in pages of 32
- Allocate pages in a GPU arena
- Store one uint per page to indicate which model the instances in the
page belong to, and how many instances are actually stored in the page
- Instancers eagerly allocate and free pages as their instance count
changes
- Instancers will not necessarily store instances contiguously anymore,
but that's okay because any given cull workgroup will only reference a
single page
- Culling threads *will* write instances contiguously however, and so we
still need to keep track of a base instance per instancer, and the
target buffer logic does not change
- Implement backend priority system
- Give indirect priority 1000 and instancing 500
- Generate the sorted list of backends on demand in case one changes
priority at runtime
- Use way fewer memory barriers
- I didn't realize that GL_SHADER_STORAGE_BARRIER_BIT was global instead
of operating only on the currently bound buffers. Oh, well
- Move apply program binding to IndirectDrawManager
- Fix embedded instances flickering when first loading a world. Need to
actually bind the matrix buffer for the cull shader. Not sure how it
worked at all before
- Minor styling/cleanup
- Optimize embeddings on indirect backend by uploading all matrices in
an SSBO
- Allocate matrices in an arena
- Flatten IndirectCullingGroups to only be parameterized by
InstanceType, so now all instances from all embeddings get culled in
the same dispatch
- Sort indirect draws by whether they're embedded before anything else
- Include an "embedded" boolean in the MultiDraw record to decide which
shader to use
- Include "matrixIndex" field in model descriptor and indirect draw
structs
- Use matrixIndex == 0 to indicate that a matrix is the identity to
avoid unnecessary work in the cull shader
- Add helper to write a mat3 as 3 vec4s
- De-uberify the light shader
- Remove lightSources index
- Include LightShader in PipelineProgramKey and parameterize the
pipeline fragment shader by it
- Profiling suggests that specializing the shaders uses significantly
less GPU time, and we may want to do this for actual user-authored
material shaders (and cutout?) as well
- Sort LightShader highest in the material comparator
- Implement a materialEquals method so IndirectCullingGroup can bucket
draws on more that just material reference equality
- Do not store any particular draw program in IndirectCullingGroup
- Manually unroll all loops in light_lut with the help of macros
- Pretty significant perf gains on my 5600G
- I tried assembling a bitmask of the blocks we actually want to fetch
and branching in each _FLW_LIGHT_FETCH in an attempt to reduce the
bandwidth required but that turned out much slower. Perhaps there's
still some middle-ground to be found for axis-aligned normals
- Re-order the 8-arrays in _flw_lightForDirection to be xzy to be
consistent with everything else and improve the memory access pattern
- Merge flywheel-backend config into an object within the base flywheel
config
- On forge, push a path in the toml
- On fabric, serialize a nested json object
- Still expose the BackendConfig via FlwBackendXplat, but have the impl
set a static field in the xplat impl
- Revert debug shulker box changes in previous commit
- Guard 3 different flw_light impls via #define
- Guard the inner face correction behind another #define
- Add LightSmoothness enum to decide which flw_light impl to use
- Make LightSmoothness configurable via a new BackendConfig
- Add command to switch LightSmoothness on the fly
- Note: currently requires a resource reload so we don't need to compile
4x as many shaders
- Calculate AO in flw_light
- Pack block light, sky light, and valid block count into a single uint
when reading the 3x3x3 light volume for a fragment. This saves a
significant amount of memory and integer additions in the shader, but
does require slightly more alu ops overall for the packing/unpacking
- Add pragma optionNV (unroll all) for significant perf boost, may want
to manually unroll to be cross-platform
- Remove redundant flw_light
- Upload a bitset for each section indicating if blocks are solid
- When interpolating light, count the number of transparent blocks and
divide to avoid the incorrect "AO" effect near the ground
- Expose an ivec3 flw_renderOrigin in the shader api
- Internally add flw_renderOrigin in flw_light(*)
- flw_lightFetch expects an actual world position still
- SimpleQuadMesh holds a reference to its backing MemoryBlock so the
cleaner doesn't drop it
- Fixes issue where meshes suddenly start rendering garbage
- Add material shader component to specify smooth lighting behavior
- Allows much easier composition of smooth lighting/material shader
effects, and potentially gives backends the option to specialize
shaders on the complexity of shader lighting
- Pack fog, cutout, and light into a single uint
- Extend InstancerProvider to allow visuals to bias the render order of
their instancers
- Keep the old InstancerProvider#instancer method with a bias of 0
- Add an explanation of render order in InstancerProvider
* Start on general API formalization
* More API improvements
- Add Engine#onLightUpdate; remove LightUpdateHolder and backend/ClientChunkCacheMixin
- Add Effect#level
- Add VisualizationHelper#queueAdd and #queueRemove for Effects
- Fix PartialModel not assigning bakedModel field when populating on init
- Fix PartialModel.ALL using weak keys instead of weak values
- Make Simple*Visualizer and corresponding inner Builder classes final
- Restore FlatLit#light overload that accepts block and sky light values separately
- Add AbstractBlockEntityVisual#relight overloads that accept Iterator and Iterable
- Reorganize classes in impl.vizualization
* TaskExecutor simplification
- Move TaskExecutor#sync* methods to TaskExecutorImpl
- Move Flag and RaisePlan to impl
- Remove TaskExecutor#scheduleForMainThread and #isMainThread methods
- Remove SyncedPlan
- Add Engine#setupRender
- Remove TaskExecutor parameters from Engine#render* methods
- Convert Engine$CrumblingBlock into an interface
- Unmark RenderContext as NonExtendable to allow fulfilling the purpose described in the doc of VisualizationManager#renderDispatcher
* Remove registry freeze callbacks
- Lazily initialize MaterialShaderIndices
- Rename MaterialShaders#*Shader to #*Source
- Move BackendImplemented to api.backend package
- Ensure section set returned by SectionTracker is Unmodifiable to avoid copy in LightUpdatedVisualStorage
- Do not recompute section set in ShaderLightVisualStorage if not dirty
- Fix BlockEntityStorage not clearing posLookup on recreation or invalidation
- Fix Storage.invalidate not clearing everything
- Inline TopLevelEmbeddedEnvironment and NestedEmbeddedEnvironment into AbstractEmbeddedEnvironment and rename to EmbeddedEnvironment
- Move some classes between packages
- Remove unused fields in EmbeddingUniforms
- Remove suffix on field names in BufferBindings
- Rename enqueueLightUpdateSection methods to onLightUpdate
- Rename SectionCollectorImpl to SectionTracker
- Rename classes, methods, fields, and parameters and edit javadoc and comments to match previously done renames, new renames, and other existing classes
- Optimize collecting light section edges
- Kinda an absurd amount of code, but I'm not sure how to parameterize
by an axis without having capturing lambdas
- Around 3-4x faster
- Only push light sections to the engine when the set of sections
requested by visuals changes
- Clean up light storage plan and comment code
- Remove LIGHT_VOLUME debug mode as it's no longer used
- Not attached to the name
- Add SmoothLitVisual opt in interface, allowing any visuals to
contribute light sections to the arena
- Remove lightChunks from VisualEmbedding, it has been usurped
- Pass total collected light sections from BEs, Es, and effects to the
engine interface. It seemed the most proper way to hand off
information from the impl to the backend
- Add SmoothLitVisualStorage to maintain the set of collected sections,
though at the moment it is very naive and simply unions everything
upon request, which is also naively done every frame
- Expose light in the shader api
- flw_light - for builtin smooth lighting, faster than can be
implemented by materials alone
- flw_lightFetch - for materials that want to go crazy, access to raw
data
- Sideport light lut stuffs to instancing engine
- Move actual lookup logic to light_lut.glsl, and have backend mains
provide functions to index the backing storages for sanity's sake
- Standardize naming of lut and sections
- Pull in pepper's loom fix, so I can build :lwe:
- Allow specifying the internal format of texture buffers so light can
be a simple uint array
- Pass light updates to LightStorage so that we don't have to re-upload
every tracked section every frame
- Slightly optimize light section writing, still room for improvement
- Remove dead code in LightStorage
- Avoid adding all sections every frame
- Remove sections when they are no longer needed
- Rebuild the lut when sections are removed
- Properly detect missing sections by writing 1-based indices to the lut
- "Functional" arena based lighting for indirect
- Strip out most of the reference counting stuffs for embeddings
- Naively re-buffer all tracked light sections every frame
- Pass partial tick to visualizers and Effect#visualize
- Pass partial tick to LitVisual#updateLight
- Remove Visual#init
- Rename LitVisual#initLightSectionNotifier to setLightSectionNotifier
- Add static utility methods to FlatLit
- Remove relight methods from AbstractVisual and add specialized relight methods to AbstractBlockEntityVisual and AbstractEntityVisual to match how vanilla retrieves lightmaps
- Rename AtomicBitset to AtomicBitSet
- Visualizers return a list of visuals instead of just one
- Simple*Visualizer's builders still only allow one visual per object,
I'm not sure how best to expose adding multiple in a way that allows
other mods to extend existing visualizers
Meshes are now always sorted by chunk layer first, then in order of how the BakedModel returned quads. This should exactly match vanilla's chunk buffering and avoid any rendering issues.
- Move registry freezing to right before start of initial resource reload
- Also warn if Fabric config JSON is not an object
- Move Flywheel.java to API
- Remove Flywheel.LOGGER and others; add impl-specific and backend-specific loggers
- Remove unused mixins
- Organize imports
- Use immutable lists for backend extensions
- Copy the contents of indirect base extensions into compute extensions
- Move uniform block binding to Uniforms class and make magic strings
static final
- Create separate remap tasks for the common mojmap api to get loom to
populate its manifest file
- Add helper method to fork a JarTaskSet and generate new remap tasks
- Remove non-remap publish method
- Set assemble to depend on published jar tasks again
- Add companion methods to create individual jar tasks
- Merge package-infos, jar-sets, and transitive-source-sets plugins
- Move publishing logic into JarTaskSet
- Do not eagerly add all jarsets to assemble
- Significantly reduces build times
- Add separate helper method for creating outgoing jarsets
- Compile with progressively lower glsl versions to test which are
available.
- Properly returns 460 on my machine even with a gl version of 320
- Remove glsl enums below 150
- Refine capabilities checks
- To compile with glsl 150:
- Instancing needs ARB_shader_bit_encoding
- Indirect needs gpu_shader5 and shading_language_420pack
- Require extensions instead of just enable, probably doesn't make a
difference since we check for their presence first but require aligns
with our intent better
- Upgrade buildSrc to kotlin buildscript
- Add TransitiveSourceSet extension to abstract creating the different
source sets and creating the configurations to apply dependencies
- Convert package infos stuff to a proper plugin
- Add extension for specifying which source sets get generated package
infos
- Move extension classes into their own files
- Move GeneratePackageInfosTask into com.jozufozu.gradle
- Only publish api/lib and full jars for platforms
- Do not eagerly create outgoing configurations for JarTaskSet
- Add method to create outgoing configuration
- Clean up JarTaskSet creation and move some string constants to statics
- Set @CompileStatic
- Remove ${name} classifier prefix from JarTaskSet jars
- Put JarTaskSet jars into subdirectories in the output folders
- Add configure and configureEach methods to JarTaskSet
- Rename OutgoingConfigurationPlugin -> JarSetPlugin and move extension
to root module so idea offers completions in gradle files
- Standardize artifact naming flywheel-$platform(-api)?-$mcversion
- Also fix crash buffering fluids in BakedModelBufferer#bufferMultiBlock (Forge)
- The mesh order in models created by model builders is currently incorrect and will be fixed later
- Upgrade platform script plugin to pre-compiled groovy plugin
- It was getting really difficult to manage all the logic/plugins/types
from the basic script, and implementing a real plugin gives us much
better type safety and IDE access to upstream plugins
- Separate api/lib/backend/impl in platform projects
- Add platform module output to main runtime classpath so the fabric
loader recognizes our additional modules
- Use convention plugins for common build logic
- Convention plugins get to be applied in the plugins block and can load
other plugins
- Move GeneratePackageInfosTask to its own file in buildSrc
- Apply GeneratePackageInfosTask to every sourceSet
- Separate common project into 4 source sets
- Declare outgoing configurations for forge/fabric to depend on
- Re-compile source from each source set in each platform's compileJava