Commit graph

868 commits

Author SHA1 Message Date
Jozufozu
41e0aa6bba Strike a pose
- Rename TransformedInstance -> PosedInstance
- Add TransformedInstance, which does not have a normal matrix
- Use mat3(transpose(inverse(i.pose))) in the vertex shader, but for
  most cases that will be overkill
2024-09-14 21:46:49 -07:00
Jozufozu
6f2b8fd3fb Shiver me timbers
- Use arrrays in the trees
- MeshTree tracks parallel arrays of children and keys
- InstanceTree tracks which mesh tree it came from for lookup purposes
- Remove walker object
- Make RecyclingPoseStack use add/removeLast instead of push/pop
2024-09-14 19:38:55 -07:00
Jozufozu
5ffddeb221 Walk this way
- Update instances using a walker object created at the top level
2024-09-14 17:46:54 -07:00
Jozufozu
ed35c5a429 Reduce and reuse
- Create subclass to recycle PoseStack.Pose objects
- Add mixin/liblink to access a PoseStack's inner deque
2024-09-14 16:50:02 -07:00
PepperCode1
295ebc7573 Treefactors
- Add MeshTree and InstanceTree
- Deprecate ModelPartConverter for removal
- Refactor ChestVisual to use InstanceTree
- Combine double chest light in ChestVisual
2024-09-14 15:00:27 -07:00
Jozufozu
ddb0450105 Rapid descent
- Implement single (but actually 2) pass downsampling
2024-09-14 14:12:14 -07:00
Jozufozu
a527af513f Harvesting
- Cherry-pick misc cleanups from last-frame-visibility
- Smarter multibind logic
- Make offsets in IndirectBuffers dependent on BufferBindings
- Organize buffer bindings based on where they're used to allow each
  pass to bind exactly which buffers it needs
- Use DSA for the depth pyramid
- Pass the map of util programs to IndirectPrograms rather than
  unpacking them individually
- Actually delete all the indirect utils
2024-09-14 14:04:50 -07:00
Jozufozu
f009cb846c Near stability
- Fix near plane rejection logic
- Fix lod clamp
2024-09-14 12:55:05 -07:00
Jozufozu
6c1fbf610d The depths of the rabbit hole
- Fix mip levels being half the size they should be
- Use the next lowest po2 from the main render target size for mip 0
- Map from dst texel to src texel rather than naively multiply by 2
- Clamp the estimated mip level in the cull shader
- Use texel fetches in the cull shader (not sure if necessary?)
2024-09-14 12:55:04 -07:00
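The mip 0 sizing described in this commit can be illustrated with a small Java sketch. It assumes "next lowest po2" means the largest power of two that does not exceed each dimension, and that the pyramid runs all the way down to 1x1. The class and method names are hypothetical and are not taken from Flywheel.

```java
// Hypothetical helper: names and the rounding rule are assumptions, not Flywheel code.
final class DepthPyramidSizes {
    // Largest power of two that is <= n, for n >= 1.
    static int previousPowerOfTwo(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        return Integer.highestOneBit(n);
    }

    // Number of levels in a full mip chain down to 1x1: floor(log2(max)) + 1.
    static int mipLevels(int width, int height) {
        int largest = Math.max(width, height);
        return 32 - Integer.numberOfLeadingZeros(largest);
    }

    public static void main(String[] args) {
        // Example render target size; any positive size works.
        int w = previousPowerOfTwo(1920);
        int h = previousPowerOfTwo(1080);
        System.out.println(w + "x" + h + ", levels=" + mipLevels(w, h));
    }
}
```

Starting from a power-of-two mip 0 means every level is exactly half the previous one, which is what lets a destination texel map cleanly onto a 2x2 block of source texels.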
Jozufozu
01a7936a05 Joining the occult
- Implement hi-z occlusion culling
- Generate depth pyramid just before issuing cull dispatches
- Currently use raw texel fetches but this may be causing loss
- Add _flw_cullData to frame uniforms
2024-09-14 12:55:03 -07:00
Jozufozu
81cb2340e7 The hardest problem
- Rename most InstancePager terminology
- Rename MODEL_INDEX buffer stuffs
2024-09-14 12:55:03 -07:00
Jozufozu
b5680a0fd6 On-call paging
- Only update the page table when an allocation is resized
- Only upload the page table after it's updated
- Combine various setters for InstancePager.Allocation and
  IndirectInstancer
- Free pages when an allocation is deleted
2024-09-14 12:55:02 -07:00
Jozufozu
637f0538fc Growing pains
- Fix bit logic on the GPU
- Manually manage the size of the storage and pageTable buffers
- Make object2Page and page2Object static
- Fix instance writing loop
- Fix page table always having full pages
- Fix allocations not shrinking
2024-09-14 12:55:01 -07:00
Jozufozu
0af1127745 Paging Dr. Instancer
- Goal: avoid needing to re-upload everything when instance count for
  one instancer changes
- Solution: store instances in pages of 32
- Allocate pages in a GPU arena
- Store one uint per page to indicate which model the instances in the
  page belong to, and how many instances are actually stored in the page
- Instancers eagerly allocate and free pages as their instance count
  changes
- Instancers will not necessarily store instances contiguously anymore,
  but that's okay because any given cull workgroup will only reference a
  single page
- Culling threads *will* write instances contiguously however, and so we
  still need to keep track of a base instance per instancer, and the
  target buffer logic does not change
2024-09-14 12:55:01 -07:00
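The paging scheme in this commit (pages of 32 instances, one uint per page recording the owning model and the number of instances stored) could be sketched as follows. The bit layout is an assumption made for illustration: 6 low bits for a count in 0..32 and the remaining 26 bits for the model index. The commit message does not state the real layout, and all names here are hypothetical.

```java
// Hypothetical sketch: the bit layout is an assumption, not Flywheel's actual format.
final class InstancePages {
    static final int PAGE_SIZE = 32;  // from the commit message: pages of 32 instances
    static final int COUNT_BITS = 6;  // assumption: 6 bits are enough for a count in 0..32
    static final int COUNT_MASK = (1 << COUNT_BITS) - 1;

    // Pages needed to hold instanceCount instances (ceiling division).
    static int pagesFor(int instanceCount) {
        return (instanceCount + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    // Pack the model index and valid-instance count into one int,
    // interpreted as an unsigned 32-bit value.
    static int packPage(int modelIndex, int validCount) {
        if (validCount < 0 || validCount > PAGE_SIZE) {
            throw new IllegalArgumentException("count must be in 0..32");
        }
        if (modelIndex < 0 || modelIndex >= (1 << (32 - COUNT_BITS))) {
            throw new IllegalArgumentException("model index does not fit");
        }
        return (modelIndex << COUNT_BITS) | validCount;
    }

    static int modelIndex(int packed) {
        return packed >>> COUNT_BITS; // unsigned shift, since the top bit may be set
    }

    static int validCount(int packed) {
        return packed & COUNT_MASK;
    }

    public static void main(String[] args) {
        // 70 instances need two full pages and one partial page of 6.
        int pages = pagesFor(70);
        int lastPage = packPage(3, 70 - (pages - 1) * PAGE_SIZE);
        System.out.println(pages + " pages, last page: model=" + modelIndex(lastPage)
                + " count=" + validCount(lastPage));
    }
}
```

With a descriptor like this, growing or shrinking one instancer only touches its own pages and their entries, which matches the stated goal of avoiding a full re-upload.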
Jozufozu
ba3d84b5ae Seeing blue
- Optimize read visibility by having each invocation read a 2x2 area and
  coalescing atomicOrs when all 4 texels are equal
- Also use the fancy remap function for better texture cache locality
2024-09-13 22:40:10 -07:00
Jozufozu
0151364b8a Clear for debugging
- Nsight explodes with the scatter shader resetting the indirect
  dispatch buffer
- Instead, issue a clear buffer and buffer update barrier
2024-09-13 22:40:10 -07:00
Jozufozu
861009ed11 Rapid descent
- Implement single (but actually 2) pass downsampling
2024-09-13 22:40:09 -07:00
Jozufozu
dfc1e3a397 Looming danger
- Bump arch loom and gradle versions
- Do not set default refmap name
- Enable legacy mixin ap in loom
- Individually add sourcesets to looms refmap stuffs
2024-09-09 21:28:02 -07:00
Jozufozu
0bfaac7154 Poking and prodding
- Invert image size on CPU to avoid divisions on GPU
- Increase depth reduce group size to 16x16
- Early-out in uploadInstances based on changed cardinality
  - Much faster to calculate cardinality than it is to clear an
    AtomicBitSet, so the check is worth it
- Upload scatter list directly in the staging buffer if there's room
2024-09-09 20:39:10 -07:00
Rhys
fb41248c4c fix loom version getting (#254)
* fix loom version getting
very cursed but it works :)

* remove commented out stuff in settings.gradle.kts, unnecessary plugins in buildSrc, and configure buildSrc to download sources/javadoc
2024-09-09 19:11:05 -07:00
PepperCode1
14ca1d3286 Allow VertexViews to hold a memory owner
- Remove MemoryBlock parameter from SimpleQuadMesh constructors
2024-09-09 19:01:19 -07:00
Jozufozu
f12aa15dae It's alive
- Fix crash by resetting the indirect dispatch buffer each frame
- Use DSA + immutable storage for depth pyramid and visibility buffer
- In pass two, check against the thread count written out in pass one to
  early return
- Require a draw barrier after each apply dispatch
- Use a storage array for the last frame visibility buffer
2024-09-09 14:20:25 -07:00
Jozufozu
4552716b74 Error count decreasing
- Mostly silly typos
- Do not clear the visbuffer if it hasn't been generated yet
- Grow pass two index buffer instead of pass two dispatch buffer
2024-09-08 12:33:25 -07:00
Jozufozu
b6ed3cefda Probably not rendering
- Flesh out two pass pipeline
- Shove everything into one visual type for now
2024-09-08 11:29:27 -05:00
Jozufozu
1edb72ac19 Buff to the buffers
- Smarter multibind logic
- Make offsets in IndirectBuffers dependent on BufferBindings
- Organize buffer bindings based on where they're used to allow each
  pass to bind exactly which buffers it needs
- Add stub dispatchCullPassTwo to IndirectCullingGroup
- Add pass two buffers to IndirectBuffers
2024-09-08 11:00:04 -05:00
Jozufozu
9009bfe730 Observe
- Actually compile and run visibility read shader
- Clear the visbuffer and readbuffer each frame
- Track culling group page counts between frames
- Fix texture binding issues between visbuffer and depth pyramid
- Add early and late cull shaders
- Compile early and late shaders separately
- Move util shader list to a static field
2024-09-08 09:57:11 -05:00
Jozufozu
77d64aa5a2 I can see you
- Add visibility buffer fbo attachment
- Write instance id to visbuffer
- Move instance id in/out from common to impl shaders
2024-09-06 15:49:32 -05:00
Jozufozu
ce51e1f534 Near stability
- Fix near plane rejection logic
- Fix lod clamp
2024-09-05 13:45:24 -05:00
Jozufozu
074ee34dd4 The depths of the rabbit hole
- Fix mip levels being half the size they should be
- Use the next lowest po2 from the main render target size for mip 0
- Map from dst texel to src texel rather than naively multiply by 2
- Clamp the estimated mip level in the cull shader
- Use texel fetches in the cull shader (not sure if necessary?)
2024-09-05 12:38:05 -05:00
Jozufozu
ec45287cfa Joining the occult
- Implement hi-z occlusion culling
- Generate depth pyramid just before issuing cull dispatches
- Currently use raw texel fetches but this may be causing loss
- Add _flw_cullData to frame uniforms
2024-09-04 14:02:28 -05:00
Jozufozu
2537584a22 The hardest problem
- Rename most InstancePager terminology
- Rename MODEL_INDEX buffer stuffs
2024-09-03 11:23:21 -05:00
Jozufozu
e83a308a46 On-call paging
- Only update the page table when an allocation is resized
- Only upload the page table after it's updated
- Combine various setters for InstancePager.Allocation and
  IndirectInstancer
- Free pages when an allocation is deleted
2024-09-03 10:56:46 -05:00
Jozufozu
1138208e31 Growing pains
- Fix bit logic on the GPU
- Manually manage the size of the storage and pageTable buffers
- Make object2Page and page2Object static
- Fix instance writing loop
- Fix page table always having full pages
- Fix allocations not shrinking
2024-09-01 12:44:38 -05:00
Jozufozu
12c7cdfda5 Paging Dr. Instancer
- Goal: avoid needing to re-upload everything when instance count for
  one instancer changes
- Solution: store instances in pages of 32
- Allocate pages in a GPU arena
- Store one uint per page to indicate which model the instances in the
  page belong to, and how many instances are actually stored in the page
- Instancers eagerly allocate and free pages as their instance count
  changes
- Instancers will not necessarily store instances contiguously anymore,
  but that's okay because any given cull workgroup will only reference a
  single page
- Culling threads *will* write instances contiguously however, and so we
  still need to keep track of a base instance per instancer, and the
  target buffer logic does not change
2024-09-01 12:41:42 -05:00
Jozufozu
1a8ed8db28 Keeping our priorities straight
- Implement backend priority system
- Give indirect priority 1000 and instancing 500
- Generate the sorted list of backends on demand in case one changes
  priority at runtime
2024-08-15 21:45:14 -07:00
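A minimal Java sketch of the priority idea in this commit: each backend reports a priority, and the sorted list is rebuilt on demand so that a priority changed at runtime is picked up on the next query. The class, record, and method names are hypothetical stand-ins rather than Flywheel's API. Only the priorities 1000 and 500 come from the commit message.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical registry: not Flywheel's actual backend API.
final class BackendPriorities {
    // A backend name paired with a supplier, so the priority can change at runtime.
    record Entry(String name, IntSupplier priority) {}

    private final List<Entry> entries = new ArrayList<>();

    void register(String name, IntSupplier priority) {
        entries.add(new Entry(name, priority));
    }

    // Re-sorted on every call (highest priority first) instead of being cached.
    List<String> sortedNames() {
        return entries.stream()
                .sorted(Comparator.comparingInt((Entry e) -> e.priority().getAsInt()).reversed())
                .map(Entry::name)
                .toList();
    }

    public static void main(String[] args) {
        BackendPriorities registry = new BackendPriorities();
        registry.register("instancing", () -> 500);
        registry.register("indirect", () -> 1000);
        System.out.println(registry.sortedNames());
    }
}
```

Sorting on demand trades a little work per query for never serving a stale order. A cached list would need explicit invalidation whenever a priority changes.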
Jozufozu
a5f49c6738 Hol up
- Use way fewer memory barriers
- I didn't realize that GL_SHADER_STORAGE_BARRIER_BIT was global instead
  of operating only on the currently bound buffers. Oh, well
- Move apply program binding to IndirectDrawManager
- Fix embedded instances flickering when first loading a world. Need to
  actually bind the matrix buffer for the cull shader. Not sure how it
  worked at all before
- Minor styling/cleanup
2024-08-15 20:57:28 -07:00
Jozufozu
7a7d58adf2 Embeds your embeddings
- Optimize embeddings on indirect backend by uploading all matrices in
  an SSBO
- Allocate matrices in an arena
- Flatten IndirectCullingGroups to only be parameterized by
  InstanceType, so now all instances from all embeddings get culled in
  the same dispatch
- Sort indirect draws by whether they're embedded before anything else
- Include an "embedded" boolean in the MultiDraw record to decide which
  shader to use
- Include "matrixIndex" field in model descriptor and indirect draw
  structs
- Use matrixIndex == 0 to indicate that a matrix is the identity to
  avoid unnecessary work in the cull shader
- Add helper to write a mat3 as 3 vec4s
2024-08-15 11:41:33 -07:00
Jozufozu
b7d2b2ac7c Ubern't
- De-uberify the light shader
- Remove lightSources index
- Include LightShader in PipelineProgramKey and parameterize the
  pipeline fragment shader by it
- Profiling suggests that specializing the shaders uses significantly
  less GPU time, and we may want to do this for actual user-authored
  material shaders (and cutout?) as well
- Sort LightShader highest in the material comparator
- Implement a materialEquals method so IndirectCullingGroup can bucket
  draws on more than just material reference equality
- Do not store any particular draw program in IndirectCullingGroup
2024-08-12 17:35:31 -07:00
Jozufozu
2d37c3894d We (un)roll
- Manually unroll all loops in light_lut with the help of macros
  - Pretty significant perf gains on my 5600G
- I tried assembling a bitmask of the blocks we actually want to fetch
  and branching in each _FLW_LIGHT_FETCH in an attempt to reduce the
  bandwidth required but that turned out much slower. Perhaps there's
  still some middle-ground to be found for axis-aligned normals
- Re-order the 8-arrays in _flw_lightForDirection to be xzy to be
  consistent with everything else and improve the memory access pattern
2024-08-12 15:37:29 -07:00
Jozufozu
76a4b35ce6 Assimilate Backend Config
- Merge flywheel-backend config into an object within the base flywheel
  config
- On forge, push a path in the toml
- On fabric, serialize a nested json object
- Still expose the BackendConfig via FlwBackendXplat, but have the impl
  set a static field in the xplat impl
- Revert debug shulker box changes in previous commit
2024-08-10 12:54:15 -07:00
Jozufozu
744c40a56a Installing a light switch
- Guard 3 different flw_light impls via #define
- Guard the inner face correction behind another #define
- Add LightSmoothness enum to decide which flw_light impl to use
- Make LightSmoothness configurable via a new BackendConfig
- Add command to switch LightSmoothness on the fly
- Note: currently requires a resource reload so we don't need to compile
  4x as many shaders
2024-08-09 14:45:17 -07:00
Jozufozu
3dc4cf0841 All okay
- Use struct to separate light and ao fields
- Move light config TODO to impl
- Fix formatting in InstancerProvider docs
2024-08-01 12:14:14 -07:00
Jozufozu
3692dbdf3c Ayo!
- Calculate AO in flw_light
- Pack block light, sky light, and valid block count into a single uint
  when reading the 3x3x3 light volume for a fragment. This saves a
  significant amount of memory and integer additions in the shader, but
  does require slightly more alu ops overall for the packing/unpacking
- Add pragma optionNV (unroll all) for significant perf boost, may want
  to manually unroll to be cross-platform
- Remove redundant flw_light
2024-08-01 11:12:47 -07:00
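One way to see why packing block light, sky light, and the valid block count into a single uint can save integer additions: if each field has enough headroom, summing the packed values sums all three fields in one add. The Java sketch below assumes 10-bit fields, which is enough for 27 samples of light level 15 (27 * 15 = 405 < 1024). The field width, layout, and names are assumptions, and the real shader may pack differently.

```java
// Hypothetical sketch: field widths and layout are assumptions, not the shader's format.
final class PackedLight {
    static final int FIELD_BITS = 10; // assumption: 10 bits per field leaves room for sums
    static final int FIELD_MASK = (1 << FIELD_BITS) - 1;

    // Pack one sample: block light, sky light, and a 0/1 "valid" flag.
    static int pack(int blockLight, int skyLight, boolean valid) {
        if (blockLight < 0 || blockLight > 15 || skyLight < 0 || skyLight > 15) {
            throw new IllegalArgumentException("light levels must be in 0..15");
        }
        return blockLight | (skyLight << FIELD_BITS) | ((valid ? 1 : 0) << (2 * FIELD_BITS));
    }

    // After summing packed samples, each field holds the sum of that component.
    static int blockSum(int packed) {
        return packed & FIELD_MASK;
    }

    static int skySum(int packed) {
        return (packed >>> FIELD_BITS) & FIELD_MASK;
    }

    static int validCount(int packed) {
        return (packed >>> (2 * FIELD_BITS)) & FIELD_MASK;
    }

    public static void main(String[] args) {
        // Sum a full 3x3x3 neighbourhood at maximum light: one add per sample.
        int sum = 0;
        for (int i = 0; i < 27; i++) {
            sum += pack(15, 15, true);
        }
        System.out.println("block=" + blockSum(sum) + " sky=" + skySum(sum)
                + " valid=" + validCount(sum));
    }
}
```

The extra shifts and masks when unpacking correspond to the "slightly more alu ops" the commit message mentions as the cost of this approach.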
Jozufozu
601b70704a Solidly lit
- Upload a bitset for each section indicating if blocks are solid
- When interpolating light, count the number of transparent blocks and
  divide to avoid the incorrect "AO" effect near the ground
2024-07-28 15:57:05 -07:00
Jozufozu
8dce80ba61 The origin of all your problems
- Expose an ivec3 flw_renderOrigin in the shader api
- Internally add flw_renderOrigin in flw_light(*)
- flw_lightFetch expects an actual world position still
2024-07-28 13:16:50 -07:00
Jozufozu
fe89e0024a Courier transformed
- SimpleQuadMesh holds a reference to its backing MemoryBlock so the
  cleaner doesn't drop it
- Fixes issue where meshes suddenly start rendering garbage
2024-07-27 18:00:44 -07:00
Jozufozu
69411fb36f Uber smooth
- Add material shader component to specify smooth lighting behavior
- Allows much easier composition of smooth lighting/material shader
  effects, and potentially gives backends the option to specialize
  shaders on the complexity of shader lighting
- Pack fog, cutout, and light into a single uint
2024-07-27 15:31:08 -07:00
Jozufozu
cdc68244e7 An original thought
- Require embeddings to specify an origin coordinate on creation
2024-07-27 14:12:28 -07:00
Jozufozu
b8f6bf841d Biased towards artistic control
- Extend InstancerProvider to allow visuals to bias the render order of
  their instancers
- Keep the old InstancerProvider#instancer method with a bias of 0
- Add an explanation of render order in InstancerProvider
2024-07-26 16:32:05 -07:00
PepperCode1
eb2ba12a98 Formalize most public API (#253)
* Start on general API formalization

* More API improvements

- Add Engine#onLightUpdate; remove LightUpdateHolder and backend/ClientChunkCacheMixin
- Add Effect#level
- Add VisualizationHelper#queueAdd and #queueRemove for Effects
- Fix PartialModel not assigning bakedModel field when populating on init
- Fix PartialModel.ALL using weak keys instead of weak values
- Make Simple*Visualizer and corresponding inner Builder classes final
- Restore FlatLit#light overload that accepts block and sky light values separately
- Add AbstractBlockEntityVisual#relight overloads that accept Iterator and Iterable
- Reorganize classes in impl.vizualization

* TaskExecutor simplification

- Move TaskExecutor#sync* methods to TaskExecutorImpl
- Move Flag and RaisePlan to impl
- Remove TaskExecutor#scheduleForMainThread and #isMainThread methods
- Remove SyncedPlan
- Add Engine#setupRender
- Remove TaskExecutor parameters from Engine#render* methods
- Convert Engine$CrumblingBlock into an interface
- Unmark RenderContext as NonExtendable to allow fulfilling the purpose described in the doc of VisualizationManager#renderDispatcher

* Remove registry freeze callbacks

- Lazily initialize MaterialShaderIndices
- Rename MaterialShaders#*Shader to #*Source
- Move BackendImplemented to api.backend package
2024-07-26 14:21:35 -06:00