Running list of ideas and notes
Fullstack categorical semantics
db / api / frontend
what are the morphisms?
let’s say you store user uploaded notes
instead of modeling domains, model them as categories
functor somewhere
predicate isDeleted - same on
records are infinite tuples with finite non-zeros - keys are elements of a free monoid
projections remove fields
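one possible way to write the two lines above down, as a sketch only. the alphabet \Sigma, the value set V, the "absent" value \bot and the projection name \pi_S are all assumptions introduced here, not settled notation:

```latex
% keys are words over an alphabet \Sigma, i.e. elements of the free monoid \Sigma^{*}
% a record assigns a value to every key, but only finitely many are not \bot
r : \Sigma^{*} \to V \cup \{\bot\}, \qquad
\bigl|\{\, k \in \Sigma^{*} : r(k) \neq \bot \,\}\bigr| < \infty

% projecting onto a key set S \subseteq \Sigma^{*} removes every field outside S
\pi_S(r)(k) =
\begin{cases}
  r(k) & \text{if } k \in S \\
  \bot & \text{otherwise}
\end{cases}

% projections compose by intersecting key sets
\pi_S \circ \pi_T = \pi_{S \cap T}
```

under this reading db -> api -> frontend is just a chain of projections onto shrinking key sets.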
what are the types
what is the ambient category
a bit more formalization and cleanup
fix the type system (typescript)
current solutions - well we have branded types, can they enforce the same constraints? what are the constraints?
start with user record
some examples
in the db we have user (id, name, email, subscription, password_hash, created_at)
let’s say subscription is ‘free’ | ‘standard’ | ‘premium’
someone wants to get currently logged in user
in the api - (id, name, email, subscription)
and in the frontend we might only display and use (name, email, subscription), no id at all
this is a simple example where a single record starts in the db and gets projected down all the way
with branded types, we’d define DbUser, ApiUser, and User
since it’s a projection and typescript has structural subtyping, any function that works for some record type will also work for any wider record type that projects onto it (any preimage of the projection)
let’s say we have
function isPayingUser(user: User): boolean { return user.subscription !== 'free' }
this equally works for DbUser and ApiUser
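a minimal sketch of the example so far, using plain structural types. the field types (number id, Date created_at), the sample values and the `project` helper are assumptions made up for illustration:

```typescript
type Subscription = 'free' | 'standard' | 'premium';

// Field types are assumed; the notes only list the field names.
interface DbUser {
  id: number;
  name: string;
  email: string;
  subscription: Subscription;
  password_hash: string;
  created_at: Date;
}

// Each layer is a projection (a Pick) of the previous one.
type ApiUser = Pick<DbUser, 'id' | 'name' | 'email' | 'subscription'>;
type User = Pick<ApiUser, 'name' | 'email' | 'subscription'>;

// Hypothetical helper: keep only the listed keys, dropping the rest at runtime.
function project<T, K extends keyof T>(record: T, keys: readonly K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const key of keys) out[key] = record[key];
  return out;
}

function isPayingUser(user: User): boolean {
  return user.subscription !== 'free';
}

const dbUser: DbUser = {
  id: 1,
  name: 'Ada',
  email: 'ada@example.com',
  subscription: 'standard',
  password_hash: 'not-a-real-hash',
  created_at: new Date(0),
};
const apiUser: ApiUser = project(dbUser, ['id', 'name', 'email', 'subscription']);
const user: User = project(apiUser, ['name', 'email', 'subscription']);

// Structural subtyping: all three calls type-check against the narrowest type.
const results = [isPayingUser(dbUser), isPayingUser(apiUser), isPayingUser(user)];
```

note that `project` removes fields at runtime, while passing `dbUser` straight to `isPayingUser` only narrows the static type and leaves `password_hash` on the object.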
let’s go back to the actual problem we’re solving here
mixed usage? is that even a problem?
when is a function such as isPayingUser safe to use across the boundaries? one trivial answer: whenever the types work out
so what’s the actual problem here though?
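for contrast, a sketch of what the branded version might look like. the `Brand` helper, the `__brand` field name and `toUser` are hypothetical; this is one common way to fake nominal types in typescript, not a claim about how it should be done here:

```typescript
// Hypothetical brand helper: __brand exists only at the type level, never at runtime.
type Brand<T, B extends string> = T & { readonly __brand: B };

type Subscription = 'free' | 'standard' | 'premium';
type UserFields = { name: string; email: string; subscription: Subscription };

type User = Brand<UserFields, 'User'>;
type ApiUser = Brand<UserFields & { id: number }, 'ApiUser'>;

function isPayingUser(user: User): boolean {
  return user.subscription !== 'free';
}

// With brands, crossing a boundary needs an explicit projection function.
function toUser(apiUser: ApiUser): User {
  return {
    name: apiUser.name,
    email: apiUser.email,
    subscription: apiUser.subscription,
  } as User;
}

const apiUser = {
  id: 1,
  name: 'Ada',
  email: 'ada@example.com',
  subscription: 'premium',
} as ApiUser;

// isPayingUser(apiUser); // rejected by the compiler: brand 'ApiUser' is not 'User'
const paying = isPayingUser(toUser(apiUser));
```

so brands trade the free reuse from structural subtyping for explicit boundary crossings. whether that is the constraint we actually want is still the open question.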
Quantization
take a (feedforward) network
change the field \mathbb{R} to (ring) \mathbb{Z}
the weights are scalars (now \mathbb{Z}-module)
a \mathbb{Z}-module is exactly an abelian group (scalar multiplication by n is repeated addition)
anything interesting here?
each weight in the next layer is a linear combo
non linearity is in the way but clamps to \mathbb{Z}
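a toy sketch of one integer layer, to make "linear combo, then clamp" concrete. the clamp range [0, 127] and the function names are assumptions, and real quantization schemes also carry scales and zero points that are ignored here:

```typescript
type IntVector = number[];
type IntMatrix = number[][];

// Guard: everything is assumed to be already rounded to integers.
function assertIntegers(values: number[]): void {
  for (const v of values) {
    if (v % 1 !== 0) throw new Error(`not an integer: ${v}`);
  }
}

// One layer: integer matrix-vector product followed by a clamp to [lo, hi].
// With integer bounds the clamp maps integers to integers, so the layer stays in Z.
function intLayer(weights: IntMatrix, input: IntVector, lo = 0, hi = 127): IntVector {
  assertIntegers(input);
  return weights.map((row) => {
    assertIntegers(row);
    const sum = row.reduce((acc, w, j) => acc + w * input[j], 0);
    return Math.min(hi, Math.max(lo, sum));
  });
}

const output = intLayer([[1, -2], [3, 4]], [5, 6]);
```

here the first row sums to -7 and clamps to 0, the second row sums to 39 and passes through unchanged.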
other interesting (commutative) rings for an R-module?
Algebraic crap about neural networks
take a (feedforward) network, remove non linearity
linear maps
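a small check of what "remove non linearity" leaves behind: the layers compose into a single matrix. helper names are made up, and layers are assumed to be listed in application order with compatible shapes:

```typescript
type Matrix = number[][];

// Plain matrix product a * b.
function matMul(a: Matrix, b: Matrix): Matrix {
  return a.map((row) =>
    b[0].map((_, j) => row.reduce((acc, v, k) => acc + v * b[k][j], 0))
  );
}

function matVec(m: Matrix, x: number[]): number[] {
  return m.map((row) => row.reduce((acc, v, k) => acc + v * x[k], 0));
}

// layers[0] is applied first, so the collapsed map is W_n * ... * W_2 * W_1.
// Assumes at least one layer.
function collapse(layers: Matrix[]): Matrix {
  return layers.reduce((acc, w) => matMul(w, acc));
}

const w1: Matrix = [[1, 2], [3, 4]];
const w2: Matrix = [[0, 1], [1, 0]];
const x = [1, 1];

const layerByLayer = matVec(w2, matVec(w1, x));
const collapsed = matVec(collapse([w1, w2]), x);
```

both routes give the same vector, so without the non-linearity depth adds nothing beyond a factorization of one linear map.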
take (co)chain complex? can you? is there a point?
why?
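for reference, the condition a linear network would need to satisfy to be a (co)chain complex, written with assumed names V_i for the layer spaces and W_i for the weight maps:

```latex
V_0 \xrightarrow{\;W_1\;} V_1 \xrightarrow{\;W_2\;} V_2 \xrightarrow{\;W_3\;} \cdots \xrightarrow{\;W_n\;} V_n,
\qquad W_{i+1} \circ W_i = 0 \quad \text{for all } i

% consequence for n \ge 2: the end-to-end map is zero
W_n \circ \cdots \circ W_2 \circ W_1 = 0
```

generic trained weights will not satisfy this, and if they did the whole network would compute the zero map, so the layers themselves are probably the wrong thing to use as differentials.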
take (projective) resolution? are the maps somehow related to the original maps?
what does non-linearity do functorially? is there a generalization?
Spherical stable diffusion processes
what happens when the underlying manifold of a diffusion process changes
for example, can we perform a diffusion process on a sphere?
is this a simple reparametrization? any interesting invariants preserved?
wild curvature appears, linearity is local not global
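a rough sketch of the "locally linear" idea on the unit sphere: take a gaussian step in the tangent plane, then project back onto the sphere. this only approximates diffusion on the sphere for small step sizes, and the names and the step size are assumptions:

```typescript
type Vec3 = [number, number, number];

// Standard normal sample via Box-Muller; 1 - rand() keeps the log argument positive.
function gaussian(rand: () => number = Math.random): number {
  const u = 1 - rand();
  const v = rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function norm(p: Vec3): number {
  return Math.sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]);
}

function normalize(p: Vec3): Vec3 {
  const n = norm(p);
  return [p[0] / n, p[1] / n, p[2] / n];
}

// One noise step: sample in R^3, drop the radial part (tangent plane at p),
// move, then renormalize to land back on the sphere.
function sphereStep(p: Vec3, sigma: number, rand: () => number = Math.random): Vec3 {
  const noise: Vec3 = [sigma * gaussian(rand), sigma * gaussian(rand), sigma * gaussian(rand)];
  const radial = noise[0] * p[0] + noise[1] * p[1] + noise[2] * p[2];
  const moved: Vec3 = [
    p[0] + noise[0] - radial * p[0],
    p[1] + noise[1] - radial * p[1],
    p[2] + noise[2] - radial * p[2],
  ];
  return normalize(moved);
}

let point: Vec3 = [0, 0, 1];
for (let i = 0; i < 1000; i++) point = sphereStep(point, 0.05);
```

the step is built from the standard 3d gaussian and the position only, so rotating the start point rotates the whole distribution of walks with it.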
does rotation invariance help with anything?
Commutative image gen model edits
assume you have two models with same architecture (e.g. nano-banana and nano-banana/edit - are they the same architecture btw just different inputs?)
fix a seed
let’s say you have a prompt
“a young woman wearing black shirt”
outputs an image
feed the image into edit model - and apply a prompt rewrite rule
“change black shirt to blue shirt”
under what condition does ei(g(prompt), edit) ~= g(ep(prompt, edit))
g - generate, ei - edit image, ep - edit prompt
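the condition can at least be stated as a checkable predicate once a distance on images and a tolerance are picked. a sketch with toy stand-ins: the `distance` and `epsilon` parameters and the string-based toy models are assumptions, and real model calls would be async:

```typescript
// Checks ei(g(prompt), edit) ~= g(ep(prompt, edit)) up to epsilon under a chosen distance.
function commutes<P, I, E>(
  g: (prompt: P) => I,
  ei: (image: I, edit: E) => I,
  ep: (prompt: P, edit: E) => P,
  distance: (a: I, b: I) => number,
  prompt: P,
  edit: E,
  epsilon: number
): boolean {
  const editedImage = ei(g(prompt), edit);
  const regenerated = g(ep(prompt, edit));
  return distance(editedImage, regenerated) <= epsilon;
}

// Toy stand-ins: an "image" is just a string that records its prompt.
type Edit = { from: string; to: string };
const toyGenerate = (prompt: string): string => `img(${prompt})`;
const toyEditPrompt = (prompt: string, edit: Edit): string => prompt.replace(edit.from, edit.to);
const toyEditImage = (image: string, edit: Edit): string => image.replace(edit.from, edit.to);
const lossyEditImage = (image: string, edit: Edit): string =>
  image.replace(edit.from, edit.to).toUpperCase();
const exactDistance = (a: string, b: string): number => (a === b ? 0 : 1);

const prompt = 'a young woman wearing black shirt';
const edit: Edit = { from: 'black shirt', to: 'blue shirt' };

const square = commutes(toyGenerate, toyEditImage, toyEditPrompt, exactDistance, prompt, edit, 0);
const broken = commutes(toyGenerate, lossyEditImage, toyEditPrompt, exactDistance, prompt, edit, 0);
```

the toy square commutes exactly and the lossy one does not. for real models everything interesting hides in the choice of `distance` and `epsilon`.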
this looks almost like a naturality condition, preserving edits across image and text embedding spaces
is this even well formed?
what is equality here?
is there a notion of homotopy between two images (kinda equality?)
Avalanche edits
idea: what intrinsic properties of the initial prompt / image / model cause small changes in prompts to result in large changes in output images given the same seed?
trivial example: add " " (empty space)
another example: replace a word with a synonym
another example: replace a word with something similar using a similar embedding
how different are the outputs? how to measure it? similarity between image embedding spaces? is this related to expansive functions?
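one candidate measurement, as a sketch: the ratio of image-embedding distance to prompt-embedding distance for a pair of prompts. cosine distance and the `sensitivity` name are assumptions, and the embeddings are assumed to come from encoders that are not shown here:

```typescript
function dot(a: number[], b: number[]): number {
  return a.reduce((acc, v, i) => acc + v * b[i], 0);
}

// 1 - cosine similarity; assumes non-zero vectors of equal length.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Output distance per unit of input distance for one prompt pair.
// A ratio well above 1 would be an "avalanche"; identical prompt embeddings
// give a division by zero and need separate handling.
function sensitivity(
  promptA: number[],
  promptB: number[],
  imageA: number[],
  imageB: number[]
): number {
  return cosineDistance(imageA, imageB) / cosineDistance(promptA, promptB);
}

// Toy vectors: prompts are close, images are orthogonal.
const ratio = sensitivity([1, 0], [1, 1], [1, 0], [0, 1]);
```

a map whose ratio never drops below 1 is loosely what "expansive" means in the metric sense, so this ratio is one way to probe that question pair by pair.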
Continuous image model transformations
enforce rewrite rules on prompt, treating them as some meta-languages
for example
we’re transforming
“a young woman wearing a hat”
into
“an old cat wearing a tuxedo”
under the small-step operational semantics
replace_word(old, new)
and outputting an image in between
then, we consider the space of all paths under these operations
and the curves they make in the underlying manifold
is there something interesting there?
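a sketch that enumerates the path space for the example. it swaps replace_word(old, new) for a position-based replace, because "a" occurs twice in the start prompt and only the first one changes; that swap and all helper names are assumptions:

```typescript
type Prompt = string[];
interface Step { index: number; word: string }

// Small-step rewrite: replace the word at one position.
function replaceWord(prompt: Prompt, step: Step): Prompt {
  const next = prompt.slice();
  next[step.index] = step.word;
  return next;
}

// One step per position where the two prompts differ (equal lengths assumed).
function diffSteps(start: Prompt, end: Prompt): Step[] {
  if (start.length !== end.length) throw new Error('prompts must have the same number of words');
  const steps: Step[] = [];
  end.forEach((word, index) => {
    if (start[index] !== word) steps.push({ index, word });
  });
  return steps;
}

function permutations<T>(items: T[]): T[][] {
  if (items.length <= 1) return [items.slice()];
  const result: T[][] = [];
  items.forEach((item, i) => {
    const rest = items.slice(0, i).concat(items.slice(i + 1));
    for (const tail of permutations(rest)) result.push([item].concat(tail));
  });
  return result;
}

// Every ordering of the steps is one path from start to end.
function allPaths(start: Prompt, end: Prompt): Prompt[][] {
  return permutations(diffSteps(start, end)).map((order) => {
    const path: Prompt[] = [start];
    for (const step of order) path.push(replaceWord(path[path.length - 1], step));
    return path;
  });
}

const start = 'a young woman wearing a hat'.split(' ');
const end = 'an old cat wearing a tuxedo'.split(' ');
const paths = allPaths(start, end);
```

four positions differ, so there are 4! = 24 paths of 5 prompts each, and each intermediate prompt is one image to generate.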
is there a notion of homotopy between two paths? what’s the point? without constraints everything is deformable into everything (paths have the same start and end prompt and therefore the same images). are there any semantic constraints we can introduce to tighten the definition of “deformed into another”? for example, maybe each intermediate step preserves the number of objects in the scene, maybe for some layout interpreter (e.g. 1 woman, 1 hat) it becomes invariant…also kinda sounds like bullshit
Image generation grid + upscale cost optimization?
is it possible (or even useful) to optimize stable diffusion by downscaling the image first, then using the big model on downscaled, then upscale?
one dumb way is to generate at 0.5x scale (720p -> 360p) and then upscale it with a different model? would this be faster?
or take one 1080p image and put 4 960x540 images into it with one prompt, then upscale them individually - this could be fun and cut the cost of 4 images by 4x (or 3x compared to the 0.5K version, since that one is 0.75 of the cost)
the cost would be only $0.08 per image (nano-banana-2) + 1 megapixel worth of upscaling for $0.001 with seedvr model
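the grid arithmetic as a sketch. the prices are the ones from the note above; how they apply is an assumption: one generation call at $0.08 covers the whole grid, and each tile needs about 1 megapixel of upscaling at $0.001:

```typescript
interface Tile { x: number; y: number; width: number; height: number }

// Crop rectangles for a rows x cols grid inside one generated image.
function gridTiles(width: number, height: number, rows: number, cols: number): Tile[] {
  const tileWidth = Math.floor(width / cols);
  const tileHeight = Math.floor(height / rows);
  const tiles: Tile[] = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      tiles.push({ x: c * tileWidth, y: r * tileHeight, width: tileWidth, height: tileHeight });
    }
  }
  return tiles;
}

// Cost of one final image: its share of the generation call plus its own upscale.
function costPerImage(generationCost: number, tileCount: number, upscaleCostPerTile: number): number {
  return generationCost / tileCount + upscaleCostPerTile;
}

const tiles = gridTiles(1920, 1080, 2, 2);
const gridCost = costPerImage(0.08, tiles.length, 0.001); // about $0.021 per image
const savings = 0.08 / gridCost; // a bit under 4x once upscaling is included
```

so under these assumptions the upscale step barely dents the 4x saving. the real question is whether the tiles are good enough to be worth upscaling.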
might wanna revisit this on stable diffusion dumb version in C
I used this prompt first
2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

notice no difference in lab color
2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

underspecifying color does not work either
2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:
An action shot of a <RANDOM_COLOR> lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

<RANDOM_COLOR> tag seems to work, I wonder if uppercase / angle brackets are significant
NOTE: what other tags work? can we put an animal here? how specified would it have to be? e.g. <FLYING_ANIMAL> or , or <RANDOM_MAMMAL>.
2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:
[(0, 0) - SEED 283434895734859]
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.
[(0, 1) - SEED 34234234578345]
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.
[(1, 0) - SEED 234782758943735]
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.
[(1, 1) - SEED 34583957983479835]
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.
no tags, same prompt, injected seed, prompt repeated, underspecified lab color

2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:
[(0, 0) - SEED 283434895734859]
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.
[(0, 1) - SEED 34234234578345]
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.
[(1, 0) - SEED 234782758943735]
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.
[(1, 1) - SEED 34583957983479835]
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

for comparison, regular prompt with changed seed without injecting it into the prompt
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.
seed 8530897

seed 1803870

more underspecified elements affected by the seed (e.g. background props)
interesting: maybe we could have an llm analyze underspecified props to create cheap 4x images?
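the seed-injected grid prompts above follow a fixed layout, so they can be generated instead of pasted. a sketch, with a hypothetical `buildGridPrompt` helper. seeds are kept as strings because some of the ones used above are too large to be exact as javascript numbers, and whether the model does anything meaningful with an injected seed is untested:

```typescript
// Builds a "<rows>x<cols> GRID OF IMAGES ..." prompt with one seed line per cell.
// seeds[r][c] is the seed text for the cell at row r, column c.
function buildGridPrompt(description: string, seeds: string[][]): string {
  const rows = seeds.length;
  const cols = seeds[0].length;
  const lines = [`${rows}x${cols} GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:`];
  seeds.forEach((row, r) => {
    row.forEach((seed, c) => {
      lines.push(`[(${r}, ${c}) - SEED ${seed}]`);
      lines.push(description);
    });
  });
  return lines.join('\n');
}

const gridPrompt = buildGridPrompt(
  'An action shot of a black lab swimming in an inground suburban swimming pool.',
  [
    ['283434895734859', '34234234578345'],
    ['234782758943735', '34583957983479835'],
  ]
);
```

swapping the single description for a per-cell one would be the hook for llm-generated variations of the underspecified props.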