Running list of ideas and notes

Fullstack categorical semantics

db / api / frontend

what are the morphisms?

let’s say you store user uploaded notes

instead of modeling domains model it as categories

functor somewhere

predicate isDeleted - same on

records are infinite tuples with finitely many non-zero entries - keys are elements of a free monoid

projections remove fields
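
a rough sketch of the "finite support" idea, assuming keys are plain strings (words in the free monoid over characters) and a missing key plays the role of zero. `Rec`, `project` and `sameRec` are made-up names, and the sample values are invented. the point is only that projections compose: db -> api -> ui gives the same record as db -> ui.

```typescript
type Value = string | number | boolean;
// a record is a map from keys to values that is "empty" outside a finite set of keys
type Rec = { readonly [key: string]: Value };

const has = (r: Rec, k: string): boolean =>
  Object.prototype.hasOwnProperty.call(r, k);

// projection: keep only the listed keys, i.e. shrink the support
function project(r: Rec, keys: readonly string[]): Rec {
  const out: { [key: string]: Value } = {};
  for (const k of keys) {
    if (has(r, k)) out[k] = r[k];
  }
  return out;
}

// two records are equal when they have the same support and the same values
function sameRec(a: Rec, b: Rec): boolean {
  const ka = Object.keys(a);
  if (ka.length !== Object.keys(b).length) return false;
  return ka.every((k) => has(b, k) && a[k] === b[k]);
}

// invented sample row, field names taken from the user example in these notes
const dbRecord: Rec = {
  id: 1,
  name: "Ada",
  email: "ada@example.com",
  subscription: "free",
  password_hash: "not-a-real-hash",
  created_at: "2024-01-01",
};
const apiKeys = ["id", "name", "email", "subscription"];
const uiKeys = ["name", "email", "subscription"];

// projecting db -> api -> ui agrees with projecting db -> ui directly
const viaApi = project(project(dbRecord, apiKeys), uiKeys);
const direct = project(dbRecord, uiKeys);
```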

what are the types

what is the ambient category


a bit more formalization and cleanup

fix the type system (typescript)

current solutions - well we have branded types, can they enforce the same constraints? what are the constraints?
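
for reference, a minimal sketch of what a branded type is, assuming the usual phantom-field trick. `Brand`, `UserId`, `NoteId` and the constructor helpers are illustrative names, not anything from an existing codebase. the constraint it enforces is nominal: two types with the same shape stop being interchangeable.

```typescript
// a brand is a phantom field that exists only at the type level
type Brand<T, B extends string> = T & { readonly __brand: B };

type UserId = Brand<number, "UserId">;
type NoteId = Brand<number, "NoteId">;

// hypothetical smart constructors: the only places where a cast is allowed
function userId(n: number): UserId {
  return n as UserId;
}
function noteId(n: number): NoteId {
  return n as NoteId;
}

function loadUserName(id: UserId): string {
  return `user-${id}`;
}

const uid = userId(7);
const nid = noteId(7);

const label = loadUserName(uid); // fine
// loadUserName(nid);            // rejected at compile time: NoteId is not a UserId
// loadUserName(7);              // rejected too: a plain number carries no brand

// at runtime the brand does not exist, both values are just the number 7
const sameAtRuntime = (uid as number) === (nid as number);
```

so brands cut against structural subtyping: they add a distinction the shape alone does not carry.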

start with user record

some examples

in the db we have user (id, name, email, subscription, password_hash, created_at)

let’s say subscription is 'free' | 'standard' | 'premium'

someone wants to get currently logged in user

in the api - (id, name, email, subscription)

and in the frontend we might only display and use (name, email, subscription), no id at all

this is a simple example where a single record starts in the db and gets projected down all the way

with branded types, we’d represent DbUser, ApiUser, and User

since it’s a projection and typescript has structural subtyping, any function that works for some record type will also work for any projection preimage

let’s say we have

function isPayingUser(user: User): boolean { return user.subscription !== 'free' }

this equally works for DbUser and ApiUser
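
the example above written out as a sketch, without brands, so structural subtyping is visible. the field types (`id` as number, `created_at` as string) and the sample row are assumptions, and `toApiUser` / `toUser` are made-up names for the two projections.

```typescript
type Subscription = "free" | "standard" | "premium";

interface DbUser {
  id: number;
  name: string;
  email: string;
  subscription: Subscription;
  password_hash: string;
  created_at: string;
}

// each layer is a projection of the previous one
type ApiUser = Pick<DbUser, "id" | "name" | "email" | "subscription">;
type User = Pick<ApiUser, "name" | "email" | "subscription">;

function toApiUser(u: DbUser): ApiUser {
  return { id: u.id, name: u.name, email: u.email, subscription: u.subscription };
}

function toUser(u: ApiUser): User {
  return { name: u.name, email: u.email, subscription: u.subscription };
}

function isPayingUser(user: User): boolean {
  return user.subscription !== "free";
}

// invented sample row
const row: DbUser = {
  id: 1,
  name: "Ada",
  email: "ada@example.com",
  subscription: "premium",
  password_hash: "not-a-real-hash",
  created_at: "2024-01-01",
};
const api = toApiUser(row);
const ui = toUser(api);

// all three calls type-check because DbUser and ApiUser have at least the fields of User
const answers = [isPayingUser(row), isPayingUser(api), isPayingUser(ui)];
```

the function only reads `subscription`, which every projection keeps, so the answer is the same at every layer.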

let’s go back to the actual problem we’re solving here

mixed usage? is that even a problem?

when is a function such as isPayingUser safe to use across the boundaries? one trivial answer: whenever the types work out

so what’s the actual problem here though?

Quantization

take a (feedforward) network

change the field \mathbb{R} to (ring) \mathbb{Z}

the weights are scalars (now \mathbb{Z}-module)

isomorphic to an abelian group?

anything interesting here?

each weight in the next layer is a linear combo

non linearity is in the way but clamps to \mathbb{Z}
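
a toy sketch of one integer layer, assuming symmetric rounding with a single shared scale and ReLU as the non-linearity (both are choices made here, not from the notes). the weights and scale are invented. it just shows that integer weights times integer inputs stay in \mathbb{Z}, and that ReLU maps \mathbb{Z} to \mathbb{Z}.

```typescript
// real weight -> integer, with one shared scale (assumed scheme)
function quantize(weights: number[], scale: number): number[] {
  return weights.map((w) => Math.round(w / scale));
}

// integer matrix-vector product: each output is an integer linear combination
function intLinear(W: number[][], x: number[]): number[] {
  return W.map((wRow) => wRow.reduce((acc, w, j) => acc + w * x[j], 0));
}

// relu sends integers to integers, so the layer never leaves Z
function relu(v: number[]): number[] {
  return v.map((a) => Math.max(0, a));
}

const scaleW = 0.1; // assumed quantization step
const W = [
  quantize([0.31, -0.52], scaleW), // [3, -5]
  quantize([-0.18, 0.44], scaleW), // [-2, 4]
];
const xInt = [2, 1];
const hidden = relu(intLinear(W, xInt)); // [1, 0]
```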

other interesting (commutative) rings for an R-module?

Algebraic crap about neural networks

take a (feedforward) network, remove non linearity

linear maps
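
quick sanity sketch: without the non-linearity the layers compose into a single matrix, so running the network layer by layer is the same as applying the product once. the matrices are arbitrary small examples.

```typescript
type Matrix = number[][];

function matVec(A: Matrix, x: number[]): number[] {
  return A.map((r) => r.reduce((acc, a, j) => acc + a * x[j], 0));
}

// (A B)[i][k] = sum_j A[i][j] * B[j][k]
function matMul(A: Matrix, B: Matrix): Matrix {
  return A.map((r) =>
    B[0].map((_, k) => r.reduce((acc, a, j) => acc + a * B[j][k], 0))
  );
}

const W1: Matrix = [[1, 2], [0, 1], [3, -1]]; // layer 1: dimension 2 -> 3
const W2: Matrix = [[1, 0, 2], [-1, 1, 0]];   // layer 2: dimension 3 -> 2
const inputVec = [1, 2];

const layerByLayer = matVec(W2, matVec(W1, inputVec));  // [7, -3]
const collapsed = matVec(matMul(W2, W1), inputVec);     // [7, -3]
```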

take (co)chain complex? can you? is there a point?
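
what the condition would say, writing W_i for the i-th linear layer and V_i for the layer spaces (notation assumed here):

```latex
V_0 \xrightarrow{\;W_1\;} V_1 \xrightarrow{\;W_2\;} V_2 \xrightarrow{\;W_3\;} \cdots,
\qquad
W_{i+1} \circ W_i = 0 \;\iff\; \operatorname{im} W_i \subseteq \ker W_{i+1}.
```

so the layers as given only form a complex if every layer kills the image of the previous one, and then the end-to-end map W_n \cdots W_1 is zero for n >= 2. the raw layer maps of a useful network can't be the differentials directly.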

why?

take (projective) resolution? are the maps somehow related to the original maps?

what does non-linearity do functorially? is there a generalization?

Spherical stable diffusion processes

what happens when the underlying manifold of a diffusion process changes

for example, can we perform diffusion process on a sphere

is this a simple reparametrization? any interesting invariants preserved?

wild curvature appears, linearity is local not global
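
a minimal sketch of one noising step on the unit sphere S^2, assuming the simplest scheme: sample ambient Gaussian noise, keep only the tangent part, step, then normalize back onto the sphere. this is a small-step approximation, not an exact sampler for Brownian motion on the sphere, and all names are made up.

```typescript
type Vec3 = [number, number, number];

// standard normal sample via Box-Muller
function gaussian(): number {
  const u = 1 - Math.random(); // in (0, 1], avoids log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function dot(a: Vec3, b: Vec3): number {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

function normalize(a: Vec3): Vec3 {
  const n = Math.sqrt(dot(a, a));
  return [a[0] / n, a[1] / n, a[2] / n];
}

// one step: noise is linear only in the tangent plane at x (local, not global)
function sphereStep(x: Vec3, sigma: number): Vec3 {
  const z: Vec3 = [gaussian(), gaussian(), gaussian()];
  const radial = dot(z, x);
  // remove the radial component so the noise is tangent to the sphere at x
  const tangent: Vec3 = [
    z[0] - radial * x[0],
    z[1] - radial * x[1],
    z[2] - radial * x[2],
  ];
  // move along the tangent direction, then project back onto the sphere
  return normalize([
    x[0] + sigma * tangent[0],
    x[1] + sigma * tangent[1],
    x[2] + sigma * tangent[2],
  ]);
}

let point: Vec3 = [0, 0, 1]; // start at the north pole
for (let t = 0; t < 100; t++) {
  point = sphereStep(point, 0.1);
}
```

the step rule never mentions a preferred direction, which is where rotation invariance would show up.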

does rotation invariance help with anything?

Commutative image gen model edits

assume you have two models with same architecture (e.g. nano-banana and nano-banana/edit - are they the same architecture btw just different inputs?)

fix a seed

let’s say you have a prompt

“a young woman wearing black shirt”

outputs an image

feed the image into edit model - and apply a prompt rewrite rule

“change black shirt to blue shirt”

under what condition does ei(g(prompt), edit) ~= g(ep(prompt, edit))

g - generate
ei - edit image
ep - edit prompt
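
a sketch of how the condition could be measured, assuming some distance `dist` stands in for "~=". everything here is a placeholder: `EditSquare`, `commutationGap` and the toy model are invented for illustration, and the toy "image" is just two numbers, not a real model output.

```typescript
type Img = number[]; // stand-in for an image or its embedding

interface EditSquare {
  g: (prompt: string) => Img;                   // generate
  ei: (image: Img, edit: string) => Img;        // edit image
  ep: (prompt: string, edit: string) => string; // edit prompt
  dist: (a: Img, b: Img) => number;             // what "~=" is taken to mean
}

// distance between the two ways around the square; 0 means the edit commutes
function commutationGap(sq: EditSquare, prompt: string, edit: string): number {
  const editAfterGenerate = sq.ei(sq.g(prompt), edit);
  const generateAfterEdit = sq.g(sq.ep(prompt, edit));
  return sq.dist(editAfterGenerate, generateAfterEdit);
}

// toy model: an "image" is [prompt length, number of times "blue" appears]
const toy: EditSquare = {
  g: (p) => [p.length, p.split("blue").length - 1],
  // pretend the image edit always swaps "black" for "blue": one char shorter, one more blue
  ei: (im, _edit) => [im[0] - 1, im[1] + 1],
  ep: (p, _edit) => p.replace("black", "blue"),
  dist: (a, b) => Math.abs(a[0] - b[0]) + Math.abs(a[1] - b[1]),
};

const shirtEdit = "change black shirt to blue shirt";
// commutes when the prompt actually contains the thing being edited
const gapShirt = commutationGap(toy, "a young woman wearing black shirt", shirtEdit);
// fails to commute when the edit does not apply to the prompt
const gapCat = commutationGap(toy, "a cat", shirtEdit);
```

even in the toy version the square only commutes for prompts where the edit applies, so the condition is about pairs (prompt, edit), not edits alone.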

this looks almost like a naturality condition, preserving edits across image and text embedding spaces

is this even well formed?

what is equality here?

is there a notion of homotopy between two images (kinda equality?)

Avalanche edits

idea: what intrinsic properties of the initial prompt / image / model cause small changes in prompts to result in large changes in output images given the same seed?

trivial example: add " " empty space
another example: replace with synonym
another example: replace with something similar using similar embedding

how different are the outputs? how to measure it? similarity between image embedding spaces? is this related to expansive functions?
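
one possible way to put a number on it, assuming both prompts and images get embedded as vectors and cosine distance is used on each side (a choice made here, nothing more). `avalancheRatio` is a made-up name and the vectors are invented. a ratio above 1 for a pair is the "expansive" direction: the output moved more than the input.

```typescript
type Embedding = number[];

function cosineDistance(a: Embedding, b: Embedding): number {
  let dotAB = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotAB += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dotAB / (Math.sqrt(normA) * Math.sqrt(normB));
}

// output change divided by input change for one prompt pair (same seed assumed);
// undefined-ish (Infinity or NaN) when the two prompt embeddings coincide
function avalancheRatio(
  promptBefore: Embedding,
  promptAfter: Embedding,
  imageBefore: Embedding,
  imageAfter: Embedding
): number {
  const dIn = cosineDistance(promptBefore, promptAfter);
  const dOut = cosineDistance(imageBefore, imageAfter);
  return dOut / dIn;
}

// invented vectors: prompts differ a little (distance 0.2), images a lot (distance 1)
const ratio = avalancheRatio([1, 0], [4, 3], [1, 0], [0, 1]); // about 5
```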

Continuous image model transformations

enforce rewrite rules on prompt, treating them as some meta-languages

for example

we’re transforming

“a young woman wearing a hat”

into

“an old cat wearing a tuxedo”

under the small-step operational semantics

replace_word(old, new)

and outputting an image in between

then, we consider the space of all paths under these operations

and the curves they make in the underlying manifold
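
a sketch of the prompt side only, assuming replace_word swaps the first whole-word match and that a path is one ordering of the needed swaps. `promptPath` and `permutations` are helper names invented here. image generation at each step is left out.

```typescript
type Step = { old: string; next: string };

// one small step: replace the first whole-word occurrence of `old`
function replaceWord(prompt: string, old: string, next: string): string {
  const words = prompt.split(" ");
  const i = words.indexOf(old);
  if (i === -1) return prompt;
  words[i] = next;
  return words.join(" ");
}

// apply the steps in order, recording every intermediate prompt
function promptPath(start: string, steps: Step[]): string[] {
  const out = [start];
  let current = start;
  for (const s of steps) {
    current = replaceWord(current, s.old, s.next);
    out.push(current);
  }
  return out;
}

// all orderings of the steps, i.e. all paths with the same endpoints
function permutations<T>(items: T[]): T[][] {
  if (items.length <= 1) return [items];
  const out: T[][] = [];
  items.forEach((item, i) => {
    const rest = items.slice(0, i).concat(items.slice(i + 1));
    for (const p of permutations(rest)) out.push([item].concat(p));
  });
  return out;
}

const startPrompt = "a young woman wearing a hat";
const swaps: Step[] = [
  { old: "a", next: "an" },
  { old: "young", next: "old" },
  { old: "woman", next: "cat" },
  { old: "hat", next: "tuxedo" },
];

// 4 swaps give 4! = 24 paths, each with 5 prompts (start, 3 intermediates, end)
const allPaths = permutations(swaps).map((order) => promptPath(startPrompt, order));
```

note the intermediates are not always grammatical ("an young woman wearing a hat"), which is already one candidate for a semantic constraint on allowed paths.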

is there something interesting there?

is there a notion of homotopy between two paths? what’s the point? without constraints everything is deformable into everything (paths have the same start and end prompt and therefore the same images). are there any semantic constraints we can introduce to tighten the definition of “deformed into another”. for example, maybe intermediate step preserves number of objects in the scene, maybe for some layout interpreter (e.g. 1 woman, 1 hat) it becomes invariant…also kinda sounds like bullshit

Image generation grid + upscale cost optimization?

is it possible (or even useful) to optimize stable diffusion by downscaling the image first, then using the big model on downscaled, then upscale?

one dumb way is to generate at 0.5x scale (720p -> 360p) and then upscale it with a different model? would this be faster?

or take one 1080p image and put 4 960x540 images into it with one prompt, then upscale them individually - this could be fun and cut the cost of 4 images by 4x (or by 3x compared to the 0.5K version, since that one is 0.75 of the cost)
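
the crop rectangles for the grid, as a small sketch. it assumes the model lays the tiles out as an exact, evenly divided grid with no borders, which may well not hold in practice. `tileRects` is a made-up helper.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// split a width x height canvas into rows x cols equal tiles, in row-major order
function tileRects(width: number, height: number, rows: number, cols: number): Rect[] {
  const w = Math.floor(width / cols);
  const h = Math.floor(height / rows);
  const out: Rect[] = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      out.push({ x: c * w, y: r * h, width: w, height: h });
    }
  }
  return out;
}

// 1080p canvas (1920x1080) as a 2x2 grid gives four 960x540 tiles
const tiles = tileRects(1920, 1080, 2, 2);
```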

the cost would be only $0.08 per image (nano-banana-2) + 1 megapixel worth of upscaling for $0.001 with seedvr model
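
back-of-envelope version of the arithmetic, under one reading of the numbers above: $0.08 is the price of one generation call (one grid) and $0.001 is the price of upscaling one tile. that reading is an assumption, and so are the names `Prices`, `gridCostPerImage` and `savingsFactor`.

```typescript
interface Prices {
  generate: number; // dollars per generation call (assumed: one call yields one grid)
  upscale: number;  // dollars per upscaled tile (assumed: about 1 megapixel each)
}

// cost of one final image when a single generation is shared by all tiles
function gridCostPerImage(p: Prices, tilesPerGrid: number): number {
  return p.generate / tilesPerGrid + p.upscale;
}

// how many times cheaper than generating every image on its own
function savingsFactor(p: Prices, tilesPerGrid: number): number {
  return p.generate / gridCostPerImage(p, tilesPerGrid);
}

const prices: Prices = { generate: 0.08, upscale: 0.001 };
const perImage = gridCostPerImage(prices, 4); // 0.08 / 4 + 0.001, about 0.021
const factor = savingsFactor(prices, 4);      // a bit under 4x because of the upscale cost
```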

might wanna revisit this on stable diffusion dumb version in C

I used this prompt first

2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:

An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

Black lab

notice no difference in lab color

2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:

An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

Still Black lab

underspecifying color does not work either

2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:

An action shot of a <RANDOM_COLOR> lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

Other labs

<RANDOM_COLOR> tag seems to work, I wonder if uppercase / angle brackets are significant

NOTE: what other tags work? can we put an animal here? how specified would it have to be? e.g. <FLYING_ANIMAL> or , or <RANDOM_MAMMAL>.

2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:

[(0, 0) - SEED 283434895734859]
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[(0, 1) - SEED 34234234578345]
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[(1, 0) - SEED 234782758943735]
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[(1, 1) - SEED 34583957983479835]
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

no tags, same prompt, injected seed, prompt repeated, underspecified lab color
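
small helper sketch for generating these seed-injected grid prompts instead of pasting them by hand. `buildGridPrompt` is a made-up name and the description in the example is shortened for space. seeds are kept as strings because values like 34583957983479835 are larger than Number.MAX_SAFE_INTEGER and would lose digits as numbers.

```typescript
const GRID_HEADER = "2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:";

// seeds[r][c] is the seed text injected for the cell at row r, column c
function buildGridPrompt(description: string, seeds: string[][]): string {
  const cells: string[] = [];
  seeds.forEach((rowSeeds, r) => {
    rowSeeds.forEach((seed, c) => {
      cells.push(`[(${r}, ${c}) - SEED ${seed}]\n${description}`);
    });
  });
  // header and cells are separated by blank lines, like the prompt above
  return [GRID_HEADER].concat(cells).join("\n\n");
}

const gridPrompt = buildGridPrompt(
  "An action shot of a lab swimming in an inground suburban swimming pool.",
  [
    ["283434895734859", "34234234578345"],
    ["234782758943735", "34583957983479835"],
  ]
);
```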

Seed lab

2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:

[(0, 0) - SEED 283434895734859]
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[(0, 1) - SEED 34234234578345]
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[(1, 0) - SEED 234782758943735]
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[(1, 1) - SEED 34583957983479835]
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

Seed black lab

for comparison, regular prompt with changed seed without injecting it into the prompt

An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

seed 8530897

seed 8530897 black lab

seed 1803870

seed 1803870 black lab

more underspecified elements affected by the seed (e.g. background props)

interesting - we could have an llm analyze underspecified props to create cheap 4x images maybe?