Running list of ideas and notes

Fullstack categorical semantics

db / api / frontend

what are the morphisms?

let’s say you store user uploaded notes

instead of modeling domains model it as categories

functor somewhere

predicate isDeleted - same on

records are infinite tuples with finite non-zeros - keys are elements of a free monoid

projections remove fields

what are the types

what is the ambient category

a bit more formalization and cleanup

fix the type system (typescript)

current solutions - well we have branded types, can they enforce the same constraints? what are the constraints?

start with user record

some examples

in the db we have user (id, name, email, subscription, password_hash, created_at)

let’s say subscription is 'free' | 'standard' | 'premium'

someone wants to get currently logged in user

in the api - (id, name, email, subscription)

and in the frontend we might only display and use (name, email, subscription), no id at all

this is a simple example where a single record starts in the db and gets projected down all the way
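a minimal sketch of that projection chain, assuming the field types (the notes only list field names) and a hypothetical generic helper `project`; `toApi` and `toFrontend` are made-up names for the two boundary crossings:

```typescript
// Field types are assumptions; only the field names come from the notes.
type Subscription = 'free' | 'standard' | 'premium';

interface DbUser {
  id: number;
  name: string;
  email: string;
  subscription: Subscription;
  password_hash: string;
  created_at: string;
}

// Each layer is a projection of the previous one.
type ApiUser = Pick<DbUser, 'id' | 'name' | 'email' | 'subscription'>;
type User = Pick<ApiUser, 'name' | 'email' | 'subscription'>;

// Hypothetical helper: keep only the listed keys, drop everything else.
function project<T, K extends keyof T>(record: T, keys: readonly K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const key of keys) {
    out[key] = record[key];
  }
  return out;
}

const toApi = (u: DbUser): ApiUser => project(u, ['id', 'name', 'email', 'subscription']);
const toFrontend = (u: ApiUser): User => project(u, ['name', 'email', 'subscription']);

const dbUser: DbUser = {
  id: 1,
  name: 'Ada',
  email: 'ada@example.com',
  subscription: 'premium',
  password_hash: 'not-a-real-hash',
  created_at: '2024-01-01',
};

// db -> api -> frontend, dropping fields at each boundary
const frontendUser = toFrontend(toApi(dbUser));
```

the composition `toFrontend(toApi(...))` is itself a projection, which is what makes "projected down all the way" a reasonable candidate for the morphisms.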

with branded types, we’d represent these as DbUser, ApiUser, and User

since it’s a projection and typescript has structural subtyping, any function that works for some record type will also work for any projection preimage

let’s say we have

function isPayingUser(user: User): boolean { return user.subscription !== 'free' }

this equally works for DbUser and ApiUser
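a small sketch of why it works, assuming plain (unbranded) interfaces where each wider record extends the narrower one; the field types and sample values are made up:

```typescript
type Subscription = 'free' | 'standard' | 'premium';

// Narrowest record first; the wider ones are its projection preimages.
interface User {
  name: string;
  email: string;
  subscription: Subscription;
}
interface ApiUser extends User {
  id: number;
}
interface DbUser extends ApiUser {
  password_hash: string;
  created_at: string;
}

function isPayingUser(user: User): boolean {
  return user.subscription !== 'free';
}

const dbUser: DbUser = {
  id: 1,
  name: 'Ada',
  email: 'ada@example.com',
  subscription: 'premium',
  password_hash: 'not-a-real-hash',
  created_at: '2024-01-01',
};
const apiUser: ApiUser = { id: 2, name: 'Bob', email: 'bob@example.com', subscription: 'free' };

// Structural subtyping: extra fields are simply ignored by the type checker.
const results = [isPayingUser(dbUser), isPayingUser(apiUser)];
```

the function only looks at `subscription`, so any record that still carries that field type-checks, no matter which layer it came from.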

let’s go back to the actual problem we’re solving here

mixed usage? is that even a problem?

when is a function such as isPayingUser safe to use across the boundaries? One trivial answer: whenever the types work out
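for contrast, a sketch of what brands add on top of "the types work out". everything here (`Brand`, `FrontendUser`, `greeting`, `toFrontend`) is a hypothetical name, and the `__brand` field exists only at the type level:

```typescript
type Subscription = 'free' | 'standard' | 'premium';

// Phantom tag: never present at runtime, only used by the type checker.
type Brand<T, B extends string> = T & { readonly __brand: B };

interface UserFields {
  name: string;
  email: string;
  subscription: Subscription;
}
type ApiUser = Brand<UserFields & { id: number }, 'api'>;
type FrontendUser = Brand<UserFields, 'frontend'>;

// Boundary-agnostic: asks only for the field it reads, so every layer is accepted.
function isPayingUser(user: { subscription: Subscription }): boolean {
  return user.subscription !== 'free';
}

// Boundary-specific: only records branded as frontend are accepted.
function greeting(user: FrontendUser): string {
  return `Hello, ${user.name}`;
}

// Crossing the boundary is an explicit, named step.
function toFrontend(u: ApiUser): FrontendUser {
  return { name: u.name, email: u.email, subscription: u.subscription } as FrontendUser;
}

const apiUser = { id: 1, name: 'Ada', email: 'ada@example.com', subscription: 'standard' } as ApiUser;

const paying = isPayingUser(apiUser);
const hello = greeting(toFrontend(apiUser));
// greeting(apiUser) would be rejected at compile time: brand 'api' is not 'frontend'.
```

so one candidate constraint is "no implicit crossing": structural subtyping lets a wider record flow anywhere, brands force every crossing through a named projection.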

so what’s the actual problem here though?

Quantization

take a (feedforward) network

change the field \mathbb{R} to (ring) \mathbb{Z}

the weights are scalars (now \mathbb{Z}-module)

isomorphic to an abelian group?

anything interesting here?

each weight in the next layer is a linear combo

non linearity is in the way but clamps to \mathbb{Z}
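a toy sketch of one integer layer, assuming ReLU as the non-linearity (the notes don't name one) and a hypothetical function `integerLayer`. ReLU sends integers to integers, so the layer maps \mathbb{Z}^n to \mathbb{Z}^m, but it is not additive, so it is not a module homomorphism:

```typescript
// y = relu(W x + b) with every entry of W, x, b an integer.
function integerLayer(W: number[][], x: number[], b: number[]): number[] {
  return W.map((row, i) => {
    // integer linear combination of the inputs plus an integer bias
    const pre = row.reduce((acc, w, j) => acc + w * x[j], b[i]);
    // ReLU keeps integers integral
    return Math.max(0, pre);
  });
}

const W = [
  [1, -2],
  [3, 4],
];
const x = [5, 6];
const b = [10, -50];

const y = integerLayer(W, x, b);

// ReLU is not additive: relu(1) + relu(-1) = 1, but relu(1 + (-1)) = 0.
const relu = (n: number): number => Math.max(0, n);
const additiveGap = relu(1) + relu(-1) - relu(1 + -1);
```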

other interesting (commutative) rings for an R-module?

let’s take int8 quantization, the values are -128..127

technically we can look at Z/256Z or F_256 - the first one is a ring so we can only conclude module structure, the second one is a finite field, but + and * do not behave the same way they behave on int8
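a quick sketch comparing the three arithmetics. assumptions: "int8" means two's-complement wraparound, GF(256) elements are stored as bytes (so addition is XOR), and all function names are made up. saturating (clamping) addition is included because it is what breaks the ring picture:

```typescript
// Wrap an integer into the int8 range -128..127 (two's complement).
const wrapInt8 = (n: number): number => (n << 24) >> 24;
// Canonical representative 0..255 of a residue class in Z/256Z.
const mod256 = (n: number): number => ((n % 256) + 256) % 256;

const addInt8 = (a: number, b: number): number => wrapInt8(a + b);
const mulInt8 = (a: number, b: number): number => wrapInt8(a * b);

// Addition in GF(256) on the byte representation is bitwise XOR.
const addGF256 = (a: number, b: number): number => (a ^ b) & 0xff;

// Saturating addition clamps instead of wrapping.
const satAdd = (a: number, b: number): number => Math.max(-128, Math.min(127, a + b));

const wrapped = addInt8(127, 1); // wraps around to the bottom of the range
const zeroDivisor = mulInt8(16, 16); // 256 wraps to 0, so 16 is a zero divisor
const fieldSum = addGF256(1, 1); // characteristic 2: 1 + 1 = 0
const satLeft = satAdd(satAdd(127, 1), -1); // (127 + 1) + (-1)
const satRight = satAdd(127, satAdd(1, -1)); // 127 + (1 + (-1))
```

wraparound int8 agrees with Z/256Z (and has zero divisors, so it can't be a field), GF(256) disagrees already at 1 + 1, and saturating addition is not even associative.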

Algebraic crap about neural networks

take a (feedforward) network, remove non linearity

linear maps

take (co)chain complex? can you? is there a point?

why?

take (projective) resolution? are the maps somehow related to the original maps?

what does non-linearity do functorially? is there a generalization?

Spherical stable diffusion processes

what happens when the underlying manifold of a diffusion process changes

for example, can we perform diffusion process on a sphere

is this a simple reparametrization? any interesting invariants preserved?

wild curvature appears, linearity is local not global
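a rough sketch of one way to discretise a diffusion step on S^2. this is an assumption on my part, not something worked out in the notes: sample Gaussian noise in the ambient R^3, project it onto the tangent plane at the current point (the "locally linear" part), take a small step, then renormalise back onto the sphere. it only approximates Brownian motion on the sphere for small `dt`, and the names are made up:

```typescript
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

const normalize = (a: Vec3): Vec3 => {
  const n = Math.sqrt(dot(a, a));
  return [a[0] / n, a[1] / n, a[2] / n];
};

// Standard normal sample via the Box-Muller transform.
function gaussian(): number {
  const u = 1 - Math.random(); // in (0, 1], avoids log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// One noisy step that stays on the unit sphere.
function sphereStep(x: Vec3, dt: number): Vec3 {
  const noise: Vec3 = [gaussian(), gaussian(), gaussian()];
  // remove the radial component so the noise lives in the tangent plane at x
  const radial = dot(noise, x);
  const tangent: Vec3 = [noise[0] - radial * x[0], noise[1] - radial * x[1], noise[2] - radial * x[2]];
  const s = Math.sqrt(dt);
  // step in the tangent plane, then pull back onto the sphere
  return normalize([x[0] + s * tangent[0], x[1] + s * tangent[1], x[2] + s * tangent[2]]);
}

let point: Vec3 = [0, 0, 1];
for (let i = 0; i < 1000; i++) {
  point = sphereStep(point, 0.01);
}
```

the renormalisation is where curvature enters: the update is linear only inside each tangent plane, and the plane changes at every step.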

does rotation invariance help with anything?

Commutative image gen model edits

assume you have two models with same architecture (e.g. nano-banana and nano-banana/edit - are they the same architecture btw just different inputs?)

fix a seed

let’s say you have a prompt

“a young woman wearing black shirt”

outputs an image

feed the image into edit model - and apply a prompt rewrite rule

“change black shirt to blue shirt”

under what condition does ei(g(prompt), edit) ~= g(ep(prompt, edit))

g - generate
ei - edit image
ep - edit prompt

this looks almost like a naturality condition, preserving edits across image and text embedding spaces
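one way to make the question measurable is a "commutation gap": the distance between the two sides of the square. a toy sketch, where everything except g / ei / ep is hypothetical: an "image" is just the list of prompt words, an edit is a structured `{ from, to }` pair rather than a free-text instruction, and the distance is a Hamming count:

```typescript
// Distance between generate-then-edit and edit-then-generate.
function commutationGap<P, E, I>(
  g: (p: P) => I,
  ei: (image: I, edit: E) => I,
  ep: (p: P, edit: E) => P,
  dist: (a: I, b: I) => number,
  p: P,
  edit: E,
): number {
  return dist(ei(g(p), edit), g(ep(p, edit)));
}

// Toy stand-ins, not real models.
type WordEdit = { from: string; to: string };
type ToyImage = string[];

const swap = (words: string[], e: WordEdit): string[] => words.map((w) => (w === e.from ? e.to : w));
const toyGenerate = (p: string): ToyImage => p.split(' ');
const toyEditImage = (img: ToyImage, e: WordEdit): ToyImage => swap(img, e);
const toyEditPrompt = (p: string, e: WordEdit): string => swap(p.split(' '), e).join(' ');

const hamming = (a: ToyImage, b: ToyImage): number => {
  let d = Math.abs(a.length - b.length);
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) d++;
  }
  return d;
};

const basePrompt = 'a young woman wearing black shirt';
const shirtEdit: WordEdit = { from: 'black', to: 'blue' };

// The toy generator commutes with the edit exactly.
const gapCommuting = commutationGap(toyGenerate, toyEditImage, toyEditPrompt, hamming, basePrompt, shirtEdit);

// A generator that uppercases breaks the square: the image edit no longer finds 'black'.
const shoutingGenerate = (p: string): ToyImage => p.toUpperCase().split(' ');
const gapBroken = commutationGap(shoutingGenerate, toyEditImage, toyEditPrompt, hamming, basePrompt, shirtEdit);
```

with real models the interesting part is the choice of `dist`, which is exactly the "what is equality here?" question.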

is this even well formed?

what is equality here?

is there a notion of homotopy between two images (kinda equality?)

Avalanche edits

idea: what intrinsic properties of the initial prompt / image / model cause small changes in prompts to result in large changes in output images given the same seed?

trivial example: add " " empty space
another example: replace with synonym
another example: replace with something similar using similar embedding

how different are the outputs? how to measure it? similarity between image embedding spaces? is this related to expansive functions?
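one possible measurement is a local expansion ratio: output distance divided by input distance for a pair of nearby prompts. a sketch under loud assumptions: prompt distance is character-level edit distance, the "model" is a toy function into R^2 standing in for an image embedding, and all names are made up:

```typescript
// Levenshtein edit distance between two strings.
function editDistance(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

const euclid = (u: number[], v: number[]): number =>
  Math.sqrt(u.reduce((s, ui, i) => s + (ui - v[i]) * (ui - v[i]), 0));

// Output distance per unit of input distance for one pair of prompts.
function expansionRatio(f: (p: string) => number[], p: string, q: string): number {
  return euclid(f(p), f(q)) / editDistance(p, q);
}

// Toy stand-in for "prompt -> image embedding": it overreacts to the number of space-separated tokens.
const toyModel = (p: string): number[] => [p.length, p.split(' ').length * 10];

// The trivial edit: append one space.
const ratio = expansionRatio(toyModel, 'a cat', 'a cat ');
```

a ratio well above 1 on a one-character edit is the avalanche; the open question is which prompts and models make it large.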

Continuous image model transformations

enforce rewrite rules on prompt, treating them as some meta-languages

for example

we’re transforming

“a young woman wearing a hat”

into

“an old cat wearing a tuxedo”

under the small-step operational semantics

replace_word(old, new)

and outputting an image in between

then, we consider the space of all paths under these operations

and the curves they make in the underlying manifold
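a sketch of the path structure, assuming `replace_word(old, new)` replaces the first matching word (the notes don't say which occurrence); `promptPath` is a made-up helper:

```typescript
// One small step: replace the first occurrence of a word.
function replaceWord(prompt: string, oldWord: string, newWord: string): string {
  const words = prompt.split(' ');
  const i = words.indexOf(oldWord);
  if (i === -1) return prompt; // nothing to rewrite
  words[i] = newWord;
  return words.join(' ');
}

// Apply the steps in order and keep every intermediate prompt.
function promptPath(start: string, steps: Array<[string, string]>): string[] {
  const prompts = [start];
  for (const [oldWord, newWord] of steps) {
    prompts.push(replaceWord(prompts[prompts.length - 1], oldWord, newWord));
  }
  return prompts;
}

const startPrompt = 'a young woman wearing a hat';

// Two orderings of the same four rewrites.
const forward = promptPath(startPrompt, [
  ['a', 'an'],
  ['young', 'old'],
  ['woman', 'cat'],
  ['hat', 'tuxedo'],
]);
const backward = promptPath(startPrompt, [
  ['hat', 'tuxedo'],
  ['woman', 'cat'],
  ['young', 'old'],
  ['a', 'an'],
]);
```

both paths start and end at the same prompts but pass through different intermediate prompts, so each intermediate prompt would give one image along the curve.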

is there something interesting there?

is there a notion of homotopy between two paths? what’s the point?

without constraints everything is deformable into everything (paths have the same start and end prompt and therefore the same images)

are there any semantic constraints we can introduce to tighten the definition of “deformed into another”? for example, maybe an intermediate step preserves the number of objects in the scene, maybe for some layout interpreter (e.g. 1 woman, 1 hat) it becomes invariant

also kinda sounds like bullshit

Image generation grid + upscale cost optimization?

is it possible (or even useful) to optimize stable diffusion by downscaling the image first, then using the big model on downscaled, then upscale?

one dumb way is to generate at 0.5x scale (720p -> 360p) and then upscale it with a different model? would this be faster?

or take one 1080p image and put 4 960x540 images into it with one prompt, then upscale them individually - this could be fun and cut the cost of 4 images by 4x (or by 3x compared to the 0.5K version, since that one is 0.75 of the cost)

the cost would be only $0.08 per image (nano-banana-2) + 1 megapixel worth of upscaling for $0.001 with seedvr model
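the arithmetic as a sketch. the interpretation is an assumption: $0.08 is read as the price of one generation call (so one 2x2 grid), upscaling is billed at $0.001 per megapixel, and each sub-image needs about 1 megapixel of upscaling. the field names are made up:

```typescript
interface GridCostInputs {
  generationPrice: number; // price of one generation call (0.08 in the notes)
  imagesPerGrid: number; // 4 for a 2x2 grid
  upscalePricePerMegapixel: number; // 0.001 in the notes
  megapixelsPerImage: number; // assumption: about 1 MP of upscaling per sub-image
}

// Cost of one final image when it is generated as part of a grid and then upscaled.
function costPerGridImage(c: GridCostInputs): number {
  return c.generationPrice / c.imagesPerGrid + c.upscalePricePerMegapixel * c.megapixelsPerImage;
}

const inputs: GridCostInputs = {
  generationPrice: 0.08,
  imagesPerGrid: 4,
  upscalePricePerMegapixel: 0.001,
  megapixelsPerImage: 1,
};

const gridCost = costPerGridImage(inputs); // about 0.021 per image
const savingsFactor = inputs.generationPrice / gridCost; // about 3.8x vs one call per image
```

under these assumptions the upscaling term is small, so the saving stays close to the number of images packed into the grid.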

might wanna revisit this on stable diffusion dumb version in C

I used this prompt first

2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:

An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[image: Black lab]

notice no difference in lab color

2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:

An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[image: Still Black lab]

underspecifying color does not work either

2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:

An action shot of a <RANDOM_COLOR> lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[image: Other labs]

<RANDOM_COLOR> tag seems to work, I wonder if uppercase / angle brackets are significant

NOTE: what other tags work? can we put an animal here? how specified would it have to be? e.g. <FLYING_ANIMAL> or , or <RANDOM_MAMMAL>.

2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:

[(0, 0) - SEED 283434895734859]
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[(0, 1) - SEED 34234234578345]
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[(1, 0) - SEED 234782758943735]
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[(1, 1) - SEED 34583957983479835]
An action shot of a lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

no tags, same prompt, injected seed, prompt repeated, underspecified lab color
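the per-cell prompt above is mechanical enough to assemble in code. a sketch with a hypothetical helper `buildGridPrompt`; seeds are kept as strings because the largest one above does not fit in a JavaScript number without rounding, and the description is shortened here for readability:

```typescript
const GRID_HEADER = '2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:';

// seeds[r][c] is the seed injected into the cell at row r, column c.
function buildGridPrompt(description: string, seeds: string[][]): string {
  const cells: string[] = [];
  seeds.forEach((row, r) => {
    row.forEach((seed, c) => {
      cells.push(`[(${r}, ${c}) - SEED ${seed}]\n${description}`);
    });
  });
  // header and cells separated by blank lines, as in the prompt above
  return [GRID_HEADER, ...cells].join('\n\n');
}

// Shortened description for illustration only.
const description = 'An action shot of a lab swimming in an inground suburban swimming pool.';

const gridPrompt = buildGridPrompt(description, [
  ['283434895734859', '34234234578345'],
  ['234782758943735', '34583957983479835'],
]);
```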

[image: Seed lab]

2x2 GRID OF IMAGES WITH EACH IMAGE USING THE FOLLOWING DESCRIPTION:

[(0, 0) - SEED 283434895734859]
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[(0, 1) - SEED 34234234578345]
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[(1, 0) - SEED 234782758943735]
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[(1, 1) - SEED 34583957983479835]
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

[image: Seed black lab]

for comparison, regular prompt with changed seed without injecting it into the prompt

An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.

seed 8530897

[image: seed 8530897 black lab]

seed 1803870

[image: seed 1803870 black lab]

more underspecified elements affected by the seed (e.g. background props)

interesting: maybe we could have an LLM analyze underspecified props to create cheap 4x images?

Clifford algebras and CliffordNet

testing math markup

Inline: $K \times K \rightarrow S$

Algebraic ideas around neural networks

take a standard fully connected layer f(Wx + b)

let’s start decomposing it algebraically

bias has addition, which suggests abelian group like structure, let’s remove it
f is a function (non-linearity) - we don’t know what that is
Wx is a matrix multiplication, but it could also be any reasonable “action” on some algebraic object

next, what is backprop?
before talking about backprop we need to talk about the derivative or, in algebraic settings, a derivation
D(fg) = f D(g) + g D(f)
a derivation assumes an underlying algebra (otherwise multiplication makes no sense)
is there some kinda categorical structure that gives us differentiation?
cartesian differential categories come to mind
needs more thought
does dualizing that notion result in backprop? what does it give us?
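the Leibniz rule can be seen concretely with dual numbers, where multiplication is defined so that the derivative part obeys the product rule. this is a sketch of forward-mode differentiation, not backprop itself, and the names `Dual`, `add`, `mul`, `variable`, `constant` are made up:

```typescript
// A value together with its derivative with respect to one chosen input.
interface Dual {
  value: number;
  deriv: number;
}

const variable = (x: number): Dual => ({ value: x, deriv: 1 });
const constant = (c: number): Dual => ({ value: c, deriv: 0 });

// D(f + g) = D(f) + D(g)
const add = (f: Dual, g: Dual): Dual => ({
  value: f.value + g.value,
  deriv: f.deriv + g.deriv,
});

// Leibniz rule: D(fg) = f D(g) + g D(f)
const mul = (f: Dual, g: Dual): Dual => ({
  value: f.value * g.value,
  deriv: f.value * g.deriv + g.value * f.deriv,
});

// h(x) = (x * x + 3) * x = x^3 + 3x, so h'(x) = 3x^2 + 3
const x0 = variable(2);
const h = mul(add(mul(x0, x0), constant(3)), x0);
// h.value is h(2) = 14 and h.deriv is h'(2) = 15
```

only addition and multiplication are needed, which matches the remark that a derivation presupposes an algebra and nothing topological.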

only when we get to optimizers do we actually need convergence, which means the notion of limit, which means some kinda topology / metric

maybe a better question would be - what is the category where we can do (evaluate and train) neural networks? similar to the category where we can do lambda calculus

Autograd and Differential Geometry

automatic differentiation primer
vector jacobian product explained
passing gradient as covector
basic diffgeom, (co)tangent bundles, pullback maps
examples of autograd on other manifolds (maybe pick S^1)

Notes

this is a differential geometry approach to demystifying the following statement from the pytorch docs

Generally speaking, torch.autograd is an engine for computing vector-Jacobian product. That is, given any vector $\vec{v}$, compute the product $J^{T} \cdot \vec{v}$

while everything blends together in $\mathbb{R}^n$, a more precise formulation of what’s happening is better understood through pullbacks

confusion

Let $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$

the jacobian $J_f$ of $f$ is in $\mathcal{M}^{m \times n}(\mathbb{R})$

this jacobian acts on vectors $v \in \mathbb{R}^n$ on the left, $J_f v$, as a map between the tangent spaces $Df_x : T_x\mathbb{R}^n \rightarrow T_{f(x)}\mathbb{R}^m$

and the pullback map is $Df_x^{*} : T_{f(x)}^{*}\mathbb{R}^m \rightarrow T_{x}^{*}\mathbb{R}^n$

If $\vec{v}$ happens to be the gradient of a scalar function $l = g(\vec{y})$:

actually says that $\vec{v}$ is a covector $\omega$, not a vector, and through identification of covectors with $\mathbb{R}^m$ we can now compute $J_f^{T}\omega$
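a small numeric check of the vector / covector roles, using a made-up map f(x1, x2) = (x1 x2, x1 + x2, x1^2) from R^2 to R^3 with a hand-written jacobian; the helper names are hypothetical:

```typescript
const matVec = (M: number[][], u: number[]): number[] =>
  M.map((row) => row.reduce((s, m, j) => s + m * u[j], 0));
const transpose = (M: number[][]): number[][] => M[0].map((_, j) => M.map((row) => row[j]));
const dot = (a: number[], b: number[]): number => a.reduce((s, ai, i) => s + ai * b[i], 0);

// Jacobian of f(x1, x2) = (x1 * x2, x1 + x2, x1^2): a 3 x 2 matrix.
const jacobian = (x: number[]): number[][] => [
  [x[1], x[0]],
  [1, 1],
  [2 * x[0], 0],
];

const J = jacobian([2, 3]);

const v = [1, -1]; // tangent vector at x, lives in R^2
const w = [1, 2, 3]; // covector at f(x), identified with R^3

const pushforward = matVec(J, v); // J v, a tangent vector at f(x)
const pullback = matVec(transpose(J), w); // J^T w, a covector at x

// The two pairings agree: w(J v) = (J^T w)(v)
const pairingDownstairs = dot(w, pushforward);
const pairingUpstairs = dot(pullback, v);
```

vectors move forward with J (from R^2 to R^3), covectors move backward with J^T (from R^3 to R^2), and the pairing is preserved.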

so the gradient (a 1-covector) is moving through the pullback map

in order to define the image of the gradient, we define its values on vectors $v$

so for any covector $\omega$: $(Df_x^{*}(\omega))(v) = \omega(Df_x(v))$
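in coordinates this recovers the transpose. a short derivation, assuming $\omega$ is identified with a column vector in $\mathbb{R}^m$ and $v$ with a column vector in $\mathbb{R}^n$ (the identification via the standard inner product is the assumption here):

$$
\begin{aligned}
(Df_x^{*}(\omega))(v) &= \omega(Df_x(v)) \\
&= \omega^{T} (J_f v) \\
&= (J_f^{T} \omega)^{T} v
\end{aligned}
$$

so $Df_x^{*}(\omega)$ is represented by $J_f^{T}\omega \in \mathbb{R}^n$, which is the vector-Jacobian product from the docs.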
