This is a log of installing the software needed to run Mixtral and Diffusers. I was able to have Diffusers and Mixtral running locally within 2 hours. This procedure is edited to remove my missteps – I installed things to the wrong Python environment a few times.
Install the first applications:
Rebind Caps Lock to Control.
Install XCode through the App Store and the XCode tools via the system prompt that appears when loading iTerm2 for the first time. Accept the license with sudo xcodebuild -license.
Next: I asked the internet and all my friends whether I should go the conda route or the Homebrew route. I went the Homebrew route.
Install Homebrew from http://brew.sh.
Type which pip3 a bunch of times. It’s easy to mix up Python environments. I made sure to install everything to the Homebrew Python install.
Run brew install python3.
At this point, the system path will still point to the old python3. Type which python3 to check. I installed packages to the wrong Python directory a few times. It’s easiest to just close iTerm2 and reopen it at this point.
Type which pip3 and which python3 a bunch of times after starting a new terminal to make sure it’s the right one.
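From inside the interpreter you can also confirm which Python is active; this is a small sketch of my own, not part of the original log:

```python
import sys
import sysconfig

# Print the interpreter path and the site-packages directory to
# confirm which Python environment is active. On a Homebrew install
# both should live under /opt/homebrew.
print(sys.executable)
print(sysconfig.get_paths()["purelib"])  # where pip3 installs packages
```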
Install python and brew packages:
which pip3 # Should say /opt/homebrew/bin/pip3
which pip # Should say /opt/homebrew/bin/pip
pip3 install torch
python3 -c "import torch; print(torch.__path__)" # Should print ['/opt/homebrew/lib/python3.11/site-packages/torch']
pip3 install numpy matplotlib
pip3 install jupyter
pip3 install transformers gradio scipy ftfy datasets tqdm accelerate
pip3 install diffusers
brew install cmake
brew install git-lfs
For LaTeX:
brew install --cask mactex
It took a few minutes to modify a preexisting Colab notebook:
1. Change 'cuda' to 'mps' everywhere.
2. Remove with autocast('mps'), which does not work yet. (https://github.com/pytorch/pytorch/issues/88415)
3. torch_dtype=torch.float16 does not work.
The final snippet to generate the pipe is:
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    revision="fp16",
    # torch_dtype=torch.float16,  # Generates black images if you leave it in.
    use_auth_token=True,
).to("mps")
and then just apply 1 & 2 everywhere.
Ollama is awesome. Just install it from https://github.com/ollama/ollama, and then ollama run mixtral just works!
The Hugging Face repository for the Mistral weights, fetched with git-lfs, is 55 GiB, and Mixtral is over 100 GiB, since there are multiple checkpoints in full precision. Ollama only downloads one copy of the quantized weights, so it takes up less disk space.
Here is what model weight storage looks like now:
% ollama list
NAME ID SIZE MODIFIED
codellama:34b 685be00e1532 19 GB 2 weeks ago
deepseek-coder:1.3b-base-q4_1 75c6c24f8b9a 856 MB 2 weeks ago
deepseek-coder:33b-base-q4_K_M a205c1c80cf6 19 GB 2 weeks ago
llama2:latest 78e26419b446 3.8 GB 2 weeks ago
mistral:latest 61e88e884507 4.1 GB 2 weeks ago
mixtral:latest 7708c059a8bb 26 GB 2 weeks ago
% du -sh .cache/huggingface/hub/*
2.6G .cache/huggingface/hub/models--CompVis--stable-diffusion-v1-4
139M .cache/huggingface/hub/models--facebook--mms-tts-eng
2.6G .cache/huggingface/hub/models--runwayml--stable-diffusion-inpainting
2.6G .cache/huggingface/hub/models--timbrooks--instruct-pix2pix
4.0K .cache/huggingface/hub/version.txt
4.0K .cache/huggingface/hub/version_diffusers_cache.txt
So, the 2TiB SSD option that I chose is large enough.
where $D$ is the number of layers, and $f^l$, $W^l$, and $b^l$ are activation functions, weight matrices, and bias vectors for every layer $l$. These layered operations produce a set of intermediate results known as the hidden layers. We can think of the neural network as having intermediate steps,
\[x_i\rightarrow h^1_i \rightarrow h^2_i\rightarrow...h^D_i\rightarrow y_i\]where each hidden layer is the result of the operation $h^{d+1}=f^d(W^dh^d+b^d)$.
I want to make this continuous. How do we describe an arbitrary functional
\[y(\xi) = \mathcal{F}x(\xi)\]where the input is a function with a domain, and the output is also function? I’ll do this by making analogies from the discrete operations in a multilayer perceptron to continuous operations.
What we will end up with is an integro-differential equation that describes a “neural continuum.” The linear operations along the width of each layer will be replaced by an integral along a signal space, and the nested applications will be replaced by integrating a differential equation in a depth direction.
Something that bothers me about deep learning today is the lack of convergence behavior. If you throw more parameters at a problem, there is no guarantee that it will do better than the smaller model. Non-convexity and non-uniqueness of problems will certainly make formal convergence proofs a challenge. Maybe we can change the problem slightly to give better behavior.
How do we compare two incrementally different model architectures? If we have one network that was trained, what’s an equivalent network with one more parameter in width or depth? I think this is necessary to show a smooth path. Instead of looking for a specific network, let’s posit that we’re looking for another more abstract object with various representations on our computer. If we can describe a continuous analogue, we can then define a way to slowly increase or decrease the number of parameters describing the network.
The basic element of a neural network calculation is the array of activations. Let $N_w$ denote the width of layer $d$. At each layer, there is an $N_w$-length array of hidden values
\[\mathbf{h}^d=\{h_1,h_2,...h_{N_w}\}\]In convolutional networks, these arrays are sometimes 2D, 3D, or higher, carrying some sort of spatio-temporal-channel meaning. Then, $\mathbf{h}$ has multiple indices in a grid layout, which can be indexed as $h^d_{ijkc}$ for a four dimensional example with $N_x\times N_y \times N_z \times N_c$ dimensions.
Consider now a one dimensional function that maps from some nominal domain onto the reals
\[h(\xi) : [-1,1]\rightarrow\mathbb{R}\]We can have higher dimensional input fields with higher dimensional outputs, e.g.,
\[h(\xi,\eta,\omega):[-1,1]\times[-1,1]\times[-1,1]\rightarrow \mathbb{R}^c\]for a three dimensional field with $c$ channels at each point. (The domain $[-1,1]$ was an arbitrary choice, but if we picked any other domain, we’d probably end up mapping back to $[-1,1]$ for other steps later. E.g., common basis sets and Gaussian integration weights are defined on this domain.)
Let us now choose to interpret our array of hidden activations as the coefficients of a continuous function.
These numbers don’t necessarily have any order applied in their meaning. For example, in NLP applications the indices refer to entries in the lexicon, whose order is arbitrary. But, for a lot of applications there is a spatiotemporal interpretation with images or audio signals, for example. I don’t think this line of reasoning is restricted to such applications.
The linear step of the perceptron is:
\[\mathbf{y}=\mathbf{W}\mathbf{x}+\mathbf{b}\]The operation is expressed in summation notation explicitly by
\[y_i = \sum_{j=1}^{N_w} W_{ij} x_j + b_i\]A 2D array is just one possible representation of a linear operator. As we did above, we can represent the weight matrix as a 2D function,
\[W(\xi,\eta) : [-1,1]\times[-1,1] \rightarrow \mathbb{R}\]The limit of summation is integration. In conjunction with adding a 1D bias function, the analogy of the linear transform is the following integral:
\[y(\xi)=\int_{-1}^1W^d(\xi,\eta)x(\eta)\mathrm{d}\eta+b^d(\xi)\]We can illustrate the analogy as so: in the continuous analogue, every weight matrix is replaced by a 2D function that convolves the input function into an output function. This is also a linear operator. We chose both domains to be the same for the next step.
The secret to neural networks is the composition of functions,
\[y = (f^{N_d} \circ f^{N_d-1}\circ ... \circ f^2 \circ f^1)(x),\](where we’ve tucked the $W+b$ operations into $f$ for this section.)
Let’s define the analogy to a summation operator for composition,
\[\mathop{\Large\bigcirc}_{d=1}^{N}f^{d}=f^{N}\circ f^{N-1}...\circ f^{2}\circ f^{1}\]where $\bigcirc$ means repeated composition over a list of functions. (In Flux.jl, the Chain
function implements $\bigcirc$.) We can then write our definition of a basic multilayer perceptron as
\[\mathcal{F}=\mathop{\Large\bigcirc}_{d=1}^{N_d}\left(h\mapsto f^d(W^d h+b^d)\right)\]where I wrote an anonymous function for the linear operator.
Just like we did with summation, I want to turn composition into a continuous operation. For that, we need to define a gradual function application.
The input and output from each of these discrete steps forms a list of functions,
\[x(\xi),\, h^1(\xi),\,h^2(\xi),\,...h^{N_d}(\xi),\,y(\xi)\]For the individual signal slices, we turned an array with indices $i\in[1\ldots N_w]$ into a function of $\xi\in[-1,1]$. Let us decide that the domains and ranges for every function in this list are the same. Let’s now make a new transformation from this list of functions, indexed by $d\in[1\ldots N_d]$, to a new depth signal direction $\delta\in[0,1]$. Now we have a 2D function
\[h(\xi,\delta):\underbrace{[-1,1]}_{\text{width}}\times\underbrace{[0,1]}_{\text{depth}}\rightarrow\mathbb{R}\]where we defined $h(\xi,0)=x(\xi)$ and $h(\xi,1)=y(\xi)$. Suppose the original layers of the discrete network are spaced out by a uniform $\Delta=1/(N_d+1)$ in this depth direction in our continuous analogue. In the discrete form of the network, the way we get from one layer indexed by $i$ to the next indexed by $i+1$ is
\[h(\delta=\Delta \times (i+1))=f^i(h(\delta=\Delta\times i))\]Between two layers, starting at $\delta=\Delta i$, the difference in $h$ is equal to an integral between two points,
\[h(\delta+\Delta)=h(\delta)+\int_\delta^{\delta+\Delta} \frac{\mathrm{d}h}{\mathrm{d}\delta} \mathrm{d} \delta' = f(h(\delta))\]where $\delta'$ is the dummy variable of integration. Doing some rewriting,
\[\int_\delta^{\delta+\Delta} \frac{\mathrm{d}h}{\mathrm{d}\delta} \mathrm{d} \delta' = f(h(\delta))-h(\delta)\]In the original discrete network, each layer is a discrete operation forwards. That suggests using a Forward Euler approximation to this integral,
\[\Delta \left. \frac{\mathrm{d}h}{\mathrm{d}\delta} \right|_\delta = f(h(\delta))-h(\delta)\]or,
\[\frac{\mathrm{d}h}{\mathrm{d}\delta} = \frac{f(h(\delta))-h(\delta)}{\Delta}\]which gives us a differential equation for $h$ along the depth direction $\delta$. We can solve it with any integrator, not just Forward Euler; Forward Euler is just how we make the analogy to the feed-forward discrete network. $\Delta$ is a constant that is smaller than the total length of the domain in $\delta$ and scales the rate of the discrete activation functions $f$.
It is not necessary to use the original activation function $f$. We can define an activation rate $\gamma$ which corresponds to a particular discrete function with the above equation, or select any arbitrary function. I initially hypothesized a function with a continuous derivative (softplus vs. rectifier) would be best; I do not think that is the case anymore but have not tested this.
Piecing together the above continuous analogues from the individual operations in a discrete neural network, we derive the following integrodifferential equation
\[\frac{\partial}{\partial \delta} h(\xi,\delta)= \gamma\left(\int_{-1}^1 W(\xi,\xi',\delta)h(\xi',\delta)\mathrm{d}\xi'+b(\xi,\delta) \right)\]where $\gamma(x)=(f(x)-x)/\Delta$, subject to the initial condition
\[h(\xi,0)=x(\xi).\]After solving this equation, the output of the network is:
\[y(\xi)=h(\xi,1)\]The network $\mathcal{F}$ is defined by the selection of nonlinearity $\gamma$, 3D weight field $W(\xi,\eta,\delta)$, and 2D bias field $b(\xi,\delta)$. There is no discrete choice of layer widths or network depth.
We need to perform two discretizations to be able to solve this system: 1) signal-depth-space discretization of the fields, and 2) discretization of the integration across $\delta$ to evolve as a discrete loop.
The first of these harkens back to the beginning of this discussion. We need to represent a continuous function as a set of numbers in memory on our computer. We can choose to represent functions as linear combinations of functions from discrete spaces, where we need 3D functions for $W$ to produce a discrete $\hat{W}$,
\[\hat{W}(\xi,\eta,\delta)=\sum_i w_i \phi_i(\xi,\eta,\delta)\]and 2D functions for $b$ to produce a discrete $\hat{b}$,
\[\hat{b}(\xi,\delta)=\sum_i b_i\psi_i(\xi,\delta)\]We can recover the original specification by having 3-way piecewise constant discretizations in $\xi,\eta$, and $\delta$. We could slice in $\delta$ first, to define our layers, then pick different piecewise supports for each of these slices to have weight matrices with different dimensions. (The original discrete neural networks are just nonlinear versions of this—but let’s not make this manuscript recursive!)
My intuition suggests using spectral shape functions along $\xi$ and $\xi'$, and using compact shape functions along $\delta$. I suspect orthogonality along $\xi$ might be problematic.
The integration along the depth can be handled by any ordinary differential equation integrator. We used Forward Euler to make the correspondence, but we could use higher order or implicit solvers. The choice makes a difference when doing the backwards propagation through the output of the network. The full domain support of the integral convolution ($\int W h \,\mathrm{d}\xi'$) could make the integration in depth-time tricky to do efficiently.
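As a concrete illustration (my own sketch, not part of the original derivation), a naive discretization of the integro-differential equation with Forward Euler in depth and a uniform quadrature in the signal direction might look like:

```python
import numpy as np

# Sketch: solve  dh/ddelta = gamma( int W(xi, xi', delta) h(xi') dxi' + b )
# with Forward Euler in depth. W, b, and gamma are arbitrary choices here,
# standing in for trained fields.
n_xi, n_depth = 32, 100
xi = np.linspace(-1.0, 1.0, n_xi)
dxi = xi[1] - xi[0]
d_delta = 1.0 / n_depth

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((n_depth, n_xi, n_xi))  # samples of W(xi, xi', delta)
b = 0.1 * rng.standard_normal((n_depth, n_xi))        # samples of b(xi, delta)
gamma = np.tanh                                       # an arbitrary activation rate

h = np.exp(-4 * xi**2)  # initial condition h(xi, 0) = x(xi)
for d in range(n_depth):
    integral = W[d] @ h * dxi          # quadrature of the width integral
    h = h + d_delta * gamma(integral + b[d])

y = h  # network output y(xi) = h(xi, 1)
print(y.shape)  # (32,)
```

Swapping the Euler update for a call to an ODE library would realize the "any integrator" claim above.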
We can define a loss function between the label signal and solution of the network by integrating over the signal,
\[L(y,y^*)=\int_{-1}^{1} \left(y(\xi)-y^*(\xi)\right)^2\mathrm{d}\xi\]The optimization problem involves searching for functions $W(\xi,\eta,\delta)$ and $b(\xi,\delta)$ that minimize the integral loss,
\[\min_{W,b} \sum_{i=1}^{N_{train}} L(y^i,\mathcal{F}x^i)\]After discretization, the approximate optimization problem is:
\[\min_{\hat{W_i},\hat{b_i}} \sum_{i=1}^{N_{train}} L(y^i,\hat{\mathcal{F}}x^i)\]This may seem more complicated, but inverting on the coefficients of fields in partial differential equations is a well studied area. This problem looks a lot like the full waveform inversion problem in geological prospecting, wherein we solve for material properties such as the elastic bulk modulus $K(x,y,z)$ and density $\rho(x,y,z)$ from acoustic datasets pairing sources $hammer(t)$ with recordings $f(x_{microphone},t)$.
Note that now we have posited a true problem, and an approximate problem. We can now argue that each time we choose a new discretization (cf. network architecture), we are looking for a better (or worse) facsimile on our computer, not a completely new entity.
The discretization gives us a way to transform between one “width-depth” discretization and another by projecting between the meshes ($\hat{W},\hat{b}\rightarrow \hat{W}',\hat{b}'$). Suppose we had trained one network $\mathcal{F}$ and decide we want more resolution in the depth or width. We can create a new, finer mesh and then project onto the new unknowns,
\[\min_{w'_i} \int_{0}^1\int_{-1}^1\int_{-1}^1 \left( \sum_{i=1}^N\left( w_i\phi_i\right)-\sum_{i=1}^{N'}\left( w'_i\phi'_i\right) \right)^2\mathrm{d}\xi\,\mathrm{d}\eta\,\mathrm{d}\delta\]This yields a linear projection operation which is common in finite element and spectral methods.
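For intuition, here is my own 1D sketch of that projection, using hypothetical piecewise-linear hat functions rather than the spectral bases suggested above; the minimization reduces to a least-squares solve on a dense sampling grid:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 401)  # dense grid standing in for the quadrature

def hats(nodes):
    # Piecewise-linear "hat" basis functions evaluated on the dense grid:
    # column i interpolates 1 at node i and 0 at the other nodes.
    return np.stack([np.interp(x, nodes, np.eye(len(nodes))[i])
                     for i in range(len(nodes))], axis=1)

Phi_c = hats(np.linspace(0, 1, 5))   # coarse mesh, 5 unknowns
Phi_f = hats(np.linspace(0, 1, 9))   # fine mesh, 9 unknowns

w_c = np.array([0.0, 1.0, 0.5, 1.0, 0.0])  # pretend these were trained
# Project: find fine coefficients minimizing the squared difference.
w_f, *_ = np.linalg.lstsq(Phi_f, Phi_c @ w_c, rcond=None)

# The fine mesh contains the coarse nodes, so the projection is exact here.
err = np.max(np.abs(Phi_f @ w_f - Phi_c @ w_c))
print(err < 1e-10)  # True
```

The same linear solve, with mass matrices from proper quadrature, is the refinement step a multigrid-style training procedure would use.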
The ultimate goal is to obtain a procedure where we can:
I hypothesize that this would give us more confidence in our models, and a faster training procedure by exploiting multigrid methods.
I never implemented this idea because it seemed like a lot of effort. I originally had the variational calculus interpretation of models as a function we’re trying to approximate. That perspective makes us look for better abstract representations of functions and function spaces to figure out more clever ways to compute them. Lately, I’ve been leaning towards the discrete program interpretation of models, wherein we’re directly synthesizing and learning computer programs with floating point variables and a listing of instructions: fmuls, fadds, gotos, etc. A few new developments made me flip back to these pages in my notebook and start thinking about mathematically solving for continuous functions again.
The Neural Ordinary Differential Equations paper at NeurIPS uses this kind of idea in the depth. The Julia implementation using Flux.jl shows that it could be implemented very easily. The work of Chang et al. used a similar ODE interpretation of residual networks, and pushed it further to develop a multigrid-in-time technique for accelerated training. (I really like this; it uses the concept of smooth refinement of an approximation, and the multigrid method is a great algorithm.) They present a similar equation to $\mathrm{d}h/\mathrm{d}\delta=(f(h)-h)/\Delta$, specific to residual networks where $f=G(h)+h$. The formulation I developed here includes a simultaneous continuum-ization of the width of the network and can be applied to more than residual networks.
Learning more about full waveform inversion (I credit talking to Russell J. Hewett and following his course for PySit) made me realize that optimizing for fields like $W$ or $b$ is well studied and not completely intractable.
Today (when I wrote up most of this) I attended a talk on group equivariant convolutional networks by Taco Cohen, which derived the discrete model using differential geometry from the perspective of continuous functions obeying symmetries. This approach actually phrases the hidden layers as continuous fields and designs layers that obey properties on the fields, before discretizing the operators.
Hopefully this presentation helps interpret new developments in continuous models. I would want to find a problem with 1D functions as inputs to actually implement this exact idea—perhaps audio signals.
Chen, Tian Qi, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. “Neural Ordinary Differential Equations.” CoRR abs/1806.07366 (2018). http://arxiv.org/abs/1806.07366
I want to store and organize arbitrary code.
We’re going to use Mongoc.jl to drive the database, but Mike Innes’s BSON.jl for serialization, which is more robust and all-Julia. I forked BSON.jl to make a few modifications, which aren’t needed for the minimal example code at the bottom:
] add Mongoc
] add BSON
] add /Users/afq/Documents/Dropbox/MyLibraries/BSON.jl#master
In [5]:
using Mongoc
using BSON # A different BSON implementation
You’ll have to set up your own server to follow along with the write queries. The password is left out because this is the read/write account. The queries at the bottom of this post connect to the database with a read-only account.
In [6]:
client2=Mongoc.Client("mongodb+srv://train:PASSWORD@codedump-pmluz.azure.mongodb.net/test?retryWrites=true")
Client(URI("mongodb+srv://train:train@codedump-pmluz.azure.mongodb.net/test?retryWrites=true"))
Let’s do a couple of trivial operations to make sure the database driver works:
In [7]:
Mongoc.ping(client2)
BSON("{ "ok" : 1 }")
In [4]:
document = Mongoc.BSON("a" => 1, "b" => "field_b",
"c" => [1, 2, 3])
push!(client2["mydb"]["collection"], document)
BSON("{ "a" : 1, "b" : "field_b", "c" : [ 1, 2, 3 ] }")
We can store either a named function (a symbol) or an anonymous function.
In [26]:
f(x) = 2*x
f (generic function with 1 method)
In [25]:
g = (x) -> 2*x
#5 (generic function with 1 method)
When we compare the two, the anonymous function saves all of the referenced data. The BSON library treats the first case as a “leaf” symbol that will be available in the namespace when we load at a later time, versus a deep data structure that needs to be traversed and stored.
In [69]:
# This is from my fork:
doc = BSON.@documentize(f)
doc[:f]
Dict{Symbol,Any} with 3 entries:
:tag => "struct"
:type => Dict{Symbol,Any}(:tag=>"datatype",:params=>Any[],:name=>Any["Main", …
:data => Any[]
In [86]:
doc = BSON.@documentize(g)
doc
Dict{Symbol,Any} with 2 entries:
:g => Dict{Symbol,Any}(:tag=>"struct",:type=>Dict{Symbol,Any}(:tag=>"…
:_backrefs => Any[Dict{Symbol,Any}(:tag=>"struct",:type=>Dict{Symbol,Any}(:ta…
Let’s try out a round trip of writing to a buffer:
In [74]:
buf = IOBuffer()
BSON.@save buf g
bufs=seek(buf, 0)
d = BSON.load(bufs)
Dict{Symbol,Any} with 1 entry:
:g => ##5#6()
In [76]:
d[:g](3)
6
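For readers more familiar with Python, the analogous buffer round trip with pickle (my illustration, not part of this notebook) looks like:

```python
import io
import pickle

# Hypothetical analogue of the BSON buffer round trip, using pickle:
# serialize a callable to a buffer, seek back, and load it again.
# (Lambdas aren't picklable; a module-level function is stored by
# reference -- the "leaf symbol" case in the post's terminology.)
def f(x):
    return 2 * x

buf = io.BytesIO()
pickle.dump(f, buf)
buf.seek(0)
g = pickle.load(buf)
print(g(3))  # 6
```

The difference is that BSON.jl can also serialize anonymous functions by storing their lowered code, which is what makes the database trick below interesting.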
The first hack: writing to a buffer with library #1, then loading it into library #2 to send to the database driver:
In [31]:
buf = IOBuffer()
BSON.@save buf g
bufs=seek(buf, 0)
k= Mongoc.read_bson(bufs)
push!(client2["mydb"]["collection_func"], k[1] )
Mongoc.InsertOneResult{Mongoc.BSONObjectId}(BSON("{ "insertedCount" : 1 }"), BSONObjectId("5cc6a4f3b589b4026e5fb433"))
(The glass-half-full way to think about this is translating from the cutting-edge pure-Julia implementation into the C data structures that the database implementation provides.)
Now we can turn around and pull everything that we saved back:
In [72]:
c = collect(client2["mydb"]["collection_func"]);
g_doc = c[2] # I checked
BSON("{ "_id" : { "$oid" : "5cc6a4f3b589b4026e5fb433" }, "g" : { "tag" : "struct", "type" : { "tag" : "jl_anonymous", "params" : [ ], "typename" : { "tag" : "backref", "ref" : 1 } }, "data" : [ ] }, "_backrefs" : [ { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [ ], "name" : [ "Main", "Core", "TypeName" ] }, "data" : [ "1.1.0", { "tag" : "symbol", "name" : "##5#6" }, { "tag" : "svec", "data" : [ ] }, { "tag" : "datatype", "params" : [ ], "name" : [ "Main", "Core", "Function" ] }, { "tag" : "svec", "data" : [ ] }, { "tag" : "svec", "data" : [ ] }, true, false, false, 0, [ { "tag" : "symbol", "name" : "#5" }, [ { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [ ], "name" : [ "Main", "Core", "Method" ] }, "data" : [ { "tag" : "ref", "path" : [ "Main" ] }, { "tag" : "symbol", "name" : "#5" }, { "tag" : "symbol", "name" : "In[25]" }, 1, { "tag" : "datatype", "params" : [ { "tag" : "jl_anonymous", "params" : [ ], "typename" : { "tag" : "backref", "ref" : 1 } }, { "tag" : "datatype", "params" : [ ], "name" : [ "Main", "Core", "Any" ] } ], "name" : [ "Main", "Core", "Tuple" ] }, { "tag" : "svec", "data" : [ ] }, null, 2, false, 0, { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [ ], "name" : [ "Main", "Core", "CodeInfo" ] }, "data" : [ [ { "tag" : "struct", "type" : { "tag" : "backref", "ref" : 2 }, "data" : [ { "tag" : "symbol", "name" : "call" }, [ { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [ ], "name" : [ "Main", "Core", "GlobalRef" ] }, "data" : [ { "tag" : "ref", "path" : [ "Main" ] }, { "tag" : "symbol", "name" : "*" } ] }, 2, { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [ ], "name" : [ "Main", "Core", "SlotNumber" ] }, "data" : [ 2 ] } ] ] }, { "tag" : "struct", "type" : { "tag" : "backref", "ref" : 2 }, "data" : [ { "tag" : "symbol", "name" : "return" }, [ { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [ ], "name" : [ "Main", "Core", "SSAValue" ] }, "data" : [ 1 ] } ] ] } 
], { "tag" : "array", "type" : { "tag" : "datatype", "params" : [ ], "name" : [ "Main", "Core", "Int32" ] }, "size" : [ 2 ], "data" : { "$binary" : { "base64": "AQAAAAEAAAA=", "subType" : "00" } } }, null, 2, [ { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [ ], "name" : [ "Main", "Core", "LineInfoNode" ] }, "data" : [ { "tag" : "ref", "path" : [ "Main" ] }, { "tag" : "symbol", "name" : "#5" }, { "tag" : "symbol", "name" : "In[25]" }, 1, 0 ] } ], { "$binary" : { "base64": "", "subType" : "00" } }, { "$binary" : { "base64": "AAA=", "subType" : "00" } }, [ { "tag" : "symbol", "name" : "#self#" }, { "tag" : "symbol", "name" : "x" } ], false, false, false, false ] } ] } ], 2, null ] ] }, { "tag" : "datatype", "params" : [ ], "name" : [ "Main", "Core", "Expr" ] } ] }")
The BSON library throws an error if some of the entries don’t have a Julia interpretation, so we strip these out:
In [103]:
strip_info(doc::Dict) = filter( kv->kv[1]!="_id", doc)
strip_info(doc::Mongoc.BSON) = Mongoc.BSON( strip_info(Dict(doc)) )
strip_info (generic function with 3 methods)
Now we do the opposite: create a temporary buffer, write the Mongoc result to it, then load the symbols and expressions back into the namespace:
In [104]:
buf_read = IOBuffer()
g_doc_stripped = strip_info(g_doc)
Mongoc.write_bson(buf_read, g_doc_stripped )
buf_read_start = seek(buf_read,0)
BSON.@load buf_read_start g
And we can verify it:
In [105]:
g(5)
10
The real verification is to run these bottom cells on another computer, or at least a new session, and then run this code:
In [1]:
using Mongoc
using BSON
function write_symbol(symbol)
buf = IOBuffer()
BSON.@save buf symbol
bufs=seek(buf, 0)
k = Mongoc.read_bson(bufs)
end
strip_info(doc::Dict) = filter( kv->kv[1]!="_id", doc)
strip_info(doc::Mongoc.BSON) = Mongoc.BSON( strip_info(Dict(doc)) )
function load_symbol(g_doc::Mongoc.BSON)
g_doc_stripped = strip_info(g_doc)
buf_read = IOBuffer()
Mongoc.write_bson(buf_read, g_doc_stripped )
buf_read_start = seek(buf_read,0)
BSON.@load buf_read_start g
g
end
load_symbol (generic function with 1 method)
In [2]:
client2 = Mongoc.Client(
"mongodb+srv://infer:infer@codedump-pmluz.azure.mongodb.net/test?retryWrites=true")
c = collect(client2["mydb"]["collection_func"]);
g_doc = c[2] # I checked
g = load_symbol(g_doc)
g(7)
14
The user:password combination infer:infer is a public read-only account, so you could run this code yourself… if you trust me enough to download and execute arbitrary code, which you really shouldn’t. There are plenty of security holes with this type of paradigm. Modern web applications are constantly sending around Javascript code to be executed on your computer, but the browser has “some” notion of “security”. There is none here; arbitrary Julia code is loaded into your interpreter. A system using this type of code storage needs to carefully secure write access to the server.
Chapter 1 of any machine learning book always has a short paragraph on linear regression.
Toolkits like TensorFlow or Flux are useful for their ability to compute complex derivatives and orchestrate training algorithms on complex nonlinear and deep models. However, the focus nowadays is on optimizing the hottest neural network with mini-batched stochastic gradient descent, which misses the rest of the field of optimization. There are plenty more classes of models and algorithms that this programming approach can help us perform and develop. It’s important that any automatic ML/Deep Learning package be able to derive these simple models. We can take the hard work out of deriving and implementing newer non-neural-network models too.
A linear model is fit by: \begin{equation} f(x) = W\, basis(x) + b \end{equation} where choosing the basis set $basis(x)=\{x\}$ gives a linear model. We’ll be doing polynomials, where $basis^{(p)}(x)=\{x,x^2,\dots,x^p\}$. The solution is the well known linear least squares problem, \begin{equation} P = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T y \end{equation} where $\mathbf{X}$ is a rectangular matrix depending on the basis set and data, $X_{ij} = basis_i(x_j)$.
We’re going to take a different approach: instead of starting with the least-squares equation and deriving the matrix equation, we’re going to define the code for a function and perform source code transformations on it to get the program we need that performs the regression.
To get this to work, we have to use a bleeding edge branch of Zygote:
] add Zygote#mutate
I’m also using a macro for generating polynomial expansions, whose source can be found here: https://github.com/afqueiruga/AfqsJuliaUtil.jl/blob/master/AfqsJuliaUtil.jl
In [1]:
include("./AfqsJuliaUtil.jl")
using .AfqsJuliaUtil
First, let’s make a simple dataset with known expected values:
In [2]:
dat_x = collect(1:0.1:10);
dat_y = 0.1*dat_x.^2 .+ 5 .+ 0.15*rand(length(dat_x));
dat_x = reshape(dat_x,(1,:));
dat_y = reshape(dat_y,(1,:));
We use the arguments format for parameters, in which they get passed through the function, rather than the implicit parameters format. A model has three arguments: hyperparameters, parameters, and input features: \begin{equation} f_{hyper}(x;params) \end{equation} Conceptually, I prefer this way of thinking about it, but juggling the arguments is complicated for the model developer.
In [3]:
poly = AfqsJuliaUtil.@polynomial_function(1,3)
P0 = rand(4)
f(x,P) = P[1:3]'*poly(x).+P[4];
With this “simple” expression, we will use Flux and Zygote to optimize the values of the parameters. When we derive the original form for linear regression, we’re optimizing the squared error loss over all data points: \begin{equation} P = \arg \min_P \left| y-f(x;P)\right|_2 \end{equation}
In [4]:
using Flux, Zygote, Plots
loss(x,y, P) = Flux.mse(y,f(x,P))
loss (generic function with 1 method)
We check that it works:
In [5]:
loss(dat_x,dat_y, P0)
153643.90926297026
Finding the minimum is the same as finding a root, \begin{equation} \frac{\partial L}{\partial P} = 0 \end{equation} which can be solved with the following single step of Newton’s method: \begin{equation} \left[\frac{\partial^2 L}{\partial P^2}\right]\Delta P = -\frac{\partial L}{\partial P} \end{equation} So… let’s just type that!
In [6]:
grad_fwd(x) = Zygote.forward_jacobian( (P->loss(dat_x,dat_y, P)), x )[2];
hess_fwd(x) = Zygote.forward_jacobian( grad_fwd, x );
Note: Ideally, we would want to backward differentiate the loss because it’s more efficient when $N_{outputs} \ll N_{inputs}$. I’m not sure which algorithm would be better for the derivative of the gradient; probably forward differentiation. However, Zygote still has trouble forward differentiating something that was backward differentiated:
In [7]:
#gradx(x) = Zygote.gradient( (P->loss(dat_x,dat_y, P)), x )[1]
#hessx_fwd(x) = Zygote.forward_jacobian( grad, x );
To get the matrix and right-hand-side, we evaluate it:
In [8]:
R,K=hess_fwd(P0)
([4385.54; 36711.5; 3.1605e5; 545.084], [74.3 560.45 4506.19 11.0; 560.45 4506.19 37738.1 74.3; 4506.19 37738.1 3.25071e5 560.45; 11.0 74.3 560.45 2.0])
Note how the gradient (R) is the same as the backward differentiated case.
To determine our parameters, we solve the resulting system of equations:
In [9]:
ΔP = K\R;
P = P0 - ΔP
4×1 Array{Float64,2}:
0.005753081858799192
0.09952045013819949
-1.7712223406740613e-5
5.067237452360866
Notice how we recover P[2]==0.1 and P[4]==5. We can verify that it looks right:
In [10]:
scatter(dat_x',dat_y')
plot!(dat_x',f(dat_x,P)')
Note how the entirety of linear regression was implemented in six lines of code, without sacrificing either expressibility or performance. I didn’t have to do anything at all by hand beyond expressing the program that implemented the polynomial expansion. To be fair, my basis implementation is a little out-there due to the macro usage to precompute the combinatorics.
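As a cross-check (my own NumPy port, not the notebook’s code), the same one-step Newton solve works because the MSE loss is quadratic in $P$, so a single Newton step lands exactly on the least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 10.05, 0.1)                 # same grid as collect(1:0.1:10)
y = 0.1 * x**2 + 5 + 0.15 * rng.random(x.size)

# Design matrix for the basis {x, x^2, x^3} plus a constant column for b.
X = np.stack([x, x**2, x**3, np.ones_like(x)], axis=1)

def loss(P):
    r = X @ P - y
    return np.mean(r**2)

def grad(P):
    # Analytic gradient of the MSE loss (what Zygote computes for us above).
    return 2 / len(y) * X.T @ (X @ P - y)

H = 2 / len(y) * X.T @ X  # the Hessian is constant: the loss is quadratic

P0 = rng.random(4)
P = P0 - np.linalg.solve(H, grad(P0))  # one Newton step

# One Newton step on a quadratic loss equals the least-squares solution:
P_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(P, P_lstsq))  # True
```

This is exactly the normal-equations solution $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^Ty$ from the opening paragraph, recovered without deriving it by hand.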
Do we need to derive equations by hand? Can’t our computer just solve it?
There are lots of ways to implement this kind of thought. We could employ symbolic differentiation on our mathematical relations, and then generate code from the symbolic expressions our algorithm needs. This is the strategy I used for Popcorn, where I really wanted to operate in Vector algebra and calculus. FEniCS is another example of this type of system. But this gets slow, and the workflow is complicated:
\begin{equation} f(x)\rightarrow Tool \left[\rightarrow \frac{\partial f(x)}{\partial x} \rightarrow expressions \rightarrow generate\,code \rightarrow compile \, and \, link\right]\rightarrow module \rightarrow user\,program \end{equation}
This gives us a three-language problem: the symbolic language used to express and manipulate the equation (Popcorn/SymPy, UFL, or TensorFlow), the low-level code that implements it efficiently (C/C++/CUDA), and the high-level code that embeds the language and orchestrates the workflow (Python). The user might not care what’s inside the blackbox, but I as the developer care. The user will care if the system is faster and easier to use.
If the user can type in some implementation of an equation, we can generate what we need directly from the program, if we have a sufficiently advanced and flexible programming language. That’s why this is a Julia notebook.
I’m using Zygote to do differentiation on Julia code.
In [1]:
using Zygote
using LinearAlgebra
using Plots
The simplest example I can think of is the pendulum with a spring penalty on its length. Let us use vector points $x=[x_1,x_2]$ and $v=[v_1,v_2]$ such that its Lagrangian is: \begin{equation} L = \frac{m}{2} v^2 + \frac{k}{2}\left( \sqrt{x^2}- L\right)^2 - mgx_2 \end{equation} which we can write as a one-liner function:
In [53]:
m = 1
k = 20
g = 9.81
L(x,v) = 1/2*m*dot(v,v)-1/2*k*(sqrt(dot(x,x))-1.0)^2 - m*g*x[2]
L (generic function with 1 method)
That’s all the “physicist” needs to specify, directly in Julia. To get the equations of motion, physics students are taught the basic equation of Lagrangian mechanics: \begin{equation} \frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial L}{\partial v} = \frac{\partial L}{\partial x} \end{equation} We can directly build the tools we need from the one-line computer program above:
In [60]:
dLdx(x,v) = Zygote.forward_jacobian(xd->L(xd,v),x)[2];
dLdv(x,v) = Zygote.forward_jacobian(vd->L(x,vd),v)[2];
We’re using the forward_jacobian method because vector values cause problems for the more advanced algorithms that are works-in-progress in Zygote. We can check that the gradient with respect to $x$ extracts the force due to gravity when the spring is relaxed:
In [61]:
dLdx([1.0,0.0],[0.0,0.0])
2×1 Array{Float64,2}:
-0.0
-9.81
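As a quick sanity check outside of Julia, the same Lagrangian and the same gravity test can be reproduced with a central finite difference in Python (an illustrative sketch, not part of the original notebook):

```python
import numpy as np

m, k, g = 1.0, 20.0, 9.81

def L(x, v):
    # L = m/2 v.v - k/2 (|x| - 1)^2 - m g x_2
    return 0.5*m*np.dot(v, v) - 0.5*k*(np.sqrt(np.dot(x, x)) - 1.0)**2 - m*g*x[1]

def fd_grad(f, z, h=1e-6):
    """Central finite-difference gradient of a scalar function."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = h
        out[i] = (f(z + e) - f(z - e)) / (2*h)
    return out

# Gradient w.r.t. x with the spring at its rest length: only gravity remains.
dLdx = fd_grad(lambda x: L(x, np.zeros(2)), [1.0, 0.0])
print(dLdx)  # ≈ [ 0.   -9.81]
```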
We can now implement a trapezoidal Runge-Kutta integrator (for energy-conservative A-stability) to discretize the time derivative as an implicit equation in $x$ and $v$:
In [102]:
aii = 0.5;
Δt = 0.1;
lhs(x,v) = dLdv(x,v) - Δt*aii * dLdx(x,v);
rhs(x,v) = dLdv(x,v) + Δt*(1.0-aii) * dLdx(x,v);
We will solve the equation \begin{equation} lhs(x_i,v_i) = rhs(x_0,v_0). \end{equation} Then, we take the pieces we need to solve the equation using Newton’s method. We need two K’s on each argument because we’re solving a second-order ODE by substituting $dx/dt=v$ into the system of equations.
In [103]:
fwd_Kx(x,v) = Zygote.forward_jacobian((xd)->lhs(xd,v),x)[2];
fwd_Kv(x,v) = Zygote.forward_jacobian((vd)->lhs(x,vd),v)[2];
fwd_K_tot(x,v) = fwd_Kv(x,v) + Δt * aii * fwd_Kx(x,v);
We can check to make sure this gives us an expected result too, a 2-by-2 matrix equal to \begin{equation} (m=1)\mathbf{I} + (\Delta t \, a_{ii} = 0.05)^2 (k=20) e_2 \otimes e_2 \end{equation}
In [106]:
fwd_K_tot([0.0,1.0],[0.,0.])
2×2 Array{Float64,2}:
1.0 0.0
0.0 1.05
Now we just do Newton’s method for each timestep to integrate forwards in time:
In [104]:
x0 = [1.0,0.0];
v0 = [0.0,0.0];
series = []
for i = 1:100
rhs0 = rhs(x0,v0)
xi = x0
vi = v0
for k = 1:10
R = rhs0-lhs(xi,vi)
Kt = fwd_K_tot(xi,vi)
Δv = Kt\R
vi = vi + Δv
xi = x0 + Δt*((1.0-aii)*v0 + aii*vi)
#println(k,Δv,R,Kt);
if dot(Δv,Δv)<1.0e-14 break end
end
push!(series,(xi,vi))
v0 = vi
x0 = xi
end
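For readers who want to experiment without a Julia kernel, here is a rough numpy transcription of the same scheme, with the Zygote jacobians swapped for finite differences (a sketch of the method, not the author’s code):

```python
import numpy as np

m, k, g, aii, dt = 1.0, 20.0, 9.81, 0.5, 0.1

def L(x, v):
    # same Lagrangian as the Julia one-liner
    return 0.5*m*(v @ v) - 0.5*k*(np.sqrt(x @ x) - 1.0)**2 - m*g*x[1]

def fd_grad(f, z, h=1e-6):
    """Central finite-difference gradient of a scalar function."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    for i in range(z.size):
        e = np.zeros_like(z); e[i] = h
        out[i] = (f(z + e) - f(z - e))/(2*h)
    return out

def fd_jac(f, z, h=1e-5):
    """Central finite-difference jacobian of a vector function."""
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        e = np.zeros_like(z); e[i] = h
        cols.append((f(z + e) - f(z - e))/(2*h))
    return np.array(cols).T

dLdx = lambda x, v: fd_grad(lambda xd: L(xd, v), x)
dLdv = lambda x, v: fd_grad(lambda vd: L(x, vd), v)
lhs = lambda x, v: dLdv(x, v) - dt*aii*dLdx(x, v)
rhs = lambda x, v: dLdv(x, v) + dt*(1.0 - aii)*dLdx(x, v)

def K_tot(x, v):
    # tangent of lhs, accounting for x depending on v through the stage update
    Kx = fd_jac(lambda xd: lhs(xd, v), x)
    Kv = fd_jac(lambda vd: lhs(x, vd), v)
    return Kv + dt*aii*Kx

x0, v0 = np.array([1.0, 0.0]), np.zeros(2)
series = []
for i in range(100):
    rhs0 = rhs(x0, v0)
    xi, vi = x0.copy(), v0.copy()
    for it in range(10):                     # Newton's method on each step
        R = rhs0 - lhs(xi, vi)
        dv = np.linalg.solve(K_tot(xi, vi), R)
        vi = vi + dv
        xi = x0 + dt*((1.0 - aii)*v0 + aii*vi)
        if dv @ dv < 1e-14:
            break
    series.append((xi, vi))
    x0, v0 = xi, vi
```

Plotting `[s[0][0] for s in series]` against `[s[0][1] for s in series]` should reproduce the qualitative behavior of the figures below: a swinging pendulum whose spring length oscillates about 1.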
We can now plot the $x(t)$ and $y(t)$ curves to make sure they look right:
In [105]:
plot([s[1][1] for s in series],label='x')
plot!([s[1][2] for s in series],label='y')
Note how we’re solving the nonlinear pendulum. Let’s also plot it in $y(x)$:
In [97]:
plot([s[1][1] for s in series],[s[1][2] for s in series],marker=:hexagon,label="")
We started from just a one-liner expression of the lagrangian $L(x,v)$ in direct Julia code, and used differentiation on the program itself. Not only did we let the programming language do the heavy-lifting in deriving expressions, this is also waaayyy faster than an equivalent pure-Python implementation since Julia is a compiled language. (It’s also way faster than a TensorFlow implementation; speeding up some other TensorFlow projects is the motivation for this.)
The full capabilities of Zygote only handle scalar returns so far. I should dig into this.
In [98]:
Kx(x,v) = Zygote.derivative((xd)->lhs(xd,v),x)
Kv(x,v) = Zygote.derivative((vd)->lhs(x,vd),v)
K_tot(x,v) = Kv(x,v) + Δt * Kx(x,v)
K_tot (generic function with 1 method)
In [None]:
K_tot([0.0,0.0],[0.0,0.0])
KERNEL EXCEPTION
UndefVarError: S not defined
Stacktrace:
[1] show(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Type{Zygote.Pullback{Tuple{Type{UnionAll},TypeVar,Type{Type{#s4<:Tuple}}},T} where T}) at /Users/afq/.julia/packages/Zygote/mlF4T/src/compiler/show.jl:12
[2] show_datatype(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::DataType) at ./show.jl:526
[3] show(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::DataType) at ./show.jl:436
[4] print(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Type) at ./strings/io.jl:31
[5] print(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::String, ::Type, ::Vararg{Any,N} where N) at ./strings/io.jl:42
[6] (::getfield(Base, Symbol("##372#373")))(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}) at ./show.jl:1481
[7] #with_output_color#671(::Bool, ::Function, ::Function, ::Symbol, ::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}) at ./util.jl:366
[8] with_output_color(::Function, ::Symbol, ::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}) at ./util.jl:364
[9] show_tuple_as_call(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Symbol, ::Type) at ./show.jl:1470
[10] show_spec_linfo(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Base.StackTraces.StackFrame) at ./stacktraces.jl:262
[11] #show#9(::Bool, ::Function, ::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Base.StackTraces.StackFrame) at ./stacktraces.jl:272
[12] #show at ./none:0 [inlined]
[13] #show_trace_entry#641(::String, ::Function, ::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Base.StackTraces.StackFrame, ::Int64) at ./errorshow.jl:479
[14] (::getfield(Base, Symbol("#kw##show_trace_entry")))(::NamedTuple{(:prefix,),Tuple{String}}, ::typeof(Base.show_trace_entry), ::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Base.StackTraces.StackFrame, ::Int64) at ./none:0
[15] show_backtrace(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Array{Union{Ptr{Nothing}, InterpreterIP},1}) at ./errorshow.jl:582
[16] show_bt(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Symbol, ::Array{Union{Ptr{Nothing}, InterpreterIP},1}, ::UnitRange{Int64}) at /Users/afq/.julia/packages/IJulia/9ajf8/src/display.jl:136
[17] #sprint#340(::Nothing, ::Int64, ::Function, ::Function, ::Symbol, ::Vararg{Any,N} where N) at ./strings/io.jl:101
[18] sprint at ./strings/io.jl:97 [inlined]
[19] #error_content#34(::Symbol, ::String, ::Function, ::UndefVarError, ::Array{Union{Ptr{Nothing}, InterpreterIP},1}) at /Users/afq/.julia/packages/IJulia/9ajf8/src/display.jl:147
[20] error_content(::UndefVarError, ::Array{Union{Ptr{Nothing}, InterpreterIP},1}) at /Users/afq/.julia/packages/IJulia/9ajf8/src/display.jl:147
[21] execute_request(::ZMQ.Socket, ::IJulia.Msg) at /Users/afq/.julia/packages/IJulia/9ajf8/src/execute_request.jl:138
[22] #invokelatest#1 at ./essentials.jl:742 [inlined]
[23] invokelatest at ./essentials.jl:741 [inlined]
[24] eventloop(::ZMQ.Socket) at /Users/afq/.julia/packages/IJulia/9ajf8/src/eventloop.jl:8
[25] (::getfield(IJulia, Symbol("##15#18")))() at ./task.jl:259
The quirky part about the unittest framework is that the programmer leaves behind class definitions which are detected and then initialized by the framework. That works fine for the most part, but then how do you loop to populate the framework? What if I’m trying to write a library that makes a unittest framework? Well, this is what I’m trying to do — test generation software.
I found two methods online:
setattr
: It’s simple and works, but, for making a testing library, the user’s view of it is kludgy:
generated_tests = [
mytestlib.maketestfunc(),
mytestlib.maketestfunc(),
]
class UsersTestCase(unittest.TestCase):
pass
mytestlib.fill_using_setattr(UsersTestCase, generated_tests)
__metaclass__
: It looks more elegant, but I couldn’t get the method to work in Python 3, only in Python 2.
I don’t think it could be hidden from the user either. I played around with a few things after reading everyone’s own solutions on the internet, and I think I’ve come up with an elegant solution.
type(). The method is to directly call type with three arguments.
The following code blocks in this section make one working example.
The class generator is quite simple:
import unittest as ut
def make_testcase(suite):
return type('MyTestCase', # The class name
(ut.TestCase,), # Inherit only TestCase
{fn.name : fn for fn in suite}) # Generate a dictionary
Simple! Pythonic! The library now has to define a method for generating the class methods, lest the user find this all worthless. The toy example here lets the user provide two numbers to test whether they multiply to 100:
def MyTestFunctionGenerator(a,b):
"""Generates class methods for a unittest object"""
def fn(self):
"""Performs one test"""
self.assertTrue( a*b == 100 )
fn.name = "test_{0}_{1}".format(a,b)
return fn
The above two methods can be hidden inside of a library, with a from awesometestinglibrary import *
.
The user can just instantiate a list of class methods for everything that
needs to be tested:
test_list = [
MyTestFunctionGenerator(50,2),
MyTestFunctionGenerator(5,20),
MyTestFunctionGenerator(-50,-2),
MyTestFunctionGenerator(10,11), # Fails intentionally
]
and then the user can call the first library function to get a unittest compatible class specification:
MyTestCase = make_testcase(test_list)
Note that the returned type object must be assigned to a variable, or else the unittest framework won’t discover it. But I think that’s the only caveat to the end user!
How this looks to the end-user in the testing library I’m developing is:
import HydrogeologyTest as hgtest
# import scripts for myUniaxial, myShear, myTerzaghi
suite = [
hgtest.ExactTestRunner(hgtest.oracles.Uniaxial, myUniaxial),
hgtest.ExactTestRunner(hgtest.oracles.Shear, myShear),
hgtest.ConvergenceTestRunner(hgtest.oracles.Terzaghi, myTerzaghi, 1),
]
TestSuite = hgtest.make_suite(suite)
and that’s it! The user just has to define Python functions that run their code for the specified test problem and return its results, and hgtest takes care of the rest and generates a unittest framework.
Each of the items in the suite list is a complicated class that asynchronously schedules a batch run of expensive simulations. Each class compares the user’s code against a library of known oracles from analytical or reference solutions. There are further hidden options for tuning how thorough the testing needs to be for a given run. (It can get expensive!)
The function generator from the working example makes class methods with a self argument directly. In the implementation I’m working on now, I start with a list of functions that return True/False instead of directly calling self.assertTrue. There I have another routine that looks like:
def make_testcase_classmethod(simplefunc):
def fn(self):
return self.assertTrue(simplefunc())
return fn
which just wraps up a non-TestCase function. That keeps it simple so that unittest-specific code is only in one file, and leaves it extendable to be wrapped by other test frameworks.
It seems simple, like there should be a built-in feature, but I couldn’t find any formula when I googled, so I derived it myself and am posting it in hopes that whoever needs it manages to find it. Future me will probably consult this page.
We want the ratio between the first division and its arc-length to be the same as the ratio between the last division and its arc-length. The arc-lengths are $L_1=2\pi R_1 \theta$ and $L_2=2\pi R_2 \theta$, so their ratio is the same as the ratio between the radii. We want to find a relationship between $h_1$ and $h_2$ that yields the same ratio, with $h_1=a$ and $h_2=a r^{N-1}$.
After forgetting everything I learned in middle school and looking it up on wikipedia, the formula for the total distance is
\begin{equation} R_2-R_1 = a \frac{1-r^N}{1-r} \end{equation}
GMSH will solve for the $a$ given an $N$. We need to provide the $r$ that satisfies \begin{equation} \text{Find}\quad r \quad \text{s.t.} \quad \frac{a}{a r^{N-1} } = \frac{L_2}{L_1} \end{equation} which comes out to \begin{equation} r=\left( \frac{R_2}{R_1} \right)^{1/(N-1)} \end{equation}
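This is easy to verify numerically with the values used for the quarter-plane mesh below ($R_1=0.2$, $R_2=2.5$, $N=17$); a quick Python check:

```python
# Values from the quarter-plane mesh below
R1, R2, N = 0.2, 2.5, 17

r = (R2/R1)**(1.0/(N - 1))          # the derived progression ratio
a = (R2 - R1)*(1 - r)/(1 - r**N)    # first division from the geometric-series sum

h_first = a                # first division, at radius R1
h_last = a*r**(N - 1)      # last division, at radius R2
print(round(h_last/h_first, 6), R2/R1)  # the ratios agree: 12.5 12.5
```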
Here is the line where I use this formula in a gmsh script:
// The radial edges
Transfinite Line {7, -8} = 18/Mesh.CharacteristicLengthFactor
Using Progression (2.5/0.2)^(Mesh.CharacteristicLengthFactor/(17.0-1.0));
// The arc edges
Transfinite Line {1, 6} = 12/Mesh.CharacteristicLengthFactor Using Progression 1.0;
// Set the mode
Transfinite Surface {5} Alternate;
7 and 8 are the line segments which define the radially oriented sides, and segments 1 and 6 are the arcs. The arcs use a uniform progression, which means the meshed segments of 1 and 6 will have the different lengths discussed above. The section was defined to go from $R_1=0.2$ to $R_2=2.5$. The base case has 17 elements along the radius and 12 elements around the arc. The -8 is there to flip the line so that the progression goes in the same direction as 7; this needs to be verified by looking at the GUI.
This code will also automatically scale the number of elements based on the size factor. This is useful when scaling the mesh from the command line. Here’s the little snippet where a python script generates a new mesh for a convergence study:
os.system("gmsh ../quarter_plane.geo -2 -clscale {0} -o ../gen_{0}.msh ".format(clscale))
os.system("dolfin-convert ../gen_{0}.msh ../gen_{0}.xml".format(clscale))
Chorin’s method is one of the standard ways of solving incompressible Navier Stokes. The basic idea is to explicitly compute a trial step for the velocity, then compute the pressure that corrects the divergence of the velocity, and finally apply that pressure gradient to compute the final velocity. The FEniCS tutorial explains it well with an FEM implementation. However, the standard phrasing of the method assumes a forward Euler time discretization, $\partial_t u \approx (u-u _0)/\Delta t$. It’s good enough for solving Navier Stokes efficiently, but what if we are trying to couple the fluid with an ODE that has different stability requirements?
How do we rephrase the method as a general Runge Kutta?
The partial differential algebraic equation for Navier Stokes is \begin{equation} \partial_t u + \nabla u \cdot u - \mathrm{Re}^{-1} \nabla^2 u + \nabla p = f \end{equation} subject to the constraint \begin{equation} \nabla\cdot u = 0. \end{equation} The standard derivation of Chorin’s method shoves in the forward Euler approximation for the time rate immediately. I like to do that as the final step to make it easier to cycle through time steppers.
Let us define a forcing term $r$: \begin{equation} r = f - \nabla u\cdot u + \mathrm{Re}^{-1} \nabla \cdot \nabla u. \end{equation} This lets us rewrite the ODE as \begin{equation} \partial _t u = r - \nabla p \end{equation} which is still subject to the same constraint as before. We’re allowed to take the time derivative of the constraint, and exchanging it with the spatial divergence gives us a new equation, \begin{equation} \partial _t \nabla \cdot u = 0 \quad \rightarrow \quad \nabla \cdot \partial _t u = 0 \quad \rightarrow \quad \nabla \cdot (r-\nabla p) = 0 \end{equation} We can use this equation to solve for $p$ given $r$, and then apply our time integrator to $\partial _t u$.
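Before any FEM machinery, note that $\nabla\cdot(r-\nabla p)=0$ is just a Poisson problem $\nabla^2 p = \nabla\cdot r$. As an aside, it can be checked on a periodic box with numpy FFTs (an illustrative sketch, separate from the FEniCS code below):

```python
import numpy as np

n = 64
kw = 2*np.pi*np.fft.fftfreq(n, d=1.0/n)      # wavenumbers on the unit periodic box
KX, KY = np.meshgrid(kw, kw, indexing='ij')
K2 = KX**2 + KY**2
K2[0, 0] = 1.0                               # avoid 0/0; the mean of p is arbitrary

# a smooth "forcing" field r = (rx, ry) with nonzero divergence
x = np.linspace(0.0, 1.0, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing='ij')
rx = np.sin(2*np.pi*X)*np.cos(4*np.pi*Y)
ry = np.cos(6*np.pi*X)*np.sin(2*np.pi*Y)

def div(fx, fy):
    # spectral divergence
    return np.real(np.fft.ifft2(1j*KX*np.fft.fft2(fx) + 1j*KY*np.fft.fft2(fy)))

# solve the pressure Poisson equation  lap(p) = div(r)
p_hat = np.fft.fft2(div(rx, ry))/(-K2)
p_hat[0, 0] = 0.0

# the corrected rate  du/dt = r - grad(p)  is divergence-free
ux = rx - np.real(np.fft.ifft2(1j*KX*p_hat))
uy = ry - np.real(np.fft.ifft2(1j*KY*p_hat))
print(np.abs(div(ux, uy)).max())  # ~ machine precision
```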
Now we want to phrase this in a variational formulation to weaken the derivatives to apply the finite element method. The key to this reconsideration is that we recast the trial velocity step as a projection of the equation for $r$ onto the velocity basis functions. Let $v,\delta v \in \mathcal{V}$ be functions and test functions in a suitable space for velocities, and $p,\delta p \in \mathcal{P}$ be the same for the pressures.
How do we phrase this as an explicit integrator?
The projection step is phrased as an optimization; it would be possible to skip step 1 and have steps 2 and 3 reperform the calculations inside of two different element assembly routines. However, the force coupling term $(\delta u,f)$ may be a very complicated interaction between two different numerical methods that we do not want to repeat. For example, as I’m writing this I’m implementing a particle-fluid code where that term is a discrete summation of point-integrals using my FEniCS_Particles library.
We use the standard Taylor-Hood elements (quadratic velocities and linear pressures). The forms needed to perform the above steps look as follows in the FEniCS UFL:
f_v_M = inner(tu,Du)*dx
f_r_proj = - inner(tu,dot(grad(u),u))*dx \
- mu*inner(grad(tu),grad(u))*dx
f_p_K = inner(grad(tp),grad(Dp))*dx
f_p_r = inner(tp,-div(r))*dx
f_v_dot = inner(tu, r - grad(p))*dx
This projection step requires us to implement a new class. We merge the step into the implicit pressure solve:
class RK_field_chorin_pressure(RK_field_fenics):
def __init__(self, r, p, f_v_M, f_r_proj, f_p_K, f_p_R, bcs=None, **kwargs):
self.r, self.p = r, p
self.f_r_proj = f_r_proj
self.f_p_R = f_p_R
self.bcs = bcs
self.K_p = assemble(f_p_K)
self.M_v = assemble(f_v_M)
pyrk.RKbase.RK_field_dolfin.__init__(self, 0, [p.vector()], None, **kwargs)
def sys(self,time,tang=False):
solve(self.M_v, self.r.vector(), assemble(self.f_r_proj))
R = assemble(self.f_p_R) + self.K_p*self.p.vector()
return [R, self.K_p] if tang else R
The big addition to this class over the standard RK_field_fenics class is that it performs the $r$ projection step at the first line of sys() before returning $\mathbf{R}$ and $\mathbf{K}$. Yes, there’s some weird inheritance going on where I extend a parent but use the grandparent’s constructor.
We can then initialize this special RK_field
class and the standard RK_field_fenics
class to have two modules representing the pressure prediction and the velocity ODE, and insert them into an explicit Runge Kutta time stepper object with any tableau we desire:
rkf_p = RK_field_chorin_pressure(r,p,f_v_M, f_r_proj, f_p_K,f_p_r,
bcs_p)
rkf_p.maxnewt = 1
rkf_v = RK_field_fenics(1, [ u ], f_v_M, f_v_dot, [], bcs_v )
Tfinal = 0.01
DeltaT = Tfinal/100.0
step = pyrk.exRK.exRK(DeltaT, pyrk.exRK.exRK_table['RK4'], [rkf_p, rkf_v] )
Setting the maxnewt field to one tells the stepper class that the pressure step is linear; by default it assumes everything is nonlinear.
The full implementation is located at https://github.com/afqueiruga/chorin_rk. It requires afqsrungekutta and afqsfenicsutil on the $PYTHONPATH.
We use the lid driven cavity as a standard test problem. There are no analytical solutions, but it’s been studied to death as a benchmark because predictable vortices arise. Reference solutions can be found in “Numerical Simulation in Fluid Dynamics” by Griebel, Dornsheifer, and Neunhoeffer. Let $w$ and $h$ denote the domain size, which are both set to 1. A steady state solution we computed using RK4 with $Re=1/100$ is below:
To determine convergence accuracy, the pressure and velocity fields are probed at $(0,h/4)$, where the origin is the center of the cavity. We already know that Taylor-Hood elements work, so we only use one $40\times40$ mesh for the study and only vary the time step.
I have released the source code of my new scientific packages, cornflakes and popcorn. They are located in Bitbucket repositories at
This package has been under development for the last two years as I’ve developed a variety of numerical codes. It is finally at the point where I am comfortable putting it out in the wild. It is still not quite ready for actual usage by people other than myself, but it is suitable as a case study in scientific package architecture.
The design goal for cornflakes/popcorn is to become a tool for implementing new numerical methods and the domain specific languages for them. The popcorn DSL is a transpiler for symbolic expressions, and cornflakes is an implementation of a general purpose map-assemble operator with supporting code for building hypergraph representations of problems.
The names are puns on “kernel”: Cornflakes is the serial assembly of kernels. Popcorn transforms compact kernel specifications into large and fluffy C code implementations. Husks are filled with kernels. (Cornflakes will run in parallel though, but I still liked the pun.)
In the design philosophy of cornflakes, many high performance scientific programs share a similar characteristic:
For a finite element method program, the calculation of the local element matrix is the kernel that is the smallest unit of computational work to be distributed across processing nodes. Developing algorithms and working code on both sides of the program is a challenge.
Higher order numerical algorithms for both temporal discretizations, such as many-stage implicit Runge-Kutta methods, and spatial discretizations, such as high order finite element basis functions, are, in the author’s opinion, underused owing to the great difficulties in their implementation. A major barrier to using the more complex numerical schemes is the generation of the tangent matrix: that is, the $\mathbf{K}$ in the problem $\mathbf{K}\mathbf{u}=\mathbf{f}$. Developing the form of the matrix is manageable for linear problems, though quickly becomes difficult for nonlinear fully-coupled multiphysics problems. These types of problems are typically described mathematically as either minimization problems on a Lagrangian or some other expression of a potential, \begin{equation} \min_{\mathbf{u}}\Pi\left(\mathbf{u}\right), \end{equation} or as nonlinear systems of equations, \begin{equation} \mathbf{f}\left(\mathbf{u}\right)=0. \end{equation} Solving these problems statically or with an implicit time stepping method usually centers around linearizing the functions and employing Newton’s method, or some variant thereof, and iterating over a series of linear systems, \begin{equation} \mathbf{f}\left(\mathbf{u}^{I}\right)+\left.\frac{\partial\mathbf{f}}{\partial\mathbf{u}}\right|_{\mathbf{u}^{I}}\Delta\mathbf{u}^{I} = 0, \end{equation} with $\mathbf{f}=\frac{\partial\Pi}{\partial\mathbf{u}}$ if the problem was originally expressed with a potential. Even neglecting the effort required to write the code itself, it can require many weeks of pencil-and-paper work manipulating the mathematical expressions to produce a linearized equation.
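For concreteness, the linearize-and-iterate cycle above looks like the following on a toy nonlinear system with a hand-derived tangent (far simpler than the multiphysics setting described, and purely illustrative):

```python
import numpy as np

def f(u):
    # a small nonlinear residual; f(u) = 0 has roots such as (2, 1)
    return np.array([u[0]**2 + u[1]**2 - 5.0,
                     u[0]*u[1] - 2.0])

def K(u):
    # tangent matrix df/du, derived by hand for this toy residual
    return np.array([[2*u[0], 2*u[1]],
                     [u[1],   u[0]]])

u = np.array([2.5, 0.5])                 # initial guess
for it in range(20):
    du = np.linalg.solve(K(u), -f(u))    # solve K du = -f
    u = u + du
    if np.linalg.norm(du) < 1e-12:
        break
print(u)  # a converged root; residual f(u) ≈ 0
```

Deriving `K` by hand was trivial here; the point of popcorn is that for fully-coupled multiphysics residuals this derivation is weeks of pencil-and-paper work that a symbolic tool can do instead.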
The Readme.md in the cornflakes repository explains the hypergraph and map-assemble abstraction in great detail. In the rest of this post, I talk about a very complicated example that motivated its development, and then some software design considerations I am still mulling over.
See the example notebook in the repository, /examples/spring_example.ipynb
Cornflakes is the main library for a number of different codes I write at LBNL. The following publications and conference presentations are a small selection. The poster has a good description of the algorithm that was made possible using cornflakes.
Consider the following diagram of the simulation for hydraulic fracture extension described in the above poster:
It involves three different types of discretizations for three different sets of physics:
that are fully coupled. A visualization of the simulation is included in the research gallery on this slide, and the poster linked above describes the algorithm. How does one even program this? Well, it took me less than a year because, in that time, I wrote my own language and runtime to express it with! Using cornflakes, there are three classes of hypervertices in this problem:
That model overview can be decomposed into the following (hand drawn) kernels:
The schematic illustrates the physical meaning of the edge, and the bottom list shows the ordering of the hypervertices inside the hyperedge that corresponds to one kernel call. From left to right, top to bottom, these are
The vertex ids are just integers, with $P$, $B$, and $Q$ just denoting different ranges. E.g., if there are 100 peridynamics points, 800 bonds, and 40 fem nodes, the last vertex has the id 939. The ranges would be 0-99 for $P$, 100-899 for $B$, and 900-939 for $Q$. The labels mean absolutely nothing to cornflakes, but we use these types of schematics to figure out what the DofSpaces and DofMaps need to look like to fetch the data we need for each kernel. Note that the first and last kernel are variable length! The edges of a hypergraph don’t require the same length, and popcorn kernels can take in variable-length arguments, given an l_edge parameter, and the DSL can express loops over symbolic ranges.
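The id bookkeeping is simple enough to sketch; the helper below is hypothetical (not the cornflakes API), but it shows how contiguous ranges give each class of hypervertex its offsets:

```python
def vertex_ranges(counts):
    """Assign contiguous global-id ranges to named vertex classes.

    counts: list of (name, count) pairs, in layout order.
    """
    ranges, start = {}, 0
    for name, n in counts:
        ranges[name] = range(start, start + n)
        start += n
    return ranges

# 100 peridynamics points, 800 bonds, 40 fem nodes, as in the example above
r = vertex_ranges([("P", 100), ("B", 800), ("Q", 40)])
print(r["P"][0], r["B"][0], r["Q"][0], r["Q"][-1])  # 0 100 900 939
```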
I chose Python/C because it was, and still is, the state of the art when I started writing. At the time I began, I felt that Julia wasn’t quite ready. Originally, I wanted cornflakes to have a pure C API that didn’t require Python to make it easy to link to preexisting code. Julia requires the runtime and doesn’t support outputting linkable objects, but I have abandoned that decision and am okay with that now. The latest version of TOUGH+ has an embedded Python interpreter that executes the Python/C-based mechanics library.
Julia also has a very good macro system, enabling manipulation of the AST. I want to implement Popcorn inside of Julia entirely, instead of the Python–>C generation scheme. However, Julia doesn’t have its own symbolic library, and the Python interface was wonky when I experimented with it. I think it would be possible to use Sympy inside of Julia now, but there are still two languages. Sympy is slow for some of my calculations, too. Maybe a new symbolic toolkit in Julia that leverages its JIT could be blazingly fast.
I wanted a pure C API at first, but this gets out of hand quickly. Just embed a higher level language when you need to interact with legacy codes. It’s easier than expressing complicated simulations in pure C or Fortran.
At first I wanted a serial and a parallel implementation of the runtime; i.e. one for interactive runs in IPython on a laptop and one for HPC systems. (Hence the “cereal” pun for cornflakes. The parallel version was going to be called cornfield or thresher or some other pun involving many stalks of corn.) However, I think it’s best to only have one version. Requiring the user to install PETSc for a laptop system also can complicate matters. Distributing Docker images solves the problem of requiring HPC libraries, so it may be viable to require a PETSc backend. However, I am still conflicted on how to manage the type system to make the Numpy/Scipy types transparent, but still wrap parallel data structures. I think the Julia array interface would solve this, but that may be wishful thinking.
I used SWIG as the Python/C binding since it’s worked well enough for me in the past. I probably won’t use it again. This has caused tons of headaches with allocation tracking and memory leaks. I even had to make direct calls to the Python API.
If you look carefully at the C source, you’ll notice that I hand-coded my own polymorphic object system for cfdata_t, cfmat_t, and dofmap_t.
Don’t do this! This is bad practice! I’m a crazy person!
I would never use an obscure practice in a codebase with multiple authors.
General purpose software should avoid using motifs only familiar to low-level C programmers to remain accessible to
the users; a cryptic implementation won’t be educational to an end user trying to learn more about the software design.
I just really hate the C++ class system, but that’s another discussion about language design.
(My latest C++ code has a hacked vtable, too, for virtual template methods.)
I really like the Julia type system, which is another motivator for switching.
Some of the blame goes to my colleague Jeff Johnson, who may have been a bad influence on that. (He points out that this paradigm is indeed very common, to which I counter that, unlike us, most fellow scientists don’t spend their free time reading the Linux kernel source for inspiration.) I jest; I thank him dearly for our discussions on code architecture for these types of packages. Cornflakes would have been much messier without his advice.
Besides the excellent advice from Jeff mentioned above, there were a number of important inputs to my line of thought. The FEniCS project is a central inspiration to this work. My frequent discussions (or arguments) about DSLs with Daniel Driver, author of “Dan++”, yielded many of the design decisions in cornflakes. I also acknowledge the inspiration of Per-Olof Persson, whose one line in a lecture on Runge-Kuttas six years ago—“You just implement $\mathbf{u}$ as a pointer to data”—completely changed my view of what the right data structures should be.
Development support for this language was provided while addressing the needs of multiple projects at Lawrence Berkeley National Lab, including those mentioned above.
Cornflakes was designed as a new way to express parallelism, but I still haven’t done it.
The unstructured hypergraph partitioning algorithm has been implemented and tested in another development code
using PETSc, but hasn’t made its way into cornflakes yet.
I am still debating on the major type system changes I discussed above before parallelizing cornflakes.
A deprecated implementation is in the source code for OpenMP threading of Assemble
.
The popcorn specification should also be able to generate code for vectorized CPUs and GPUs quite easily.
I will be soon adding a more complex example of a family of meshless methods (Moving Least Squares and the Reproducing Kernel Particle Method) to the cornflakes repository. I am also preparing an open source Peridynamics solver based on cornflakes to be released ahead of some upcoming conference presentations.
I’ll probably be rewriting it all in Julia at some point.
After going to the NY MoMA, I was inspired by their current exhibit (c. Dec 2017) on computer generated artwork dating back to the 70s. That’s something I used to be really interested in, as I learned to program from a book on chaos and fractals. Thinking about what modern computer generated artwork would be like, I was reminded of reverse gradient searches. I was interested in computer labeling of artwork a few years ago, but, fortunately, this time I managed to find a labeled dataset of paintings.
The Pandora data set can be found here: http://imag.pub.ro/pandora/pandora_download.html The following papers were published by the researchers about the data set and using it to try to label artwork.
At best they achieved about 45% accuracy.
Other papers of interest:
In [21]:
import numpy as np
import scipy.io as sio
from matplotlib import pylab as plt
%matplotlib inline
import tensorflow as tf
Heads up: if you’re running inside of a docker image, you have to mount where you downloaded the data too! I spent five minutes debugging this line until I realized that the vm couldn’t access my ~/Downloads directory >.< My image run command is
docker run -it -v `pwd`:/notebooks -v ~/Downloads:/data -v
~/Documents/Research/Misc/models/:/models -p 8888:8888
gcr.io/tensorflow/tensorflow
Now let’s look at some of them.
In [6]:
base = "/data/"
mat = sio.loadmat(base+'Pandora18k_descripts/pandora18kMain_v2.mat')
In [26]:
mat['imageListFile'][2000][0][0][0]
(array([u'03_Northern_Renaissance\\hieronymus-bosch\\tempt-o1.jpg'],
dtype='<U53'), array([[2]], dtype=uint8), array([[3]], dtype=uint8))
That’s not very interesting data. The documentation doesn’t even state what this file itself is supposed to be. Perhaps artist and class keys? The other mat files have the preprocessed image properties that are better described in the papers.
Let’s look at a few random images:
In [64]:
def load_image_into_numpy_array(image):
(im_width, im_height) = image.size
return np.array(image.getdata()).reshape(
(im_height, im_width, 3)).astype(np.uint8)
from PIL import Image
TEST_IMAGE_PATHS = [
'06_Rococo/Unidentified_artists/11635.jpg',
'15_Surrealism/Salvador dali/Salvador Dali-26.jpg',
'14_Cubism/Weber_Marc/the-visit.jpg',
'14_Cubism/Picasso_Pablo/cafe-royan-1940.jpg',
'08_Realism/tom-scott/horse-1890.jpg',
'08_Realism/boris-kustodiev/nude.jpg'
]
for image_path in TEST_IMAGE_PATHS:
    image = Image.open(base+'/Pandora_18k/'+image_path)
    image_np = load_image_into_numpy_array(image)
    plt.figure(figsize=(3,2))
    plt.imshow(image_np)
    plt.show()
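One caveat: load_image_into_numpy_array assumes three channels, so a grayscale or RGBA scan anywhere in the corpus would crash the reshape. A more defensive variant (my addition, not from the original notebook) converts to RGB first and lets PIL handle the channel bookkeeping:

```python
import numpy as np
from PIL import Image

def load_image_rgb(image):
    # Force three channels so grayscale or RGBA scans don't break the reshape.
    return np.asarray(image.convert('RGB'), dtype=np.uint8)

# Quick check on a synthetic grayscale image:
gray = Image.new('L', (4, 3), color=128)
arr = load_image_rgb(gray)
```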
The studies that built the labeled database used various processed image properties, like color histograms, and didn’t get very good classification results. One limitation they mentioned was not having enough data to train a DNN, which is true. However, the meaning of artwork is more than just the corpus of artwork itself. A painting is a representation of something that humans interpret based on the rest of the world; a crucial descriptor of a piece of artwork is its subject. In humans, artwork evokes references from the rest of the world, so there will never be enough data in a corpus of paintings alone to learn subject matter. To interpret artwork, our model needs more than just the image properties; we need to include an object classifier that was trained on a larger body of real-world data.
Apparently that’s easy now! Following this blog, I loaded the object recognition model from the TensorFlow models repository.
In [9]:
import os
import sys
sys.path.append("/models/research")
In [7]:
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
This is how we download the model from the internet and load the labels from the locally mounted source directory:
In [29]:
CWD_PATH = '/models/research/'
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
#PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
#PATH_TO_CKPT = os.path.join(CWD_PATH, 'object_detection', MODEL_NAME, 'frozen_inference_graph.pb')
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join(CWD_PATH, 'object_detection', 'data', 'mscoco_label_map.pbtxt')
NUM_CLASSES = 90
# Loading label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES,
                                                            use_display_name=True)
category_index = label_map_util.create_category_index(categories)
import six.moves.urllib as urllib
import tarfile
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())
A wrapper for applying the modeling to an image:
In [49]:
def detect_objects(image_np, sess, detection_graph, threshold=0.5):
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the confidence level for a detected object.
    # The score is shown on the result image, together with the class label.
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    # Actual detection.
    (boxes, scores, classes, num_detections) = sess.run(
        [boxes, scores, classes, num_detections],
        feed_dict={image_tensor: image_np_expanded})
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        min_score_thresh=threshold,
        line_thickness=8)
    return image_np
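The wrapper only returns a drawn-on image, so it may also be worth keeping the raw detections around for later analysis. Here is a sketch (my addition) of a pure post-processing step that filters the arrays sess.run returns, assuming the same [1, N, ...] shapes as above:

```python
import numpy as np

def filter_detections(boxes, scores, classes, threshold=0.5):
    """Keep only detections whose score clears the threshold.

    Expects the batched [1, N, ...] arrays that sess.run returns for the
    detection tensors, mirroring the shapes used in the wrapper above.
    """
    boxes = np.squeeze(boxes)
    scores = np.squeeze(scores)
    classes = np.squeeze(classes).astype(np.int32)
    keep = scores >= threshold
    return boxes[keep], scores[keep], classes[keep]

# Fake model output: two confident detections and one weak one.
boxes = np.zeros((1, 3, 4))
scores = np.array([[0.9, 0.6, 0.1]])
classes = np.array([[1.0, 17.0, 44.0]])
b, s, c = filter_detections(boxes, scores, classes, threshold=0.5)
```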
Load the graph from the expanded directory:
In [45]:
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
And make a new session and apply the object detection graph to our list of images and look at the results:
In [68]:
with tf.Session(graph=detection_graph) as sess:
    for image_path in TEST_IMAGE_PATHS:
        image = Image.open(base+'/Pandora_18k/'+image_path)
        image_np = load_image_into_numpy_array(image)
        detimg = detect_objects(image_np, sess, detection_graph)
        plt.figure(figsize=(4,3))
        plt.imshow(detimg)
Looks cool. But some of the paintings aren’t marked at all, and there are definitely other features in all of them that we aren’t seeing. Let’s change the threshold and see what comes up:
In [66]:
image_path = TEST_IMAGE_PATHS[2]
with tf.Session(graph=detection_graph) as sess:
    for thresh in [0.12, 0.1, 0.0001]:
        image = Image.open(base+'/Pandora_18k/'+image_path)
        image_np = load_image_into_numpy_array(image)
        detimg = detect_objects(image_np, sess, detection_graph, thresh)
        plt.figure(figsize=(4,3))
        plt.imshow(detimg)
Now that looks pretty cool. In this cubist painting, we see a few misidentifications and very low certainties. I hypothesize this is a characteristic that a model could learn: low certainties and a wide distribution of odd objects.
Let’s look at all of them with a very low threshold:
In [61]:
with tf.Session(graph=detection_graph) as sess:
    for image_path in TEST_IMAGE_PATHS:
        image = Image.open(base+'/Pandora_18k/'+image_path)
        image_np = load_image_into_numpy_array(image)
        detimg = detect_objects(image_np, sess, detection_graph, 0.1)
        plt.figure(figsize=(12,8))
        plt.imshow(detimg)
The spatial distribution of objects is also an indicator of composition. E.g., look at the balance of identified objects in the first painting, from the Rococo.
Now I need to figure out how to encode these results as input to a model.
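One simple option — a sketch of my own, not something settled on in this post — is a score-weighted class histogram: a fixed-length vector where each COCO class id accumulates the scores of its detections. That preserves both which objects were seen and how certain the detector was, which is exactly the signal hypothesized above for cubism:

```python
import numpy as np

NUM_CLASSES = 90  # COCO label count used earlier

def detections_to_feature(classes, scores, num_classes=NUM_CLASSES):
    """Score-weighted class histogram: feature[k-1] sums the scores of class id k."""
    feature = np.zeros(num_classes)
    for cls, score in zip(classes, scores):
        feature[int(cls) - 1] += score  # COCO class ids are 1-based
    return feature

# Two detections of class 1 and one low-confidence detection of class 18:
feature = detections_to_feature(classes=[1, 1, 18], scores=[0.8, 0.3, 0.05])
```

This ignores the spatial layout noted above; box centroids or coarse grid counts could be appended to the vector to capture composition as well.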