I wanted to start from scratch, but didn’t want to go with the standard feed readers (Feedly, Inoreader, etc.). They are good, but I find them a little too feature-heavy. I really wanted a minimalistic feed reader designed for reading on mobile (which is where I will primarily use it). I also wanted it to be self-hosted, so that I could have more control over the data and the configuration.
After a bit of a search^{1}^{2}, I went with miniflux – open-source, self-hosted, works well on mobile, and opinionated in a way that works for me :smile:
I went ahead and set it up with Heroku. It was very straightforward – I basically followed the steps in this page. It took a few minutes rather than the few seconds mentioned in the guide, but really, I was quite surprised at how fast I was able to set everything up!
I have been reading more and more feeds on my phone and the experience already has been way better than with other readers. I will keep updating this page based on the tweaks I make. Try it out – I am quite happy with it.
That said, I quite liked this book. To me, it quite effectively conveyed the awe-inspiring characteristics of the octopus, as well as the seeming alien-ness of life in the ocean^{1}. And it sufficiently inspired me to add snorkeling and scuba-diving to my bucket-list of things to learn and areas to grow in, so I would chalk this up as a good book to read.
Footnotes:
My Octopus Teacher is a good visual accompaniment to this book. The author of the book and Craig Foster (the host) resonate quite a bit in their feelings, which made me even more excited about the whole thing.
Google Analytics, the most commonly used, was ruled out under the privacy-conscious rule^{1}. After searching for quite a bit, I narrowed it down to Panelbear, which provides analytics without cookies and respects the privacy of visitors. I used Panelbear for a few months - it is quite nice. So if you want something simple, free, and privacy-friendly, go with Panelbear.
Panelbear does have one inherent limitation: although it is free for small websites, it only retains your visit data for 30 days. After that, your data is lost, unless you pay for one of their plans. I wanted more control, so I started looking for open-source solutions that I could self-host. That is when I hit upon umami.
Umami is quite similar to Panelbear - it checks all the criteria. You can even self-host it using Heroku’s free tier plan. The steps are well-documented in the umami docs (found here), so I mostly just followed them. Some of the steps were tricky/inconvenient, so I made some slight modifications. Here they are, in case you (or the future me) decide to set up umami (again):
Create an account in Heroku and create an app.
Create a database. Umami requires a database in which it can log visits. You can create one in Heroku (for free!). Go to the resources page and install the Heroku Postgres addon.
Fork the Umami Github repository.
Next, instead of connecting to Github (Heroku required permissions to all my repositories), I used the Heroku CLI. I created a repository in Heroku, cloned it onto my computer, and added the Github fork as the upstream remote. That way, I could pull from the Github repository and push it to Heroku.
git remote add upstream #your git url
git remote -v #check if it is added
git fetch upstream #fetch the github repository version
git merge upstream/master #merge with the github repository version
git push heroku master #send the merged repository to the heroku remote
At this point, if you push the forked umami repo to Heroku, add the hash salt (described in the docs), and deploy, you should be able to access the login page of umami. But when you try to log in using the default username and password (mentioned here), you will get an error. That is because the database has not yet been initialised, so the default username and password have not yet been added to it.
Now come the few tricky steps - first, setting up the database. Start by opening the Heroku Postgres database and going to settings. Click “view database credentials” and copy the host, database, username, and password into a temporary text file.
Now install the packages needed to build the cloned umami repository on your computer. It requires nodejs, and I used postgresql along with it (on Arch Linux, you can use pacman).
Go to the cloned repository and run:
npm install
It is finally time to initialise the database. Using the copied database credentials, run the following command. This will create the login and the password for your umami instance.
psql -h hostname -U username -d databasename -f sql/schema.postgresql.sql
To test if it works properly, add a .env file in the folder with a database URL and a hash salt (details here). I used a 32-character random string for the hash salt (generated with a password generator).
DATABASE_URL=postgresql://username:mypassword@localhost:5432/mydb
HASH_SALT=random string
Now you can build, start, and check if the local umami instance is using the database properly (i.e. if you are able to log in), using the following commands:
npm run build
npm start
If you are able to log into the local umami instance, then you are done. Commit the changes (the .env file) and push them to Heroku (the app should be automatically deployed). Now you should be able to log in, change the password, and get cracking^{2}.
Footnotes:
Before I present the answer I gave my friend, let me explicitly define what I mean by “discover” and “invent”. If patterns and rules already exist, as a consequence of the properties of the world we live in, then we humans “discover” them. Examples include discovering fire, the laws of motion, the theory of evolution, etc. If something doesn’t exist and is brought into this world as a result of human imagination and ingenuity, then humans “invented” it. Examples include the steam engine, telegraph, telephone, computer, etc.
So, in this context, was mathematics “discovered” or “invented”? My answer was this: “I feel that math is invented. It is a formal language to describe and “discover” patterns in the world. That is, rules and patterns already exist (the laws of motion, the theory of evolution, etc.), and mathematics is just one of the tools we invented to discover these rules”. The answer made sense to my friend and he agreed. However, I realised that I was missing an alternative perspective, as the answer sounded obvious to me. I began to search for this other perspective, which led me to several interesting articles and videos.
Let me start by fleshing out the question - did we “invent” mathematics to better describe the existing patterns in the universe or did we simply “discover” mathematics, which is the language of the universe^{1}?
Eugene Wigner, who received a Nobel Prize in Physics in the ’60s, wrote a famous article titled “The Unreasonable Effectiveness of Mathematics in the Natural Sciences”^{2}. In it, he argued that there is something mysterious about how mathematics is able to capture the rules of the universe.
Using physics as an example, Wigner argued that much of the mathematics used to describe physical phenomena was developed by mathematicians years or decades before it was actually used to do so. The mathematicians conceived the math simply because it was interesting, not with the intent of using it to describe physical laws. Yet, Wigner argued, these tools turned out to be integral to physics. In addition, the accuracy with which these mathematical rules described phenomena was surprising. This is why he calls the effectiveness of mathematics unreasonable - it is a “miracle” that these tools/concepts invented by humans can describe the universe so well^{2}.
Wigner effectively argued that mathematics is the language of the universe and we simply discovered it^{3}. Understandably, several scientists/intellectuals were in agreement, including Einstein:
How can it be that mathematics, being after all a product of human thought which is independent of experience, is so admirably appropriate to the objects of reality?
On the other hand, in the article “The Reasonable Ineffectiveness of Mathematics”^{4}, Derek Abbott cogently argues that mathematics is just a tool invented by humans, that there is nothing mysterious about its effectiveness, and that it is not actually very effective for real-world scenarios.
His key argument is that, as engineers, we understand that these elegant mathematical equations only work for idealised scenarios and do not hold at all scales. As an example, he talks about the transistor, whose analytical equations, derived in the 1970s for micrometer-scale transistors, do not hold for the nanometer-scale transistors we use today. With the increase in compute power, engineers have moved from analytical equations to numerical methods, as these capture the non-linearities in systems much better. Using these as evidence, he argues that mathematics is incredibly effective only at describing simple, idealised systems, not real-world scenarios.
Derek Abbott’s article also illuminated my implicit bias towards mathematics as an invention. Engineers are taught to use interesting mathematics as convenient tools. For example, linear systems are ubiquitous in engineering simply because they are easy to work with, not because they describe everything around us well. Delta functions, step functions, etc., which we use for system identification, are in fact ideal functions that do not exist in the real world. As engineers, we are repeatedly told the limitations of the mathematical tools we use and when they break down. It is no wonder that mathematics as an invention was my default view, as it was a corollary of the outlook I had towards the world.
In the end, searching for an alternative perspective didn’t change my stance, but it did deepen my appreciation of the question and my understanding of why I hold my stance :nerd_face:
Footnotes:
This is described beautifully in https://www.ted.com/talks/jeff_dekofsky_is_math_discovered_or_invented
A copy of the paper can be accessed here - https://www.maths.ed.ac.uk/~v1ranick/papers/wigner.pdf
This is part of the Platonist school of thought, which argues that mathematics has an existence of its own. A long description can be found here (I haven’t read it completely though).
You can read it here - https://ieeexplore.ieee.org/document/6600840
Most of the method is the same - observe a particular phenomenon, develop falsifiable hypotheses, perform experiments, and based on the results, keep the unfalsified hypotheses and repeat the cycle with newer sub-hypotheses. This methodology follows Baconian inference and Popperian falsification of hypotheses, with a key difference. Platt argues that the reason most fields do not progress rapidly despite applying Baconian inference is that scientists in these fields typically stop after developing one hypothesis. Because they only come up with one hypothesis, it becomes their pet hypothesis, and falsifying it becomes painful as they inevitably get attached to it. This leads to stagnation, either because the hypothesis that should be falsified is not, or because of ego battles with other scientists who manage to falsify it.
Platt’s solution, borrowed from T. C. Chamberlin, is to always develop multiple hypotheses before testing with experiments. This not only avoids the pet-hypothesis problem, but also fosters collaboration, as multiple groups can go about falsifying any of these hypotheses. The clear language, the lucid points, and the attractiveness of the method quickly made this one of my favorite papers. I took it as the optimal way to do science, no questions asked. Recently, two articles have made me rethink my stance.
The first one^{2}, evocatively titled “the abuses of Popper”, argues that true falsification of a hypothesis is hard. To falsify a particular hypothesis, a specific experiment has to be designed that explicitly falsifies it. But in reality, it is rarely possible to come up with such a precise experiment - there could always be other reasons why the result didn’t turn out as the hypothesis predicted. As scientists, we try our best to control for the unpredictable aspects of the experiment, and then implicitly assume that the result is not due to problems with the experiment. That is, it is our good-faith assumption that the experiment works as we intended that allows us to falsify our hypothesis.
It actually took me a while to truly get this - I guess the blind belief in the scientific method was so ingrained that I didn’t see the implicit assumption used in interpreting our results. For example, if I measure the speed of light and show that it is different when the object is moving at different speeds, does this mean that I have falsified the special theory of relativity? Or was it just experimental error?
I began to see the reason for discord among scientists and why they still hold onto their pet hypotheses - the same good-faith assumptions that they hold for their own experiments turn into bad-faith assumptions for all other experiments. I can definitely see the advantage of the Popperian framework - it allows for consistent and methodical progress - but its Achilles heel is the implicit assumption behind falsification. The “strong inference” method is much better, as generating multiple hypotheses and performing an experiment that falsifies all but one of them weakens the necessity of this assumption.
The article also talks about a darker consequence of the falsification mindset - the lack of moral accountability for the science (as all I am doing is falsifying stuff) and the use of falsification to deny/debunk climate change. I won’t go into this here, so make sure you read the article. It definitely began to crack my perception of the infallibility of strong inference.
The second article^{3} amplifies these cracks further. It makes several salient points about the importance of constant conversation, debate, and evaluation of the methods used in science. Theoretical physics was one of the fields that wholeheartedly embraced the Popperian framework. Physicists applied it to generate a whole slew of hypotheses to explain phenomena. Because each of these hypotheses had not been falsified experimentally, the field assumed that they were all equally probable. And because experiments are costly, both in time and money, only a few hypotheses could be tested, so a pool of untested hypotheses always remained. The author argues that this ruthless application of the Popperian falsifiability criterion has caused a stagnation in theoretical physics, a problem that would have been avoided had there been a continuous conversation about the philosophy of the scientific method being used.
I felt this article shed light on a different problem with strong inference - the cost of doing an experiment. Falsifying multiple hypotheses often implies multiple experiments, each of which takes time and money. Often, one has to ask how plausible each hypothesis is: is it worth doing an experiment to falsify it? Popperians and Kuhnians would argue that this is not a question a scientist can answer. The author argues that while one can never give a definitive answer to that question, one can be reasonably certain based on previous knowledge. As with the speed of light example, the probability of the experiment falsifying the special theory of relativity is low; not zero, but unlikely based on what is known.
In his paper, Platt implicitly acknowledges these problems. Multiple hypotheses and collaborations are proposed as solutions to the single-hypothesis and good-faith assumption problems. The pitfalls of multiple hypotheses and the cost of an experiment are also hinted at, as in the quote below.
Problems of this complexity, if they can be solved at all, can be solved only by men generating and excluding possibilities with maximum effectiveness, to obtain a high degree of information per unit time - men willing to work a little bit at thinking.
This makes me like the paper even more. Yet, I feel the fog of certainty of “strong inference” slowly lifting. It is definitely a really good tool in our toolboxes, but it has significant problems one needs to be wary of. And this awareness might allow us to develop new ways of discovering knowledge.
Footnotes:
Recently, I have come to the realization that I am pretty bad with notes. I am not bad at taking notes - I have notes of interesting ideas from almost everything: talks, papers, books, you name it. The step I am bad at is how I use these notes. Most of them sit in a notebook somewhere, lying dormant - the probability of them being revisited is very low. When I do infrequently revisit them, I often find that ideas I thought I had “recently” come up with have been sitting in these notes for a long time. I have known about this inefficiency in my process for a while now, maybe the last year or two of my PhD, but I have been busy with “more important things”. A few months back, I figured I should act on this realization and figure out a way of pooling my ideas and actually building on them, rather than writing them down on impulse and forgetting about them. This is when I came across the book How to Take Smart Notes by Sönke Ahrens^{1}.
So, this book talks about the Zettelkasten method (German for “slip box”), a non-linear way of taking notes. The method was used by Niklas Luhmann, a German sociologist who was crazy prolific as an academic (wiki says: 70 books, 400 scientific papers!!), which itself made me prick up my ears. I am not sure exactly what his theory is about, but it is supposed to be so deep that he became one of the most important sociologists of the 20th century. He credited his ideas, as well as his papers, to his use of the zettelkasten.
The key reason I’m pretty bad with notes is a lack of organization and filtering - I don’t have a single place to go to check on my ideas or look at all the information I gather. Yes, they are all in the notebooks, but they are interspersed with not-so-great ideas, random facts, and so on. That is because I use the notes for two (incompatible) purposes: one, to engage myself with the talk or the paper^{2}; two, to note down points that are important to store as some form of memory^{2}. This means my notes mix important facts and ideas I want to remember with not-so-important stuff that needs to be filtered out.
The method proposed in the book (and used by Niklas Luhmann) solves this conundrum by taking multiple types of notes. First come “fleeting” notes, used both for engagement and for writing down important facts/ideas. Essentially, this is the step most of us already do. The next step is to filter these fleeting notes into two types - “literature” notes, which link facts/results/discussion points to a paper or a talk, and “permanent” notes, which capture ideas linked to the literature notes. You then throw away your fleeting notes and file both your “literature” and “permanent” notes into your long-term archive of ideas - your zettelkasten.
Niklas Luhmann used small slips of paper for his “literature” and “permanent” notes. Each slip (or zettel in German) contains facts linked to a paper, or ideas based on a paper or a group of papers. These were then filed into the slip box (or zettelkasten) for long-term storage.
The goal, at least for most of us who take notes and aren’t blessed with good memory, is to develop a second brain to store information gathered from reading. The zettelkasten functions as the second brain. The first critical step of filtering ensures that the right information goes into this brain. The second step is organization - how do you ensure that you can pull out the right form of information when you need it?
Here is where the non-linearity comes in. Niklas Luhmann simply assigned an ID to every note that went into the zettelkasten. He then used two forms of linking. In the first, he placed relevant notes (i.e. ideas) physically close to one another^{3}, which allowed him to form a “chain of thought”. In the second, he linked ideas in one note to those in other notes by tagging their IDs^{3}. This is a very important way of linking the same idea to different contexts. Both methods allowed him to form a complicated web of ideas and develop them as he added more.
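To make the linking concrete, here is a toy sketch in Python (my own illustration, not from the book; the class names and IDs are made up):

```python
# Toy sketch of Zettelkasten-style linking by ID (illustrative only).
class Note:
    def __init__(self, note_id, text, links=None):
        self.id = note_id          # unique ID, like Luhmann's slip numbers
        self.text = text
        self.links = links or []   # IDs of related notes (cross-context links)

class Zettelkasten:
    def __init__(self):
        self.notes = {}

    def add(self, note):
        self.notes[note.id] = note

    def chain(self, start_id):
        """Follow links breadth-first to surface a 'chain of thought'."""
        seen, queue = [], [start_id]
        while queue:
            nid = queue.pop(0)
            if nid in seen or nid not in self.notes:
                continue
            seen.append(nid)
            queue.extend(self.notes[nid].links)
        return seen

zk = Zettelkasten()
zk.add(Note("1", "Strong inference", links=["1a"]))
zk.add(Note("1a", "Develop multiple hypotheses", links=["2"]))
zk.add(Note("2", "Falsification is hard"))
print(zk.chain("1"))  # ['1', '1a', '2']
```

The physical adjacency of slips roughly corresponds to following `links` here; the point is that the web of notes, not any single note, is where the ideas develop.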
I found this way of note-taking very elegant, as it allows you to constantly peek into your idea-box and link ideas to one another. This process lets you look at ideas in new contexts and generate more ideas, at least in theory.
I can sort of see how this could have helped Niklas Luhmann connect different subjects and form unifying theories of sociology. You can see the same interconnectedness in the book - the author manages to link multiple fields of research to make the point that the zettelkasten system is an effective way of taking notes.
In general, I found this book to be a good/fun read, although a bit repetitive at some points. Apart from the zettelkasten method, it talks about various things - the importance of organization and positive workflows that do not require motivation, the importance of writing, and the limits of memory^{4}.
That said, I must say that I haven’t completely implemented the zettelkasten method myself. It has been a bit hard getting into the discipline of filtering notes and adding them into a digital zettelkasten. I have been making progress and have a preliminary system working - I will post more on this once I get it working properly and see benefits.
The caveat aside, I feel that this book opens a door into a new way of thinking about note-taking and organization in general, something I will remember and slowly act on.
As always, let me know if you have any comments/feedback/suggestions!
Footnotes:
A smartphone is a godsend, is it not? Apart from calling people, it provides many other conveniences. There is no need to remember neighbourhoods or get physical maps to chart out routes when you are visiting new places; Google Maps can do that for you. Nor is there a need to remember the phone numbers of your close ones, or their birthdays. It serves as a personal assistant and provides you with precisely timed reminders for everything. For someone as absent-minded as me, this is definitely something I cannot live without - it helps me organize my chaotic life.
In addition, most of these features are more or less free and available from the cheapest Android phone to the costliest iPhone. Android by itself is free and technically open source (though really not so much). There are a plethora of free Android apps which support themselves by showing ads, which implies that we don’t have to buy these apps. Mobile data is now cheap as well, which implies that more people can stream and learn a whole lot of things on their smartphones using these apps. It does seem like a positive example where technology has benefited all.
At this point, it is worth thinking from the perspective of the people who are giving us this free stuff. Why is Google providing us with Android, as well as so many free (and ad-free) apps ranging from news to easy payment? What’s in it for them? Is it just altruism? That can’t be it, as their profits from Android keep going up, something that wouldn’t happen if it were an altruistic project. Not just that, their ad revenue has been increasing more or less exponentially. Although it is not clear, a part of this has to be due to the ad-supported apps we routinely use.
So what? Google shows a few ads and increases profits, while we enjoy Android for free - it seems like a fair trade. I agree, it would be a fair trade if Google did just that - showed a few random ads and kept Android and the whole ecosystem free for all. Unfortunately, just showing random ads is not a good business model - a company cannot be profitable from that alone. Here is the problem - ads are only profitable if people actually act on them. For instance, let’s say a soft-drink company produces a crazy new flavour and wants to promote it. It pays for 100 ads to be shown to people, in the hope that at least a few of them will try it out. Let’s assume these ads are actually good - beautiful and tempting. But due to random chance, they were shown to people who believe that drinking soft drinks is unhealthy (a topic for another time). The soft-drink company did not get any return on its ads, and would not go to Google to display its ads the next time. Showing random ads therefore does not work.
Logically, a better approach would be to show the ads to people who like soft drinks. They are more likely to try out new flavours, click on the ad to find out more, and buy one later on from a store. Makes sense, right? Okay, but how does one figure out who likes soft drinks? The answer is as simple as it is ethically wrong - gather as much data as possible about what people search for, what they buy, and what sort of articles they read. Infer from those what they like and don’t like, and target ads based on that. Now that is a better and much more profitable business model.
From this perspective, providing Android and all those apps for free totally makes sense. Why? Because providers like Google get to gather data as you use these apps. And as they gather data, they continually update what they know about you. They then use that to show ads in their own apps, making you click them. Everything you do on this platform generates revenue for them - from just using apps (which is providing them free data) to clicking on ads (which is direct revenue). When Google can gain so much from this, why not provide Android for free?
Ah, so what? How much data can they gather? And how much can they infer from it? Quite a bit, actually. The amount of data that can be gathered is unprecedented. Think of your phone as a tracking device that continuously sends information about where you are, what you are doing, what you are reading, who you are talking to, etc. That is enough to basically reconstruct your whole life. Using anonymized location data alone, it is easy to reconstruct who you are, what you do, who you visit, and so on. Don’t believe it? Try reading this. Using just location, it was easy to reconstruct all the places Trump has been and how long he spent at each of them. And that is just location. Imagine what else can be inferred from the rest of the Android data logs.
Ideally, all this data would only be used for targeted ads, if the company is nice. A more sinister motive would be to categorize people and slowly show them content to make them believe something. Now, imagine if a government did this systematically, for instance, to monitor people and decide their fates (China, for example). Not pretty, huh? Data is the fuel here. And something we have to be careful about doling out.
Before smartphones and the internet, how would you share information about yourself? Would you go yell out your deepest secrets to random people? You would judge them, make friends with them, and reveal such things only when you got close enough, right? Why should it be different with data, then? Well, that’s partly because we have not been told all this - it lies in that abstruse privacy policy document that we thoughtlessly agree to (not that we have an option if we want to use our newly bought phone). The core problem is consent and awareness - such companies have not asked for consent, or have sneakily asked for it without making users aware of its implications. And it is becoming increasingly clear that this is a problem.
Phew, paranoid much? Yeah, a bit. I have been thinking about all this and figured it would make for a good blog series, starting with raising awareness of why this is a problem (weirdly, a lot of people are not aware). In the next few blog posts, I will lay out how I have begun to slowly restrict this information flow and (hopefully) reclaim my privacy.
Still not convinced? Here are a few resources to look at:
Our first order of business was to run through all the tutorials in TensorFlow, starting with the beginner’s MNIST tutorial. We realized that the TensorFlow documentation is slightly obscure, hence this blog (/Jupyter notebook). The beginner’s MNIST model is essentially a linear model and therefore implements a simple perceptron. Surprisingly, this simple perceptron gives a nice classification accuracy of 92%.
import tensorflow as tf
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import seaborn as sns
sns.set(color_codes=True)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
The first few lines import TensorFlow and other necessary libraries for reshaping and plotting images. The last line downloads and loads the mnist dataset (read_data_sets does this automatically). mnist is now an object with training, test and validation data nicely sorted. mnist.train.images, for example, contains all the training images. Each image is 28x28 and linearized into a vector of size 784 (28*28). There are 55000 such training images, making the size of the training set (55000, 784). To look at the images, one has to extract and reshape them, as shown below.
np.shape(mnist.train.images)
(55000, 784)
plt.imshow(np.reshape(mnist.train.images[0,:],[28,28]), cmap='Greys')
The labels for each training image are stored as ‘one-hot vectors’. This essentially means there are 10 columns of output for each image (each row in mnist.train.images), with a 1 in the column corresponding to the image’s digit and 0s elsewhere:
mnist.train.labels[0,:]
array([ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.])
Let’s now train a perceptron on the mnist classification task. We can easily do this by writing the perceptron as a simple linear classifier. The input x, which is our image, has a dimension of [image_number, 784]. This is a 2D matrix with each image as a row of 784 columns (28x28). Because the number of images is variable and depends on the training batch size, we use a placeholder to create it. Inputs are mostly created using placeholders, as one of the dimensions (the number of images trained) is generally variable. The weight is essentially a [784, 10] matrix which transforms each image into a one-hot vector. The bias is an intercept for each output and is therefore a vector of size 10 (the bias sets the classification threshold of each output). The classification output therefore has a dimension of [image_number, 10]. Essentially, this set of equations can be imagined as a perceptron, as shown below.
The next step is to convert the output into probabilities (very useful). One simple way of doing this is to softmax the output. The softmax function is a multinomial generalization of logistic regression (generally used for categorical distributions). Simple logistic regression converts an independent variable into the probability of obtaining a binary dependent variable, which can take only two values - “0” or “1”. Softmax (a.k.a. multinomial logistic regression) takes in multiple independent variables and converts them into the probabilities of a categorical distribution (i.e. it gives a probability for each of the n categories). This is convenient, as it ensures that the outputs always sum to one (and are thereby valid probabilities).
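To make this concrete, here is a minimal NumPy sketch of softmax (my own illustration, not part of the tutorial code):

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability
    # (this shift does not change the result)
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)       # larger logits get larger probabilities
print(probs.sum()) # sums to 1 (up to floating point): valid probabilities
```

Note that softmax preserves the ordering of the logits - the highest output of the linear model stays the most probable class.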
Once the output is classified, we have to compare the output classification with the ground truth and change the weights accordingly. There are a couple of ways of doing this. One simple way is the mean squared distance (or the L2 distance). Another (more complicated but better) loss function is cross-entropy. There are several advantages to using cross-entropy over the mean squared distance (nicely demonstrated in this blog post). Minimizing cross-entropy is the same as minimizing the Kullback-Leibler divergence, which is essentially the distance (information gain) between the obtained probability distribution and the true probability distribution. If both are the same, which is true for perfect classification, the KL divergence goes to zero. Minimizing the KL divergence or cross-entropy by backpropagating the error is therefore one way of training the perceptron.
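A small NumPy sketch of this relationship (my own illustration; it uses the natural log, i.e. nats - use log2 for bits). For a one-hot ground truth, the entropy of the true distribution is zero, so cross-entropy equals the KL divergence exactly:

```python
import numpy as np

def cross_entropy(p, q):
    # Only sum over entries where p > 0 (0 * log(q) contributes nothing)
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

def kl_divergence(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0., 0., 1.])     # one-hot ground truth
q = np.array([0.1, 0.2, 0.7])  # predicted distribution

# For a one-hot p, cross-entropy == KL divergence
print(cross_entropy(p, q))   # ~0.357
print(kl_divergence(p, q))   # same value

# Perfect classification drives the KL divergence to zero
print(kl_divergence(p, np.array([0., 0., 1.])))  # 0.0
```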
# Input and weights
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
# Output (apply softmax)
y_o = tf.nn.softmax(tf.matmul(x, W) + b)
# Cross entropy (loss function)
y_ = tf.placeholder(tf.float32, [None, 10]) # The ground truth (one-hot vectors)
y = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
# Add train step
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
In the code above, instead of applying the softmax function to the output and then computing the cross-entropy, TensorFlow recommends using softmax_cross_entropy_with_logits. This ensures that multinomial logistic regression is applied properly to the output (carefully handling numerical instabilities) before computing the cross-entropy (read more on this here). The final line puts everything together by defining the training step with a learning rate and the loss function.
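The instability the combined op guards against is easy to reproduce: exponentiating large logits overflows, while shifting by the max first (a sketch of the kind of trick such fused implementations typically use internally, not TensorFlow’s actual source) stays finite:

```python
import numpy as np

logits = np.array([1000.0, 999.0, 998.0])

# Naive softmax: exp(1000) overflows to inf, giving nan probabilities
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(logits) / np.exp(logits).sum()
print(naive)  # [nan nan nan]

# Stable version: shift by the max before exponentiating
shifted = logits - logits.max()
stable = np.exp(shifted) / np.exp(shifted).sum()
print(stable.sum())  # 1.0
```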
The code below trains and tests the perceptron, giving a test accuracy of approximately 92%.
# Create a session and train
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# Train
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels}))
0.9195
It is more fun to visualize the trained weights, which provide intuition about how the perceptron classifies the mnist data. The weights are shown as images below.
fig, ax = plt.subplots(nrows=2, ncols=5)
fig.set_size_inches(18.5, 10.5)
for i in range(10):
    ax[i // 5][i % 5].imshow(np.reshape(W[:, i].eval(), [28, 28]))
The weights (shown above) seem to encapsulate each number more or less accurately. Numbers 0, 1, 2 and 3 are more or less apparent (red indicates positive weights and blue negative). The other numbers are a bit harder to make out from the weights. Numbers 4, 5, 8 and 9 are the least apparent (at least to me). Does how apparent the weights look in the images above somehow predict the accuracy of the classifications? That is, does the perceptron perform badly on the numbers 4, 5, 8 and 9?
To answer this, I simply collected the classification errors for each digit and plotted the histogram below. The digits with the least apparent weights do in fact have many more classification errors (except for 4). Maybe the errors are due to incomplete learning of the weights. Or maybe the errors arise because the error-prone digits have multiple written forms, causing a blurred learning of each. Either way, a deep network should do even better on this dataset, thereby giving better classification. I will try out the deep mnist tutorial next and blog about it soon!
prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
classifications = sess.run(prediction, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
_, labels = np.nonzero(mnist.test.labels)
incorrect_classified = [labels[i] for i, p in enumerate(classifications) if not p]
np.shape(incorrect_classified)
(816,)
import seaborn as sns
sns.distplot(incorrect_classified, bins=20, kde=False, rug=True)
Last summer, I attended the OIST Computational Neuroscience Course (OCNC) in Okinawa, where we were taught the basics of computational neuroscience with a very hands-on approach. It was both rigorous and fun at the same time. The location of the summer school was spectacular - we were located at the sea-side house of OIST, with a beautiful view of the ocean (header image). The beach was a two minute walk away and visiting it everyday made the summer school even more special.
One of the things I learned about was place cells. Place cells are typically pyramidal neurons in the hippocampus that fire when the mouse/rat is in a particular ‘place’ in the environment. Collectively, place cells are thought to represent spatial locations, thereby encoding a cognitive map of the world. For example, figure 1 shows different place fields at different locations of the track, together forming a cognitive map of the track.
While discussing place cells, we wondered if we could figure out the location the rat was in using only the activity of the place cells. This is probably what the rat does, and therefore we should also be able to do the same. Place cell firing rate generally increases as the rat comes close to the cell’s preferred location, and decreases as the rat moves away from it. The actual spikes, however, are not periodic and are generally noisy between trials. The firing of the neuron, and therefore the noisy spikes, could be due to an underlying Poisson process. Is it possible for us to determine the location of the rat in the maze if place cells have Poisson firing? To do this, let’s first generate data for a (place cell) neuron with Poisson firing. This is the subject of this blog post. The inference of location based on place cell firing will be the subject of a subsequent post.
Let’s try to simulate data for the place cells. We start with three place cells which fire at different locations on a 1-D line. Their location preferences are normally distributed (Gaussian distribution). There is quite a bit of overlap of place cell preferences, i.e. two or more cells will fire at a particular location, albeit at different firing rates. A normally distributed location preference means that the cell will have a higher firing rate at the center of the distribution than at its edges. However, things become a lot more complicated as the actual spikes are due to a Poisson process. This might not quite be true, but for now, let us just assume that place cell firing is indeed Poissonian, as it makes the whole problem a lot more interesting.
Let’s begin by writing a function to generate a Gaussian location preference. Instead of just writing a function for one Gaussian, let’s write one for a sum of Gaussians so that we can create bimodal distributions if necessary.
%pylab inline --no-import-all
import numpy as np
# Gaussian function
def sum_of_gaussians(a, positions, c):
f = lambda x: sum(a * np.exp( -(x-b)**2 / (2 * c**2)) for b in positions)
return f
Our next step is to simulate the Poisson firing of a neuron. This is the tricky part. We assume that a neuron fires at most once per millisecond (a very reasonable assumption, given the refractory period of a neuron). We then want the probability of the neuron ‘not firing’ in a one-millisecond bin, given a particular firing rate. This turns out to be simply the negative exponential of the expected number of spikes in that bin (substitute k=0 in the probability mass function of the Poisson distribution). To mimic firing, we draw a random value between 0 and 1 from a uniform distribution, and if it exceeds the no-spike probability, we say our neuron has fired a spike (again, the key assumption is that there cannot be more than one spike in a millisecond).
The above method is a nice way to obtain Poissonian firing (I think. Are there better ways? Do let me know). To make the neuron fire variably depending on location, we simply pass a rate function that changes with time (indicating a change in activity due to a change in location). The function below does all of the above.
def poisson_firing(ratefn, time):
    """
    ratefn describes the prescribed rate for each point in time t.
    (Try to use a lambda function for the ratefn input.)
    Time can be in seconds or milliseconds.
    Assumption:
        Only one spike can happen per millisecond (reasonable assumption),
        i.e. the effective refractory period for continuous firing is 1 ms
        (max firing rate = 1000 Hz). Good enough as long as a rate function
        with a sane firing rate is used.
    """
    dt = time[1] - time[0]
    spiketimes = []
    for t in time:
        average_events = dt * ratefn(t)
        P = np.exp(-average_events)  # Probability of no spikes in this bin
        rand = np.random.random()
        if rand > P:  # Spike if the random draw exceeds the no-spike probability
            spiketimes.append(t)
    return spiketimes
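As a quick sanity check on this scheme (a self-contained sketch, separate from the trials below), a constant 50 Hz rate with 1 ms bins should give an empirical rate just under 50 Hz, since collapsing any multi-spike bin into a single spike slightly undercounts:

```python
import numpy as np

np.random.seed(0)
rate = 50.0    # Hz, constant prescribed rate
dt = 0.001     # 1 ms bins
T = 100.0      # simulate 100 seconds
n_bins = int(T / dt)

P_no_spike = np.exp(-rate * dt)                 # exp(-0.05), ~0.951 per bin
spikes = np.random.random(n_bins) > P_no_spike  # one Bernoulli draw per bin
empirical_rate = spikes.sum() / T
print(empirical_rate)  # ~48-49 Hz: slightly below 50, as multi-spike bins count once
```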
Let’s first create our three place neurons with broad and overlapping location preference.
# 1-D line
vel = 1 # 1 unit per sec
t = np.arange(0, 30, 0.001)
x = lambda t: vel * t
# Preference distribution of neurons (tuning curve) - max firing - 50 Hz
neuron1_preference = sum_of_gaussians(50, [10], 5)
neuron2_preference = sum_of_gaussians(50, [15], 5)
neuron3_preference = sum_of_gaussians(50, [20], 5)
# Plot preference
pylab.plot(x(t), neuron1_preference(x(t)))
pylab.plot(x(t), neuron2_preference(x(t)))
pylab.plot(x(t), neuron3_preference(x(t)))
Now, let’s simulate their firing if the animal is moving at a fixed velocity.
# Obtain poisson based firing rates of each (place cell) neuron
trials = 10
neuron1_firing = []
neuron2_firing = []
neuron3_firing = []
for _ in range(trials):
    neuron1_firing.append(poisson_firing(lambda t: neuron1_preference(vel * t), t))
    neuron2_firing.append(poisson_firing(lambda t: neuron2_preference(vel * t), t))
    neuron3_firing.append(poisson_firing(lambda t: neuron3_preference(vel * t), t))
# Raster plot each of the three neurons
for i in range(trials):
    pylab.plot(neuron1_firing[i], (5 * i + 1) * np.ones(len(neuron1_firing[i])), '|b')
    pylab.plot(neuron2_firing[i], (5 * i + 2) * np.ones(len(neuron2_firing[i])), '|g')
    pylab.plot(neuron3_firing[i], (5 * i + 3) * np.ones(len(neuron3_firing[i])), '|r')
# beautify
pylab.axes().set_xticks(np.arange(0,30+1,5))
pylab.axes().set_yticks(np.arange(0,trials*2,3))
We can see that the neurons’ firing differs between trials, but they still respond to location with a normally distributed increase in firing rate. We have successfully simulated place cells with Gaussian location preference and Poissonian firing (a post on decoding this will follow soon).
You can download the Jupyter notebook for the above code directly from here.
[P.S: The idea was stimulated by discussions with Indrajith Nair, who attended OCNC along with me]
[Updated on 2017-05-01: Added comments to the poisson_firing function and added some text to explain it better]