Introduction to what I am doing now

One of the two projects I am working on is calculating the achievable information bound for visual target motion using the information bottleneck method (see the Wikipedia article on the information bottleneck method). The target parameter I am interested in is position; let’s say the target is a bar. If the bar sits exactly in the middle of the monitor, its position at the current time (t = 0) is zero. If the bar moves one step to the right at the next time step (t = 1), its position is one. It could also jump to position two or three (or even -1!). You get the idea.

The question is, how well can you predict the next position of the bar? How much of the bar’s past dynamics do you need to observe to make an accurate prediction of its position one or two steps into the future? You might think it depends entirely on how fast the bar is moving or how often it changes direction. You are right! A recent study from our group found that how well neurons in salamander and mouse retina predict depends on the overall dynamical rules that govern the stimulus position or velocity [1].
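To make the dependence on dynamics concrete, here is a minimal sketch in Python. It assumes (my assumption for illustration, not the stimulus used in [1]) that the bar position follows a stationary Gaussian AR(1) process x[t+1] = a * x[t] + noise. For such a process the correlation between positions tau steps apart is a^tau, and two jointly Gaussian variables with correlation rho share -0.5 * log2(1 - rho^2) bits. The function name and the example values of a are hypothetical.

```python
import math

def ar1_predictive_info_bits(a, tau):
    """Mutual information (bits) between x[t] and x[t + tau] for a
    stationary Gaussian AR(1) process x[t+1] = a * x[t] + noise, |a| < 1."""
    rho = a ** tau                      # correlation at lag tau
    return -0.5 * math.log2(1.0 - rho ** 2)

# Hypothetical example values: a slowly changing bar (a = 0.95)
# versus a jittery one (a = 0.5), looking 1 to 5 steps ahead.
for a in (0.95, 0.5):
    infos = [ar1_predictive_info_bits(a, tau) for tau in range(1, 6)]
    print(a, [round(i, 3) for i in infos])
```

The slowly changing bar keeps more information about its own future, and for both bars the information drops as you look further ahead.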

One more question (not so fast...). Do you really need to see all of the past target positions or dynamics to predict the future position of the bar? If all we observed was a bar slowly moving to the right, we could say, “Hey! It’s at position A, and judging from just the one previous time frame, it will be at A+1 in the next frame (unless I am extremely unlucky and something happens that embarrasses me)!” So, yeah, you don’t need to see it all. Not all of the past target dynamics are informative about the future target position. The key question here is: given the past target dynamics, what is the most efficient (meaning smallest) representation of them that carries the most information about the future state of the target?

The information bottleneck method provides a nice framework for this. There is an analytical solution to the problem if the stimulus statistics are Gaussian [2], and Tishby’s original work [3] provides a general iterative solution. By “solution”, I mean that you can find a compressed version of what you have seen so far, and that this compressed representation of the target dynamics is adequate to carry the information about the future target dynamics. The details of the procedure can be found in references [2] (analytical solution for Gaussian variables) and [3] (general solution).
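For readers who want to see what the general iterative solution looks like, below is a minimal Python sketch of the self-consistent updates from [3] for discrete variables: p(t|x) is proportional to p(t) exp(-beta * KL[p(y|x) || p(y|t)]), with p(t) and p(y|t) recomputed from the current encoder at every pass. Here x stands for the past, y for the future, and t for the compressed representation. The toy joint distribution, the function names, and the choice of beta are all made up for illustration; this is not my research code, and a real analysis would also need multiple restarts and a sweep over beta.

```python
import math
import random

def kl_nats(p, q):
    """KL divergence D(p || q) in nats; terms with p[i] == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_info_bits(joint):
    """Mutual information (bits) of a joint distribution given as nested lists."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(v * math.log2(v / (pa[i] * pb[j]))
               for i, row in enumerate(joint)
               for j, v in enumerate(row) if v > 0)

def iterative_ib(p_xy, n_t, beta, n_iter=200, seed=0):
    """Run the self-consistent IB updates; returns the encoder p(t|x), one row per x."""
    rng = random.Random(seed)
    n_x, n_y = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]
    p_y_x = [[v / p_x[i] for v in p_xy[i]] for i in range(n_x)]
    # Start from a random soft assignment of each x to the n_t clusters.
    enc = []
    for _ in range(n_x):
        w = [rng.random() + 0.1 for _ in range(n_t)]
        enc.append([v / sum(w) for v in w])
    for _ in range(n_iter):
        # p(t) = sum_x p(x) p(t|x)
        p_t = [sum(p_x[i] * enc[i][t] for i in range(n_x)) for t in range(n_t)]
        # p(y|t) = sum_x p(y|x) p(t|x) p(x) / p(t)
        p_y_t = [[sum(p_y_x[i][j] * enc[i][t] * p_x[i] for i in range(n_x)) / p_t[t]
                  for j in range(n_y)] for t in range(n_t)]
        # p(t|x) proportional to p(t) * exp(-beta * KL[p(y|x) || p(y|t)])
        for i in range(n_x):
            w = [p_t[t] * math.exp(-beta * kl_nats(p_y_x[i], p_y_t[t]))
                 for t in range(n_t)]
            z = sum(w)
            enc[i] = [v / z for v in w]
    return enc

# Toy joint distribution p(x, y): 4 "past" states, 2 "future" states (made up).
p_xy = [[0.225, 0.025], [0.200, 0.050], [0.050, 0.200], [0.025, 0.225]]
enc = iterative_ib(p_xy, n_t=2, beta=5.0)
p_x = [sum(row) for row in p_xy]
joint_tx = [[enc[i][t] * p_x[i] for i in range(4)] for t in range(2)]
joint_ty = [[sum(enc[i][t] * p_xy[i][j] for i in range(4)) for j in range(2)]
            for t in range(2)]
print("I(T;X) =", mutual_info_bits(joint_tx))
print("I(T;Y) =", mutual_info_bits(joint_ty), "bound I(X;Y) =", mutual_info_bits(p_xy))
```

Sweeping beta and plotting I(T;Y) against I(T;X) traces out the information curve; whatever the encoder, I(T;Y) can never exceed I(X;Y).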

OK. It’s all here. All I need to do is implement this algorithm in whatever programming language I am capable of using. The stimulus or target dynamics that I use for my research project are not necessarily Gaussian, which means I will have to write the iterative version... and that is hard. Why is it hard? I will pick up from here in the next post.


[1] Salisbury, Jared, and Stephanie E. Palmer. “Optimal prediction and natural scene statistics in the retina.” arXiv preprint arXiv:1507.00125 (2015).

[2] Chechik, Gal, et al. “Information bottleneck for Gaussian variables.” Journal of Machine Learning Research. 2005.

[3] Tishby, Naftali, Fernando C. Pereira, and William Bialek. “The information bottleneck method.” arXiv preprint physics/0004057 (2000).


Writing vs. Thought


(Image from the poster of the Korean movie “생각보다 맑은”)

My favorite time of the day is when I take a shower in the morning. It is the only time of the day when my mind takes a break, like a peaceful pond in the forest. I organize my thoughts and plan what to do and what should be done today. No verbal interaction is needed, and everything stays inside my mind. It gives me real satisfaction to assume that things will go exactly as I plan.

At this moment, a bright idea often comes to mind that resolves my problems. It is a great pleasure to encounter THAT moment because I am always looking for a different angle from which to attack my problems. Then I think, “I probably should write that down,” but I never do. It’s not because there is no pen and paper in the shower booth; it’s just that the idea keeps moving. It branches like the huge tree in front of my apartment building and bears another thought, and another. It keeps going and going until I finally realize that I am almost late for work.

Thankfully, I don’t usually forget it; I hold onto it until I arrive at my office and can apply those thoughts. Whether or not it turns out to be a solution to what I have been struggling with, I adore that moment of looking at a problem from a different direction. For this reason, I often feel that the time I spend materializing what I think is a waste of time, as the thoughts keep evolving.

A problem arises when I have to present what I have been doing to an audience. A great speaker or writer guides the audience to a moment of sympathy (not in a pitiful way, but in the sense of a shared feeling) so that they support the speaker and share the feeling. The magic comes from the right words and the right way of describing how the thoughts developed. But my thoughts are tangled in such a complex way that I spend a significant amount of time finding the loose end of the “thought-yarn” when I try to present them.

Yesterday, I read that one of the most important things to do in academia is writing. No one survives in academia without writing, which provides a chronological record of one’s ability to develop thoughts and apply them to improve science. It urges me to write constantly, piece by piece (memos), and to acquire the skill of describing abstractions. To me, it is a race between writing and thinking, and the resource they compete for is the limited time in a day. I definitely prefer thinking with a cup of coffee, keeping everything inside my head. But I do know that I have to balance the two, as I always see that successful scientists excel at both.


Cosyne 2016 poster session


SangWook Alex Lee, Stephanie E. Palmer, Leslie C. Osborne. “Optimal prediction in natural gaze behavior” Poster presentation, Cosyne 2016

Delays in sensory processing give rise to a lag between a stimulus and the organism’s reaction. This presents a particular challenge to tracking behaviors like smooth pursuit where a difference in eye and target motion creates image motion blur on the retina. One strategy that might compensate for processing delays is to extrapolate the future target position and make anticipatory eye movements. We recorded eye position in humans and monkeys engaged in tracking tasks to test for predictive information in eye movements. For humans, we created a video game task based on Atari Pong in which the subject moves a paddle to keep a ball bouncing within an arena. We controlled the level of predictability, keeping collisions elastic or adding stochasticity. Subjects also watched “movies” of the Pong game being played, or of a ball bouncing within an enclosed arena (no paddle). We employed a third task for monkeys based on the classic step-ramp pursuit paradigm in which the target jumped and translated in a pseudorandom fashion. On this double step-ramp task, target motion was predictable over a much shorter time scale than in the Pong task (~500 ms vs >2 s). We computed the mutual information of eye and target position as a function of time shift. We find substantial predictive information in all tasks in both humans and monkeys, but there is a striking difference between active game play and the other tracking-based tasks. In active Pong, the mutual information in the eye position is very near the bound determined by the predictive information in the target itself, I(T(t), T(t + δt)); thus all that is predictable about the target is incorporated into behavior. Eye-target information peaks at zero delay, such that prediction compensates for processing delays. In contrast, watching a movie of the Pong game or tracking the double step-ramp target produces gaze that is more reactive than predictive.
We propose that the brain predicts the velocity for the eye’s path to cross the target’s path and uses “eye crossing velocity” to plan future saccades and pursuit eye movements.
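The “mutual information of eye and target position as a function of time shift” can be illustrated with a small Python sketch. This is not the analysis code behind the poster: it uses a simple plug-in (histogram) estimator, equal-width bins, and synthetic data in which a hypothetical “eye” trace copies the target with a fixed delay plus noise. All names and parameter values are assumptions for illustration.

```python
import math
import random
from collections import Counter

def binned(values, n_bins):
    """Map each value to an integer bin index on an equal-width grid."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_info_bits(a, b):
    """Plug-in mutual information (bits) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())

def lagged_info(eye, target, shift, n_bins=8):
    """I(eye(t); target(t + shift)) in bits; shift may be negative."""
    if shift >= 0:
        e, t = eye[:len(eye) - shift], target[shift:]
    else:
        e, t = eye[-shift:], target[:len(target) + shift]
    return mutual_info_bits(binned(e, n_bins), binned(t, n_bins))

# Synthetic data: the target is a smooth random process, and the "eye"
# is a purely reactive tracker that lags the target by `delay` samples.
rng = random.Random(0)
target = [0.0]
for _ in range(4999):
    target.append(0.9 * target[-1] + rng.gauss(0, 1))
delay = 5
eye = [target[max(i - delay, 0)] + rng.gauss(0, 0.5) for i in range(len(target))]

curve = {s: lagged_info(eye, target, s) for s in range(-15, 16)}
best_shift = max(curve, key=curve.get)
print("information peaks at shift", best_shift)
```

For a purely reactive tracker like this one, the information curve peaks at a negative shift (the eye tells you about where the target was). A peak at zero shift, as reported above for active Pong, is what you would expect if prediction compensates for the delay.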


Maximally informative dimensions – where are you?

There was a time when I searched all over the place for so-called “maximally informative dimensions (MID)” MATLAB code (of course, this means I was googling things like “maximally informative dimensions matlab code”). You are probably also one of the people who googled it, as I did a few years ago, and ended up reading this jibber-jabber. I know there is a super nice and neat paper about MID, which we all know and from which we all started our long search. If you have spent more time searching for the code, you may have realized that the first author of this paper (Dr. Tatyana Sharpee) has publicly (and kindly) made the code available on GitHub. And of course, we all got disappointed again, because the code is written in C and needs a special kind of data to run (not to mention the “curse of library”). Personally, I tried to read this code. All I had was time, patience, and a desire to analyze my temporally correlated neural data. But the complexity and unfamiliarity of the C code bit hard into that desire, despite the beautiful derivation of the Shannon information equation in the paper.

Let me give you the conclusion first. The equation is unbelievably easy, and the reason you cannot find a clean script is the optimization procedure described in the paper. I can (almost) confidently say that finding MID MATLAB code is as hard as finding simulated annealing MATLAB code that can be applied directly to your data. This is the direct reason; defining the correct time bin size for your data (it has to be predefined because of the entropy issue, read this paper) is a secondary problem. And it is not even the worst problem. The original method for finding the maximally informative dimensions also incorporates a line search along the gradient (gradient ascent) to speed up the process. Even if you are lucky enough to get hold of a nice stochastic optimization code that you can apply to your data, implementing the line search on top of it is not an easy task. Of course, you can skip this step and hope that your *super* computer finds a (global) solution. But I do know that the solution you find after a long, long, long, desperate time can still be improved by changing the cooling rate and the terminal temperature, which costs you even more desperate time. For this reason, a number of people (I know at least five or six) stopped searching and ended up implementing an equivalent algorithm.

I am not writing this to discourage you from using MID. I am writing this because I feel your frustration. It is a hauntingly beautiful and simple algorithm that can (possibly) be better than any other algorithm that exists. I just wanted to show you a shortcut in the search. Instead of roaming around the dark side of the internet (yeah, the deep, deep net), you probably want to look up how to implement simulated annealing first. And you are not the only one. Don’t get depressed.
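To show how little code the information part needs, and where the effort actually goes, here is a minimal Python sketch. The objective is the information per spike along a direction v, the sum over x of P_v(x|spike) * log2(P_v(x|spike) / P_v(x)), estimated with a histogram. The optimizer is a bare-bones simulated annealing loop with no gradient or line-search step, so it is not the procedure from the MID paper or the released C code. The synthetic data, the filter `true_k`, the bin count, and the cooling schedule are all hypothetical choices of mine.

```python
import math
import random

def info_per_spike(v, stimuli, spikes, n_bins=10):
    """Information (bits/spike) carried by the projection of stimuli onto v."""
    proj = [sum(vi * si for vi, si in zip(v, s)) for s in stimuli]
    lo, hi = min(proj), max(proj)
    width = (hi - lo) / n_bins or 1.0
    count_all = [0.0] * n_bins     # histogram for P_v(x): all stimuli
    count_spk = [0.0] * n_bins     # histogram for P_v(x | spike): spike-weighted
    for x, n in zip(proj, spikes):
        b = min(int((x - lo) / width), n_bins - 1)
        count_all[b] += 1
        count_spk[b] += n
    n_stim, n_spk = len(proj), sum(spikes)
    return sum((q / n_spk) * math.log2((q / n_spk) / (p / n_stim))
               for p, q in zip(count_all, count_spk) if q > 0)

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def anneal_mid(stimuli, spikes, n_steps=300, t_start=0.1, cooling=0.97, seed=0):
    """Plain simulated annealing over unit vectors; returns (best_v, best_info)."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0, 1) for _ in range(len(stimuli[0]))])
    info = info_per_spike(v, stimuli, spikes)
    best_v, best_info = v, info
    temp = t_start
    for _ in range(n_steps):
        cand = normalize([x + rng.gauss(0, 0.2) for x in v])
        cand_info = info_per_spike(cand, stimuli, spikes)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_info > info or rng.random() < math.exp((cand_info - info) / temp):
            v, info = cand, cand_info
            if info > best_info:
                best_v, best_info = v, info
        temp *= cooling            # cooling rate: one of the knobs you end up tuning
    return best_v, best_info

# Synthetic example: white-noise stimuli and a thresholded linear neuron.
rng = random.Random(1)
true_k = normalize([1.0, 2.0, 0.0, -1.0, 0.5])
stimuli = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(1000)]
spikes = [1 if sum(k * s for k, s in zip(true_k, stim)) + rng.gauss(0, 0.2) > 1.0 else 0
          for stim in stimuli]
best_v, best_info = anneal_mid(stimuli, spikes)
print("info along true filter:", info_per_spike(true_k, stimuli, spikes))
print("info along annealed direction:", best_info)
```

Note that the sign of the recovered direction is arbitrary, because mirroring the projection leaves the information essentially unchanged.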

Note 1: For those who want to apply the equivalent algorithm that I mentioned above, here is a paper that contains the proof of the equivalence.

Note 2: Recently, Dr. Sharpee presented a simpler version of MID, called the minimal model.

Finally done final is.

Theoretical Neuroscience 


I had a great quarter in Dr. Stephanie Palmer’s class “Theoretical Neuroscience: Network Dynamics and Computation”. It covered basically the same material that I learned from Dr. Adrienne Fairhall when I was in Seattle, but I really enjoyed the new data sets from various experimental labs. The textbook for the class was Dayan and Abbott’s famous book “Theoretical Neuroscience”, and we covered chapters 1 to 4. V1 and retina data were used for the neural encoding chapters, and M1 data (from Dr. Hatsopoulos) were provided for the decoding chapter and Fisher information. The rest of the material in the book, such as mechanistic models, was covered in the previous class of this sequence by Dr. Nicolas Brunel. It seemed that everyone in the class was already very familiar with the material, and the professor was delightfully surprised. Maybe that is because they had all already used the elementary techniques introduced in the book in their own research. It was certainly nice to learn that many labs apply theoretical analysis to their data nowadays. The blossoming of theoretical/computational neuroscience around the 1980s and 90s, centered on the NY area, is greatly influencing the next generation of the field. I strongly hope this wave of progress keeps being spread by great minds.

This book introduces theoretical tools from applied mathematics and statistics to explore how the nervous system encodes and processes information. Several different modeling approaches are used, ranging from the most basic structure of the system (cellular/molecular level) to the most integrated one (behavioral level). As Dmitri Chklovskii is quoted as saying, it really does define the field.

I have always wanted to share what I have learned with the public through a space of my own. This may be a suitable place to introduce the field of theoretical/computational neuroscience, with some visualizations in MATLAB. I may also use example problems from other textbooks to explore each topic in more detail. I hope more people can get easy access to the material and feel encouraged to tackle their research using the fundamental approach of science.

Quarterly update – Fall 2014

The first quarter of the PhD program in Neurobiology at the University of Chicago is finished. It was an intense and exciting first step. My thinking about different aspects of neuroscience has changed in major ways. Since I am between quarters, I am here to report what I have done and learned over the past 2.5 months.

  1. Cellular/molecular mechanisms of neuroscience are the starting point and should always be at the center of theory.

As my background suggests, I did not take any biology classes during my undergraduate education. When I was first introduced to computational neuroscience as a subject of applied mathematics, I was focused on the function of the individual unit. Because of Hodgkin and Huxley’s fascinating work on the squid giant axon, I thought of (perhaps too much) every compartment of a neural system as a piece of micro-machinery that functions as evolution designed it. As I read the literature on many different kinds of neurons from different parts of the brain, I started to realize that a deterministic description of the system is not sufficient to stand in for the real unit. That was the point in my studies at which I lost interest in the classical dynamical systems view of neuroscience. When we write down a mathematical description to model the system or a single unit, we lock the system and do not allow for its variability. However, unlike a computer or any modern machine that we have created, even a single neuron displays great variability in its response. Even if we allow for noise, or use a stochastic description of the system, it is the system itself that changes its response depending on the context, not only on the input signal. This has been an issue for many theoretical neuroscientists, and it has been described as the reliability of the system.

How can we get a handle on the reliability of neurons so that we can describe the true response and the plasticity of a neuron? Since a single neuron is a physical entity and its response is a combination of cellular and molecular reactions of even smaller units, there must be some limit that bounds the variability. The difficulty arises from the fact that so many different kinds of molecules and mechanisms are involved even in the generation of a single action potential. It is almost impossible for us to account for all the reactions of the different ions, receptors, channels, and protein kinases (or so I think for now). The plasticity of neurons happens at the molecular level; for example, the intracellular or extracellular concentration of Ca2+ ions affects the generation of LTD or LTP. This plasticity in turn affects the population of neurons and, in the end, the function of the whole brain. Thus, we should not lock the system up by writing down a set of equations that ultimately leads us to a false description of brain function.

From this point of view, a statistical description of the microsystem in terms of some distribution is going to play a huge role. And we need to impose stochasticity not only on the input but also on the system. In the previous study, ‘context dependent coding in single neurons’, we used a generalized linear model (GLM) to describe the coding of a motoneuron that displays a medium-duration after-hyperpolarization. This internal feedback depends on the mean level of the input stimulus, which evokes the calcium ion dynamics. As a class of simplified models, the GLM changes the firing rate (or the probability of firing in a specific time window) depending on the projection of the input signal onto the stimulus filter plus the projection of the recent history of neural activity onto the spike history filter. This description was quite successful at predicting the exact timing of spikes for a given input stimulus, with some restrictions on the input statistics.
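The structure of such a GLM can be sketched in a few lines of Python. This is a generic illustration, not the model from the study: I assume an exponential nonlinearity and Bernoulli spiking in small time bins, and the filters `k` and `h`, the bias, and the bin width are made-up values. The negative entries in `h` stand in for the suppressive effect of an after-hyperpolarization.

```python
import math
import random

def glm_spike_prob(stim_window, spike_history, k, h, bias, dt):
    """Spike probability in one bin of width dt for an exponential-link GLM."""
    drive = bias
    drive += sum(ki * si for ki, si in zip(k, stim_window))      # stimulus filter
    drive += sum(hi * yi for hi, yi in zip(h, spike_history))    # spike history filter
    rate = math.exp(drive)                  # conditional intensity (spikes/s)
    return 1.0 - math.exp(-rate * dt)       # P(at least one spike in the bin)

def simulate_glm(stimulus, k, h, bias, dt=0.001, seed=0):
    """Simulate a binary spike train, one bin per stimulus sample."""
    rng = random.Random(seed)
    spikes = []
    for t in range(len(stimulus)):
        # Most recent sample first; zero-padded before the start of the recording.
        stim_window = [stimulus[t - i] if t - i >= 0 else 0.0 for i in range(len(k))]
        history = [spikes[t - 1 - j] if t - 1 - j >= 0 else 0 for j in range(len(h))]
        p = glm_spike_prob(stim_window, history, k, h, bias, dt)
        spikes.append(1 if rng.random() < p else 0)
    return spikes

# Hypothetical parameters: 20 spikes/s baseline, short excitatory stimulus filter,
# suppressive spike history filter.
k = [0.5, 0.3, 0.1]
h = [-5.0, -3.0, -1.0]
bias = math.log(20.0)
rng = random.Random(2)
stimulus = [rng.gauss(0, 1) for _ in range(5000)]
spikes = simulate_glm(stimulus, k, h, bias)
print("number of spikes:", sum(spikes))
```

In this form the filters are fixed; the spike history only shifts the drive up or down, which is exactly the limitation discussed next.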

However, as discussed above, the system itself should alter its state depending on its own recent activity, rather than merely changing the probability of the response. In this sense, altering the parameter values of the linear system is preferable to a static linear filter that depends only on the temporal pattern of the input; that is, the temporal patterns of both the input and the response should determine the state of the system and thereby the probability of firing.

These thoughts introduced me to the notion of a higher-dimensional linear filter that takes the temporal patterns of the input and the response as its domain and returns the probability of a spike as its output, i.e., the functional k : R^n × R^m → R.

  2. Target motion predictability determines the predictability of gaze decisions from retinal inputs.

In order to stabilize a moving target’s retinal image, the brain must make continuous visual estimates of target motion and evaluate the trade-off between smoothly modulating eye movement and issuing a saccade. Pursuit offers the advantage of uninterrupted visual information but is not able to compensate for large retinal errors; saccades, on the other hand, are able to reduce large errors quickly, but they are likely to degrade visual information during the process. If target motion is unpredictable, gaze behavior must be driven by delayed visual estimates. But if the target trajectory can be extrapolated into the future because its motion is predictable, then pursuit and saccades may be coordinated to maximize both visual information and tracking performance.

We investigate an existing formulation of the decision rule between pursuit and saccade introduced by Lefevre and colleagues (2002). This quantity, Eye-Crossing Time (TxE), is defined as the time it would take the eye trajectory to cross the target. We tracked eye movements in human subjects with a Dual Purkinje Image eye tracker (Fourward Technologies) and in monkeys with scleral coils. We used three experimental paradigms: a 1D and 2D double step-ramp experiment, and a single-player version of the video game, Pong. In the double step-ramp paradigm, randomized trial presentation and a large parameter space minimized predictability. In Pong, the target dynamics – a small spot target with constant speed and elastic collisions with the arena walls – were predictable.
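As a rough illustration of the one-dimensional quantity, here is a minimal Python sketch. It reads “the time it would take the eye trajectory to cross the target” under the assumption that both eye and target keep their current velocities, which gives TxE = -(position error) / (retinal slip). The function name, argument names, and units are my own choices for illustration, and this is not the two-dimensional extension described below.

```python
def eye_crossing_time(eye_pos, eye_vel, target_pos, target_vel):
    """Time (s) until eye and target positions coincide, assuming both keep
    their current velocities. A negative value means the trajectories are
    diverging (the crossing lies in the past); None means equal velocities,
    so the trajectories never cross."""
    position_error = target_pos - eye_pos      # deg
    retinal_slip = target_vel - eye_vel        # deg/s
    if retinal_slip == 0:
        return None
    return -position_error / retinal_slip

# Eye 1 deg behind the target but moving faster: it catches up in the future.
print(eye_crossing_time(eye_pos=0.0, eye_vel=10.0, target_pos=1.0, target_vel=5.0))
# Eye 1 deg behind and moving slower: the gap grows, so the value is negative.
print(eye_crossing_time(eye_pos=0.0, eye_vel=5.0, target_pos=1.0, target_vel=10.0))
```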

We extend the existing definition of Eye-Crossing Time (TxE) to two dimensions and show that there is a decision rule that captures gaze behavior across both experimental paradigms and both species. When conditioned on a saccade 125 ms in the future, TxE distributes equivalently for both the double step-ramp experiment and Pong, and is consistent between humans and non-human primates. Saccades are most likely when TxE is less than zero; pursuit is most likely when TxE is between 0 and 200 ms. That means that the occurrence of a saccade tells us something about the Eye-Crossing Time in the recent past.

But is the converse true? Can we predict gaze behavior from TxE? We find that the likelihood of observing a saccade given the occurrence of an appropriate TxE is peaked at ~130 ms, as expected. But this only held true in the double step-ramp experiment, not in Pong. We find that a TxE-based decision rule holds when gaze behavior is driven by feed-forward visual estimates. When motion becomes predictable, gaze behavior is no longer captured by the same decision rule. We apply information theoretic analysis to quantify the interaction between target, gaze, and time.