Monday, August 27, 2018

Linear Equations

Introduction

Solving a system of linear equations is non-trivial. The problem can be formulated as:

$A \vec x = \vec b$

where $A$ is a given matrix of coefficients, $\vec x$ is the vector of unknowns to be solved for, and $\vec b$ is the given right-hand side. Rearranging the terms:

$\vec x = A^{-1} \vec b$

The problem now becomes one of finding the inverse of the matrix $A$. For a general dense $N \times N$ matrix, the complexity of this operation can be shown to be $O(N^3)$. If $A$ is sparse, classical iterative methods bring the cost down to roughly $O(N \log N)$.
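For reference, the classical dense solve is a one-liner; this minimal Python sketch (numpy assumed, with a made-up 3x3 system) hides the $O(N^3)$ factorization inside np.linalg.solve:

```python
import numpy as np

# A hypothetical 3x3 system A x = b, purely for illustration.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# np.linalg.solve uses an O(N^3) LU factorization; forming A^{-1}
# explicitly is avoided in practice for cost and numerical stability.
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)
print(x)
```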

Seth Lloyd and his colleagues at MIT presented a quantum algorithm with a complexity proportional to $O(\log N)$. The complete paper can be viewed at https://arxiv.org/pdf/1302.1210.pdf.

My attempt is to unravel the important aspects of the algorithm.

The algorithm

Let

$|\psi_0> = \sqrt{2 \over T} \sum_{\tau = 0}^{T-1} \sin\left( {\pi (\tau + {1 \over 2}) \over T} \right) |\tau>$

where $T$ is a large positive integer.

Apply $\sum_{\tau = 0}^{T-1} |\tau><\tau|^C ⊗ e^{iA\tau t_0 / T}$ to $|\psi_0>^C ⊗ |b>$, where $|b> = \sum_{j=1}^N \beta_j |u_j>$ encodes $\vec b$ in terms of the eigenvectors $|u_j>$ of $A$ (with corresponding eigenvalues $\lambda_j$).

The above operation extracts the eigenvalues and eigenvectors via quantum phase estimation, described at https://en.wikipedia.org/wiki/Quantum_phase_estimation_algorithm.

Fourier transforming the first register (there are 2 registers in the quantum phase estimation article above) gives the state

$\sum_{j=1}^N \sum_{k=0}^{T-1} \alpha_{kj} \beta_j |k>|u_j>$

where |k> are the Fourier basis states.

Define:

$\tilde{\lambda}_k = {2 \pi k \over t_0}$

so that the state can be relabeled as:

$\sum_{j=1}^N \sum_{k=0}^{T-1} \alpha_{kj} \beta_j |\tilde{\lambda}_k>|u_j>$

Adding an ancilla qubit and rotating it conditioned on $|\tilde{\lambda}_k>$ gives:

$\sum_{j=1}^N \sum_{k=0}^{T-1} \alpha_{kj} \beta_j |\tilde{\lambda}_k> |u_j> \left( \sqrt{1 - {C^2 \over \tilde{\lambda}_k^2}} |0> + {C \over \tilde{\lambda}_k} |1> \right)$

If the quantum phase estimation is perfect, then

$\alpha_{kj} = 1$ if $\tilde{\lambda}_k = \lambda_j$ and $\alpha_{kj} = 0$ otherwise,

and the state simplifies to:

$\sum_{j=1}^N \beta_j |u_j> \left( \sqrt{1 - {C^2 \over \lambda_j^2}} |0> + {C \over \lambda_j} |1> \right)$

The $|1>$ branch of the ancilla carries the solution $|x>$: measuring the ancilla and keeping only the runs where it reads $|1>$ (post-selection) leaves the register in the normalized state

$\sqrt{1 \over \sum_{j=1}^N C^2 |\beta_j|^2 / |\lambda_j|^2} \; \sum_{j=1}^N \beta_j {C \over \lambda_j} |u_j>$

Since the $\lambda_j$ are the eigenvalues of $A$, dividing each component $\beta_j$ by $\lambda_j$ in the eigenbasis is exactly the action of $A^{-1}$ on $|b>$, which is the solution $|x>$. This is easiest to see when the matrix is diagonal; sparsity is what keeps the simulation of $e^{iAt}$ efficient.
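To make the eigenbasis arithmetic concrete, here is a classical numpy sketch (not a quantum simulation) showing that dividing the components $\beta_j$ by the eigenvalues $\lambda_j$ reproduces $A^{-1} \vec b$ for a Hermitian matrix; the matrix and vector are made up for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # Hermitian (real symmetric), as HHL requires
b = np.array([1.0, 1.0])

lam, U = np.linalg.eigh(A)   # lam[j] = lambda_j, columns of U are |u_j>
beta = U.T @ b               # beta_j = <u_j|b>
x = U @ (beta / lam)         # sum_j (beta_j / lambda_j) |u_j>

assert np.allclose(x, np.linalg.solve(A, b))
```

This is exactly the (unnormalized) amplitude pattern the post-selected quantum state carries; the quantum algorithm's advantage is that it never writes out all $N$ amplitudes explicitly.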

For a discussion of this work and some circuits see https://quantumcomputing.stackexchange.com/questions/2439/quantum-algorithm-for-linear-systems-of-equations-hhl09-step-2-preparation

Friday, August 24, 2018

Grover's Algorithm

Introduction

Quantum computing has been a fascinating journey for me. Among all the algorithms I have studied, Grover's algorithm (see his paper at https://arxiv.org/pdf/quant-ph/9605043.pdf) is the most fascinating, because it shows how search over an unordered list, such as database entries, can be used to attack many complex computational problems: traveling salesman, decryption of cryptographic text, password guessing, and so on.

At the core of quantum computing there is what is called a qubit. A qubit is a superposition of a fundamental particle's spin states; for example, an electron's spin states are described as either up or down. The classical analogy is the binary digit, 0 or 1. Unlike a classical bit, a qubit can be both 0 and 1 at the same time. This is given by the following representation:

$\psi = \alpha |1> + \beta |0>$

where $\alpha$ and $\beta$ are the amplitudes. The probabilities are given as:

$P_1 = {|\alpha|^2 \over {|\alpha|^2 + |\beta|^2}}$ and $P_0 = {|\beta|^2 \over {|\alpha|^2 + |\beta|^2}}$
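A quick numerical check of these formulas, with arbitrary made-up amplitudes:

```python
import numpy as np

alpha, beta = 1 + 1j, 2.0              # example amplitudes for |1> and |0>
norm = abs(alpha)**2 + abs(beta)**2    # 6.0 here; 1.0 for a normalized qubit

p1 = abs(alpha)**2 / norm              # probability of measuring 1
p0 = abs(beta)**2 / norm               # probability of measuring 0
print(p1, p0, p1 + p0)                 # 0.333..., 0.666..., 1.0
```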

This state persists until one makes a measurement (see below), upon which it collapses to either 0 or 1. The superposition, together with what is known as the phase of the spin, gives several degrees of freedom that enable a register of qubits to represent exponentially many states simultaneously.

At the moment most people simulate quantum computing on classical computers. I recommend reading https://www.codeproject.com/Articles/1131573/Grovers-Search-Algorithm-explained for a glimpse of Java code and a worked-out example. That said, there is evidence that quantum computers of about 50 qubits have been achieved in laboratories across the world.

Condensed from https://people.cs.umass.edu/~strubell/doc/quantum_tutorial.pdf: Along with the superposition of states, Grover’s algorithm, and more generally the family of quantum algorithms that use what is known as amplitude amplification, exploit the qualities of quantum amplitudes that differentiate those amplitudes from simple probabilities. The key to these algorithms is the selective shifting of the phase of one state of a quantum system, one that satisfies some condition, at each iteration. Performing a phase shift of $\pi$ is equivalent to multiplying the amplitude of that state by −1: the amplitude for that state changes, but the probability--which is defined as the norm-square of the amplitude--of being in that state remains the same (since the probability disregards the sign of the amplitude). However, subsequent transformations performed on the system take advantage of that difference in amplitude to single out that state of differing phase and to ultimately increase the probability of the system being in that state. Such sequences of operations would not be possible if the amplitudes did not hold that extra information regarding the phase of the state in addition to the probability. These amplitude amplification algorithms are unique to quantum computing because of this quality of amplitudes, which has no analog in classical probabilities.

Knowing quantum mechanics a little bit, I was curious to know how the spin of an elementary particle, such as an electron, in a superposition state could be measured. There is indeed such an apparatus, used in the Stern-Gerlach experiment: https://en.wikipedia.org/wiki/Stern%E2%80%93Gerlach_experiment

Read on to appreciate the prospect of quantum computing, which may one day make many of our complex tasks trivially easy.

Grover’s Algorithm

Quantum search

Grover’s algorithm performs a search over an unordered set of $N = 2^n$ items to find the unique element that satisfies some condition. While the best classical algorithm for a search over unordered data requires $O(N)$ time, Grover’s algorithm performs the search on a quantum computer in only $O(\sqrt N)$ operations, a quadratic speedup.

In order to achieve such a speedup, Grover relies on the quantum superposition of states. Like many quantum algorithms, Grover’s begins by putting the machine into an equal superposition of all possible $2^n$ states of the n-qubit register. Remember, that means there is an equal amplitude of $1 \over \sqrt{2^n}$ associated with every possible configuration of qubits in the system, and an equal probability of $1 \over 2^n$ that the system will be in any of the $2^n$ states. All of these possible states correspond to all the possible entries in Grover’s database (he originally wanted his algorithm to look up names in a database of phone numbers), so, starting with equal amplitudes assigned to each element in the search space, every element can be considered at once in a quantum superposition, and the amplitudes can be manipulated from there to produce the correct entry in the database with a probability of at least 1/2.

Also note that in the following sections an oracle is used to match the state with the expected value to determine if a solution has been found. The oracle cannot change the magnitude of an amplitude, only its phase. So even though the oracle "knows" the expected value, it cannot simply boost the amplitude of that state to increase its probability.

Grover’s algorithm: How it works

Grover’s algorithm begins with a quantum register of n qubits, where n is the number of qubits necessary to represent the search space of size $2^n = N$, all initialized to $|0>$:

$|0>^{⊗n} = |0 \ldots 0>$

Note: ⊗ denotes the tensor product.

The first step is to put the system into an equal superposition of states, achieved by applying the Hadamard transform (https://en.wikipedia.org/wiki/Hadamard_transform) $H^{⊗n}$, which requires $O(\log N) = O(\log 2^n) = O(n)$ operations, n applications of the elementary Hadamard gate:

$|\psi> = H^{⊗n}|0>^{⊗n} ={1 \over {\sqrt {2^n}}} \sum_{x=0}^{2^{n}-1}|x>$
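In a classical state-vector simulation (the way most of us run these algorithms today, as noted in the introduction), this step is n Kronecker products of the elementary Hadamard gate; a minimal numpy sketch:

```python
import numpy as np

n = 3
H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)   # elementary Hadamard gate

Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)                # H ⊗ H ⊗ ... ⊗ H, n factors

state = np.zeros(2**n)
state[0] = 1.0                         # |00...0>
psi = Hn @ state
print(psi)                             # every amplitude is 1/sqrt(2^n)
```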

The next series of transformations is often referred to as the Grover iteration; it performs the amplitude amplification mentioned earlier and makes up the bulk of the algorithm. The Grover iteration will be repeated approximately ${\pi \over 4} \sqrt{2^n}$ times. Each iteration rotates the state vector by a small, fixed angle (roughly $2 \over \sqrt{2^n}$ radians) toward the marked state, and an overall rotation of about $\pi \over 2$ radians is needed to carry the equal superposition onto the marked state, which is what gives the ${\pi \over 4} \sqrt{2^n}$ iteration count.

The Oracle ($\omega$)

The first step in the Grover iteration is a call to a quantum oracle, $\omega$, that will modify the system depending on whether it is in the configuration we are searching for. An oracle is basically a black-box function; this one is a quantum black box, meaning it can recognize and act on the system without collapsing it to a classical state. If the system is indeed in the correct state, then the oracle will rotate the phase by $\pi$ radians, otherwise it will do nothing, effectively marking the correct state for further modification by subsequent operations. Remember that such a phase shift leaves the probability of the system being in the correct state the same, although the amplitude is negated. Quantum oracle implementations will often use an extra scratch qubit, but in this presentation the extra qubit is unnecessary, so the oracle’s effect on $|x>$ may be written simply:

$|x> \rightarrow (−1)^{f(x)}|x>$

where f(x) = 1 if x is the correct state, and f(x) = 0 otherwise. The exact implementation of f(x) depends on the particular search problem.
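In the same state-vector picture, the oracle reduces to a sign flip on the marked amplitude; a sketch where the marked index x0 is baked into the black box (known to the oracle, not to the algorithm):

```python
import numpy as np

def oracle(psi, x0):
    """|x> -> (-1)^{f(x)} |x>: flip the phase of the marked state only."""
    out = psi.copy()
    out[x0] = -out[x0]
    return out
```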

Grover refers to the next part of the iteration as the diffusion transform, which performs inversion about the average: it transforms the amplitude of each state so that it is as far above the average as it was below the average prior to the transformation, and vice versa. This diffusion transform consists of another application of the Hadamard transform $H^{⊗n}$, followed by a conditional phase shift that multiplies the amplitude of every state except $|0>$ by −1, followed by yet another Hadamard transform. The conditional phase shift can be represented by the unitary operator $2|0><0| − I$:

$[2|0><0| − I] |0> = 2 |0><0|0> − |0> = |0>$

$[2|0><0| − I] |x> = 2 |0><0|x> − |x> = −|x> \quad (x \neq 0)$

Giving the entire diffusion transform, using the notation $\psi$:

$H^{⊗n}[2 |0> <0| − I] H^{⊗n} = 2 H^{⊗n}|0> <0| H^{⊗n} − I = 2 |\psi> <\psi| − I$
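Built as an explicit matrix, the diffusion operator makes "inversion about the average" easy to verify numerically; a sketch (on hardware it would instead be compiled into the Hadamard and phase gates above):

```python
import numpy as np

N = 8
psi = np.full(N, 1 / np.sqrt(N))           # |psi>, the equal superposition
D = 2 * np.outer(psi, psi) - np.eye(N)     # 2|psi><psi| - I

v = np.array([1, 1, 1, -1, 1, 1, 1, 1]) / np.sqrt(8)   # one negated amplitude
mean = v.mean()
assert np.allclose(D @ v, 2 * mean - v)    # each amplitude reflected about the mean
```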

And the entire Grover iteration:

$[2 |\psi><\psi| − I] \omega $

In considering the runtime of the Grover iteration, the exact runtime of the oracle depends on the specific problem and implementation, so a call to $\omega$ is viewed as one elementary operation. Each of the two Hadamard transforms takes $O(n)$ gates, and the conditional phase shift can also be performed with $O(n)$ gates, so the total runtime of a single Grover iteration is $O(n)$. It follows that the runtime of Grover’s entire algorithm, performing $O(\sqrt N) = O(\sqrt{2^n}) = O(2^{n \over 2})$ iterations each with a runtime of $O(n)$, is $O(n \, 2^{n \over 2})$, usually quoted simply as $O(2^{n \over 2})$.

Once the Grover iteration has been performed an adequate number of times, a classical measurement is performed to determine the result, which will be correct with probability $O(1)$, completing the execution of the algorithm.

Grover’s algorithm is summarized as follows:

Input:

  • A quantum oracle $\omega$ which performs the operation $\omega|x> = (−1)^{f(x)}|x>$, where f(x) = 0 for all $0 \le x < 2^n$ except $x_0$, for which $f(x_0) = 1$.
  • n qubits initialized to the state |0>

Output: $x_0$

Runtime: $O(\sqrt{2^n})$ operations, with O(1) probability of success.

Procedure:

  1. $|0>^{⊗n} \rightarrow$ initial state
  2. $H^{⊗n}|0>^{⊗n} = {1 \over \sqrt{2^n}} \sum_{x=0}^{2^n - 1} |x> = |\psi> \rightarrow$ apply the Hadamard transform to all qubits
  3. $[(2 |\psi><\psi| − I)\omega]^R |\psi> \approx |x_0> \rightarrow$ apply the Grover iteration $R \approx {\pi \over 4}\sqrt{2^n}$ times
  4. $x_0 \rightarrow$ measure the register
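Putting the whole procedure together, here is a minimal state-vector simulation in Python, a sketch rather than anything that runs on real hardware; it uses N = 8 with marked state |011>, matching the worked example below:

```python
import numpy as np

n = 3
N = 2**n
x0 = 0b011                                # index of the marked state

psi = np.full(N, 1 / np.sqrt(N))          # step 2: equal superposition
D = 2 * np.outer(psi, psi) - np.eye(N)    # diffusion: 2|psi><psi| - I

R = int(round(np.pi / 4 * np.sqrt(N)))    # ~ (pi/4) sqrt(2^n) = 2 for n = 3
state = psi.copy()
for _ in range(R):
    state[x0] = -state[x0]                # oracle: phase flip on |x0>
    state = D @ state                     # inversion about the average

print((state**2)[x0])                     # ~0.945 probability of measuring |011>
```

Stopping after a single iteration instead of R = 2 gives exactly the amplitudes derived in the worked example below: 5/(4√2) for |011> and 1/(4√2) for the rest.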

Grover’s algorithm: Worked example

Consider a system consisting of N = 8 = $2^3$ states, and the state we are searching for, $x_0$, is represented by the bit string 011. To describe this system, n = 3 qubits are required, represented as:

$|x> = \alpha_0 |000> + \alpha_1 |001> + \alpha_2 |010> + \alpha_3 |011> + \alpha_4 |100> + \alpha_5 |101> + \alpha_6 |110> + \alpha_7 |111>$

where $\alpha_i$ is the amplitude of the state |i>. Grover’s algorithm begins with a system initialized to 0:

$1 \cdot |000>$

and then the Hadamard transform is applied to obtain an equal amplitude of ${1 \over \sqrt N} = {1 \over \sqrt 8} = {1 \over 2 \sqrt 2}$ associated with each state, and thus also an equal probability of being in any of the 8 possible states:

$H^{⊗3}|000> = {1 \over 2 \sqrt 2} |000> + {1 \over 2 \sqrt 2} |001> + \ldots + {1 \over 2 \sqrt 2} |111> = {1 \over 2 \sqrt 2} \sum_{x=0}^7 |x> = |\psi>$

The first step is to call the quantum oracle $\omega$, then perform inversion about the average, or the diffusion transform. The oracle query will negate the amplitude of the state $|x_0>$, in this case |011>, giving the configuration:

$|x> = \alpha_0 |000> + \alpha_1 |001> + \alpha_2 |010> - \alpha_3 |011> + \alpha_4 |100> + \alpha_5 |101> + \alpha_6 |110> + \alpha_7 |111>$

Now, perform the diffusion transform $2 |\psi> <\psi| − I$, which will increase the amplitudes by their difference from the average, decreasing if the difference is negative:

${[2 |\psi><\psi| − I] |x>} = {[2 |\psi><\psi| − I]} \left[ |\psi> − {2 \over 2 \sqrt 2}|011> \right] = 2 |\psi><\psi|\psi> − |\psi> − {2 \over \sqrt 2} |\psi><\psi|011> + {1 \over \sqrt 2} |011>$

Note that $<\psi|\psi> = 8 \cdot {1 \over 2 \sqrt 2} \cdot {1 \over 2 \sqrt 2} = 1$. Additionally, since $|011>$ is one of the basis vectors, we can use the identity $<\psi|011> = <011|\psi> = {1 \over 2 \sqrt 2}$:

$= {2 |\psi> − |\psi>} − {2 \over {\sqrt 2}}({1 \over {2 \sqrt 2}}) |\psi> + {1 \over \sqrt 2} |011> = |\psi> − {1 \over 2} |\psi> + {1 \over {\sqrt 2} }|011> = {1 \over 2} |\psi> + {1 \over {\sqrt 2}} |011>$

Substituting gives:

$= {1 \over 2} [{1 \over {2 \sqrt 2}}\sum_{x=0}^7 |x>] + {1 \over {\sqrt 2} }|011> ={1 \over {4 \sqrt 2}} \sum_{x=0, x \neq 3}^7|x> + {1 \over {4 \sqrt 2} }|011> + {1 \over {\sqrt 2}} |011> = {1 \over {4 \sqrt 2} }\sum_{x=0, x \neq 3}^7|x> + {5 \over {4 \sqrt 2} }|011>$

In the notation used earlier: ${|x>} = {1 \over {4 \sqrt 2}} |000> + {1 \over {4 \sqrt 2}} |001> + {1 \over {4 \sqrt 2}} |010> + {5 \over {4 \sqrt 2}} |011> + . . . + {1 \over {4 \sqrt 2}} |111>$

Now when the system is observed, the probability that the state representing the correct solution, $|011>$, will be measured is $\left| {5 \over 4 \sqrt 2} \right|^2 = {25 \over 32} \approx 78\%$ after just a single Grover iteration.

Although Grover’s algorithm is probabilistic, the error truly becomes negligible as N grows large.

Friday, June 8, 2018

Free Will

One dictionary definition of free will is the power of acting without the constraint of necessity or fate; the ability to act at one's own discretion. This sets the stage for two concepts: there is an external agency such as fate, and there is self-determination which enables us to act according to the way we choose. The first concept is called hard determinism and the second non-determinism or indeterminism. Indeterminism can be further elaborated: we are bodies made of atoms interacting with one another in a seemingly random fashion subject to probabilities, as particle physics suggests (even though Newtonian determinism applies to how objects move down an inclined plane, for instance). Compatibilism is the combination of the two: there are a finite number of options and we are free to pick one by thinking through -- thought-experimenting -- the consequences without ever changing them. To understand compatibilism, suppose there is a sliding scale from 0 to 100 where 0 corresponds to total determinism and 100 implies exclusive self-determinism. Most people fall in between these two extremes, and one could say, for example, "I am at 40," which could mean that in 40 out of 100 cases I was able to act without compulsion. This is a frequentist or probabilistic view. Then there is physical science.

Schrödinger’s equation is to quantum mechanics what Newton’s second law of motion is to classical mechanics. It describes the evolution over time of a physical system made up of fundamental particles subject to certain forces. Schrödinger's equation grew out of the idea that particles such as electrons behave like particles in some situations and like waves in others. In experiments it was found that the electron "chooses" to be in one of its states, when an observation is made, based on the probabilities given by the equation. This led some to propose a free-will theorem of quantum mechanics. If we rise one level above, we can see that quantum mechanics presupposes three entities: (1) the observer, (2) nature, which has consciousness, and (3) the quantum state. With some thought experiment we can infer that an electron has no absolute free will, because it is dealt a set of quantum states a priori by nature, much like a deck of cards, and all it does is choose one when an observation is made. While an electron may not have free will, one can say the observer is free to set up the experiment in any which way and interpret the result based on subjective experience that presupposes the existence of a self, and that is free will. However, physicist Henry Stapp says the subjective experience at a neuronal level is made possible by ions passing through channels (microtubules) in the neuronal cells, giving rise to quantum effects that eventually map into actions in the physical world based on their probabilities. The inference that can be drawn, therefore, is that free will is an illusion and a mathematical puzzle that nature plays on us.

Determinism

John Searle makes the distinction between physical determinism and psychological determinism. We can all agree that objects in the world follow physical laws we have no control over. You can't change the way planets revolve around the sun or the sun sets (even though the latter is just an illusion created by the rotation of the earth). This Searle calls bottom-up, in contrast to the top-down way the mind and body work. That is, if you would like to lift your finger, the mind issues a command to the appropriate neural structures to carry out the task. Searle sets aside the issue of how neurotransmitters, at a low level, affect the mind itself. If we accept that the mind is part of nature, which is deterministic by physical laws, then it follows that the mind is deterministic and should obey the physical laws. Searle says, "the form of determinism that is ultimately worrisome is not psychological determinism. The idea that the states of our mind are sufficient to determine everything we do is false".

Another kind of determinism is causality. Let us look at what physicist David Bohm, who believed that free will is part of a causal chain in sync with the physical world, had to say. Suppose A → B → C → D... where A, B, C, D... are events separated in time that are causally connected. If we were to say D is one of many possible outcomes after C, like a decision tree that expands to multiple child nodes, even if A were an independent cause going back to the beginning of the universe, we would explain that D is the most probable outcome out of a set of D1, D2, D3... Dn outcomes. One can say D1...Dn are internal mental states that are weighed by the brain/mind, resulting in the choice of the most probable one as the outcome. It is not clear to anyone in neuroscience whether our brains are capable of computing all of D1...Dn and their probabilities.

To illustrate, suppose A = hungry, B = walk to the fridge, C = grab food. Now the probable states D1...Dn could be: warm it in the microwave (D1), feed the dog (D2), go find a homeless person to feed (D3), put it back considering that the food is unhealthy (D4), etc. For each of these D1...Dn there are further outcomes. For example, D1 might manifest as E1 = the food is burnt, which results in F1, that it is okay to eat, or F2, that it is better to trash it. And so on. If we substitute ourselves with a robot capable of computing all of these outcomes and their probabilities, then we would all agree that the robot has no free will. However, left to our own devices, when we choose D2 over all the others, it would be apparent that we had made a choice with as much information as possible. Otherwise, we would be standing at the fridge with food held in our hands for a very long time....

What is moral responsibility?

Daniel Dennett defines moral responsibility as sticking to rules that were created by one's peers, with some punishment for violating the rules. We all belong to organizations where the rules and punishments differ but essentially fit Dennett's definition. Suppose you work at an American company where there is a dress code and you choose to wear a traditional Indian dress; then you had better be prepared to be told by the boss that you violated a company rule. If you want to continue to work there, which is your free will (not really so, but for the moment accept that it is), you would take the necessary corrective action. The free will here is an illusion, because if you tried this experiment at various companies in the US, you could probably succeed in finding a company that accepts your dress. But that comes with multiple trials and moving all over the US, much against your comfort. However, suppose you wanted to fix it once and for all by organizing a petition drive for legislation that traditional Indian dress should be acceptable at all American workplaces. You might succeed in getting what you want, but when you are working one day with machinery and your sari or dhoti gets pulled into the machinery or stuck in an elevator door, causing you harm, it would serve as a lesson for violating the workplace rule that was enforced at the first company you worked for. So what did your free will achieve? If anything, it made you morally responsible for all of the workplace accidents that resulted from workers wearing Indian dress.

So who has moral responsibility? Dennett defines a Moral Agent Club where anyone who fulfills the following requirements is a member:

  • well informed: awareness of societal rules, laws, etc.
  • well-ordered desires: not expecting miracles and windfalls
  • moved by reasons: being open for discussion
  • not controlled by another agent: capable of thinking and reasoning independently; this is not always the case as we are being persuaded by the people we interact with and the advertisements in the media including social media, to be in a certain way (e.g. dress brand names, lose weight, support a democrat/republican, etc.)
  • punishable: if found violating rules and laws; Hume thinks determinism is what enables us to punish or reward a person. You punish a person to change their behavior or reward them to reinforce their behavior.

We can extend Dennett's theories to games like chess and basketball that have rules, and the players are constrained to play by the rules. However, there is a nearly infinite variety of moves within each of these games and the outcome is unpredictable. So it seems the players of these games have infinite choices within the framework of the rules. Consider a skill like playing the piano. It starts out as a right-hemisphere activity, and with practice it shifts to the left hemisphere of the brain and eventually to the posterior part of the brain. What this means is that at the beginning of acquiring a skill the brain is dealing with non-determinism, and as it progresses to mastering the skill it tends to move towards determinism. This can be illustrated with an artificial neural net (ANN). At the very beginning the ANN is initialized with random weights that one can choose at will. As supervised training takes over, the weights are adjusted by gradient descent. Once the supervised learning completes, the weights are frozen and the ANN is ready to make predictions (inference).

Relevance of free will for reward or punishment

There are some who argue that all corporal punishment should be replaced by medication because our actions are deterministic or causally linked to the universe. But Dennett argues that all social contracts should contain a provision for punishment. The example he gives: suppose you sign a contract and renege on the promise. Then you are told that you are being prosecuted and are sent a subpoena, and you don't respond to the subpoena. What can a judge do other than pass a jail sentence for contempt? If the judge didn't do that, there would be no reason to have contracts, and feeling guilty is not enough (even though in some religions repentance is considered a virtue).

The application of punishment doesn't apply to humans only. Sam Harris quoted instances when people punished animals by lynching and other means as an act of retribution for violating human laws. He gave an example of a case where a lawyer defended a pack of rats that had eaten away a farmer's crop; the lawyer told the magistrate that his clients, i.e. the rats, could not be present because they were scared away by cats. As late as 1916, a circus elephant that had run amok was hanged from a railroad crane. This kind of moral retribution is different from the law of karma, which is applicable to all beings based on the sum total of their actions in this life and in previous births. The law of karma would say those who hanged the elephant would be brought to divine justice for an act of retribution, because the errant elephant was possibly tortured by the human crew of the circus. Rightfully, we no longer accept torture as a valid means to punish an animal, whether it is a circus animal or not.

Traditionally, Abrahamic religions blamed immoral behavior on free will. Sam Harris argues that when it is established that brain chemistry is the cause, it is immoral not to provide an appropriate cure, such as a nutritional supplement or surgery that would reverse the condition. This paves the way for a better justice system, even after accounting for the side effects of medical procedures and pharmacological agents. Why would a benevolent god allow humans to wreak havoc with free will in one case (e.g. colonialism) and take personal charge in others (e.g. tsunamis, earthquakes, etc.)? If scientists could show that natural disasters like tsunamis and earthquakes are caused by our choices, like burning fossil fuels and releasing chlorofluorocarbons that deplete the ozone layer of the atmosphere, would that change anything? He argued that pride and shame ought to be replaced with compassion and love, as we are intrinsically linked to our past and present by causality, leaving no room for religion to castigate followers with sin.

Buddhist Perspective

Gautama Buddha is a well-known preacher who said desires and wants are the root cause of sadness and renunciation is the key to happiness. On the other extreme we have advertisements and the conditioning of the mind to want more and more things. Let me illustrate with a story told by an Australian monk called Ajahn Brahm. A man dreamt of an angel who gave him 5 pots of gold. After he woke up, at breakfast, his wife served him 5 pieces of bread with 5 eggs. It turned out it was the 5th day of the 5th month (May). On television he saw that horse #5 had won its race. So he figured 5 was his lucky number. He then went to the bank, withdrew $5,000 and bet it on horse #5. Soon after, his horse finished 5th in the race. Such is the power of suggestion. Our brains are not well equipped to reason with numbers and probabilities--save a few scientists and engineers--and make accurate predictions in the world.

The functional MRI experiments done by neuroscientists led by Haynes show that the outcome of an event was predicted, or chosen, by the brain 10 seconds before the person consciously acted on it. Haynes updated a classic experiment by the late Benjamin Libet, who showed that a brain region involved in coordinating motor activity fired a fraction of a second before test subjects chose to push a button. This means an average person has at least 5 seconds at their disposal to decide not to act on a whim. This leads to the notion of "free won't," whereby we can curtail a whim well before we act on it. Whether or not Gautama Buddha intuited this experimental evidence, his teachings have been vindicated, and the world he envisioned would be based on free won't.

Interestingly, Swami Sarvapriyananda quotes the Bhagavad Gita--Chapter 3 Verse 27, Chapter 13 Verse 29 and Chapter 14 Verse 19--verses that confirm the experiment's outcome, i.e. we don't will the action but merely execute it following prakriti's (nature's) command. It is not clear how nature implants the intent into the mind. But Lord Krishna, the author of the Gita, calls it maya, or illusion. By extension, free will is an illusion, and the wise, according to Lord Krishna, know that they cannot see through maya.

Role of Consciousness

John Searle draws an example from psychological experiments involving hypnosis. Suppose you are made to roll on the floor under hypnosis. This would be treated as a compulsion by others because you are in a trance. Some time later, when you are in a cafeteria, you announce that you want to check out the carpet by rolling on it, as you are interested in investing in carpets, and you have free will. Would that be considered normal? For most people believing in post-hypnotic suggestion it seems normal. That is, you are constrained to behave in a certain way, whereas from your perspective you are acting in complete freedom. This argument can be extended to movies and the people who play movie characters. We don't call a real person a cannibal just because he played such a role in a movie. In fact, most people are forgiving towards the transgressions of real-life actors who play difficult roles in movies that take a toll on their rational thinking.

A more recent study by Thalia Wheatley shows that conscious awareness is not necessary for decision making. A subject is hypnotized and asked to make some decisions. These were compared with the decisions made after the subject woke up and was fully conscious. There was virtually no difference. But these are simple motor actions involving muscle movement. It is not clear if this study is applicable to higher cognitive functions like planning, deliberation and future activity, where free will is supposedly playing a role.

Split Brain Experiments

A split-brain condition arises when, in epileptic patients, the corpus callosum that connects the two hemispheres of the brain is cut. This procedure, called commissurotomy, is done so that seizures--abnormal electrical activity in the brain--occurring in one hemisphere won't affect the other. Thus the patient is left with one functioning hemisphere at all times. Commissurotomy was shown to be effective in making the patients normal, seizure-free people. However, cognitive experiments performed on them showed that they confabulate. When the experimenter flashed "Key Ring" on a screen with a fixation cross in the middle and asked what they saw, they said "Ring," because the speech center is in the left hemisphere: the right hemisphere read "Key" and the left hemisphere read "Ring," but neither read both. When the scientists further presented a random mix of objects and asked the patients to pick the one they saw, they picked a key with the left hand and a ring with the right hand. When asked why they picked different objects, they gave lengthy explanations to show that they were rational, which the scientists call confabulation. Similar experiments were conducted with flashing dots that alternated between white and black. Donald Hoffman, a cognitive scientist, posits that their consciousness has been split, which seems plausible. How about their free will? Given that one hand buttons the shirt and the other unbuttons it, he predicts their free will has also been taken away by the surgery. And most people whose free will has been constrained, such as by a perceived threat, confabulate and give rambling explanations to prove their rationality. However, no cognitive experiment has been done to prove this assertion with regard to free will. Thus free will remains an illusion.

Are physical laws always the same?

Determinism is often associated with immutable physical laws that were handed down to us from the time the universe was created. Feynman, a Nobel laureate physicist, wondered if these laws could have evolved just as life has evolved on earth. This is a worrisome prospect. We can accept that the earth is round and revolves around the sun (heliocentrism), as Copernicus explained. However, are we willing to accept that the law of gravitation evolved as the earth separated from the sun as a flat or some other non-spherical structure that eventually evolved into a sphere? Can we say the early earth was flat, and as it rotated the mass got distributed evenly around the center, just as a dough-making machine agglomerates spinning flour particles mixed with water? Everything is probable, but only one thing can be real in the physical world. How Feynman could break loose from the determinism of physical laws, and how Einstein could conceptualize theories that questioned the validity of Newtonian laws for fundamental particles, is something only an omniscient, omnipotent and omnipresent god could explain.

Some Arguments for Free Will

Besides compatibilism, the following arguments are extant for free will:

  • Libertarianism: Free will comes from introspection and self-observation; just as we can feel pleasure or pain, we feel free will; also, the creative freedom felt by artists and writers must presuppose free will; if we feel it to some degree on a sliding scale, then it is frequentist or probabilistic
  • Ethical Appraisal: This applies to the affirmation of good and the condemnation of evil, and to the legal system in general
  • Pragmatic Self-Refutation: just as proceedings in the legal system appeal to free will when asking a jury to be persuaded one way because the evidence says so, the same logic applies to arguments for or against free will

Nowhere has free will been studied and debated more intensely than in medicine. In 2002, a 40-year-old man exhibited pedophilic urges until his doctors determined that they were caused by an egg-sized brain tumor. Once the tumor had been removed, his sex obsession disappeared. The tumor was located in the right lobe of the orbitofrontal cortex, which is known to be tied to judgment, impulse control and social behavior. If no diagnosis of the tumor had been made, one would have attributed the desire to free will.

We also attribute substance-abuse addiction, to street drugs and prescription pills alike, to free will. While medical doctors might believe brain chemistry has something to do with addiction, society at large relegates the issue to free will.

Other manifestations include Parkinson's, where involuntary movements occur without the appearance of any free will, just as many bodily functions happen. It is well known that red blood cells regenerate after about 3 months and many of the skin cells are replaced in a few weeks, without any conscious process behind them. It has been said in the Bhagavad Gita that all involuntary activities are carried out by prakriti. Until science can prove otherwise, we are free to conclude that there is no free will in any of these activities.

In summary, free will is what is left when we filter out compulsion, causality and irresponsible behavior. That is to say, we shouldn't be forced to comply; we can't escape from causality; and we can't have free will without moral responsibility for our actions, as we cannot function in a society that is by definition based on order.

Online References

https://www.youtube.com/watch?v=ThfJtP9w3fI Buddhist perspective on free will

https://www.youtube.com/watch?v=ZfbOWKyfwzA Jordan Peterson's clip on free will

https://www.youtube.com/watch?v=4KvqkRHG77w Searle's free-will talk with hypnosis

https://www.youtube.com/watch?v=_FanhvXO9Pk Sam Harris' talk on free will

https://www.youtube.com/watch?v=v73S4BkItrc Quantum Theory and Free Will - Chris Fields, Henry Stapp & Donald Hoffman

https://www.youtube.com/watch?v=9uRTjfhIf4M Closer To Truth - Big Questions in Free Will

https://www.youtube.com/watch?v=VzbyeU3dK4g Swami Sarvapriyananda's lecture

https://www.newscientist.com/article/dn2943-brain-tumour-causes-uncontrollable-paedophilia/


Friday, June 1, 2018

Consciousness

Consciousness is an important subject of study for neurologists and AI scientists alike because of its physiological and philosophical ramifications. A neurologist might want to know if a patient is conscious for medical reasons. For instance, an anesthesiologist administers an anesthetic in proportion to the consciousness of the patient so as not to disable the organs entirely. A comatose patient, if it can be called conscious, would cause a moral dilemma among medical practitioners and relatives. One can measure the temperature, pulse and blood pressure of a patient to determine if the patient is alive, but nothing of that sort exists for consciousness (sometimes doctors shine light onto the pupils, which is not a definitive test). Robots that serve us could be attributed consciousness by virtue of the algorithms that enable them to function. For instance, if a robot handling an infant causes harm, it is suggestive of a bad design of the algorithms and parts used in the assembly of the robot by the manufacturer. A more topical example is self-driving cars that are involved in road accidents or, worse, have caused them. Their manufacturers would defend themselves as not being "consciously aware" of the defects.

So what is consciousness? Is it the substrate of all of the intelligence that has emerged in the form of mind, or is it an emergent phenomenon of the brain? Many Hindus believe the world is made conscious in the dream of Lord Brahma. In general, religions regard consciousness as orthogonal to the physical plane of existence (dualism), endowing life forms with the ability to carry on their respective roles under the divine will. Scientists believe consciousness is the product of evolution: the pre-frontal cortex of the brain evolved over millions of years from the more primitive older brain composed of the cerebrum, cerebellum, amygdala, etc., giving us the conscious experience.

An important aspect of consciousness is its moral dimension. For instance, we may not want to hurt an ant because we feel pain and expect an ant to feel the same. This is an intrinsically human-centric experience that can be extended to the vast array of life forms that potentially feel pain, pleasure, fear, love, etc. just like us, because we think they are conscious. Abattoir laws are replete with provisions against hurting an animal, such as a cow or a pig, while it is still conscious. The kosher and halal methods, for instance, differ in this respect. Native Americans respected animals as equal in rights to humans. Of course animals were hunted, but only for food, and the hunter first asked permission of the animal's spirit. The truth is that we really do not know which of these organisms is or is not conscious. We are biased to the extent that tradition, religion and law dictate. But we don't have a rational method or algorithm to determine if an organism has subjective experience analogous to our feelings.

We call someone self-conscious to mean the intense awareness that comes with introspection and self-reflection. A dictionary meaning of self-conscious is: feeling undue awareness of oneself, one's appearance, or one's actions; having knowledge of one's own existence, especially the knowledge of oneself as a conscious being; (especially of an action or intention) deliberate and with full awareness, especially affectedly so. Most societies impose a social penalty for not being self-conscious, which is construed as being awkward or a misfit.

Learning is an important phenomenon we associate with consciousness. Learning to ride a bike, for example, starts out as a conscious task. Eventually it becomes subconscious, so much so that it is second nature and we are no longer aware that we are riding a bike. Much of rote learning happens this way. When Veda pandits, who recite the Vedas, were studied with MRI, numerous regions in their brains were found to be dramatically larger than those of controls, with over 10 percent more grey matter across both cerebral hemispheres and substantial increases in cortical thickness. It was also revealed that the right hippocampus--which is responsible for spatial, visual and sound patterns--was more robust, indicating that their verbal memory was exemplary. Note that we have right and left hippocampi that play a key role in short- and long-term memory. How is this all related to consciousness? The Veda pandits recite effortlessly, indicating there is a subconscious linkage, just as when riding a bike or driving a car once its operation has been learnt.

Despite the anecdotal evidence, it seems consciousness has no causal impact on the world. Just by being conscious we can't affect anything in the real world. The world exists independent of our consciousness. What we perceive while conscious is only a snapshot of the world that won't impact the laws of motion or any other law governing the world. In other words, consciousness enables us to grasp the world as it exists, explain the events in the world, and immerse ourselves in worldly behavior including socialization. Interestingly, some researchers have defined a zombie as an entity exactly like a human down to the last atom but without the subjective experience that would qualify it as a socializable entity. If a zombie is in our midst, how to detect it is a philosophical quandary.

Theories About Consciousness

Higher order theory is the foremost theory of consciousness; it posits higher-order or meta thoughts (HOT) and perceptions (HOP) about first-order thoughts and perceptions. That is, the meta construct, either HOT or HOP, operates (e.g. "I am seeing a rose") over the first-order thought (e.g. "The rose is red"). There is no doubt that consciousness is about higher order thoughts when we study Krishna Consciousness and similar spiritual discourse. But whether that is a thought or a perception is the question to consider. The HOP theorists posit an inner sense for intra-mental monitoring. For someone like Mirabai, Krishna Consciousness is thought-free. For many religions a supreme being bestows consciousness on mortals who are born without one. Actually we can consider it part of the life of a fetus even before it is delivered into the world. We read in the Maha Bharatam that Abhimanyu learnt how to enter a padma vyuham (armed forces in a flower-petal formation) from Sage Narada, who was explaining it to Subhadra while Abhimanyu was still in Subhadra's womb. To counter, some pro-life people could say consciousness is bestowed on the fetus (nature) rather than acquired from the environment (nurture). But there is no neurological validity to this. A fully formed fetus (> 27 weeks) can blink its eyes and suck its thumb without being aware that it is being studied by ultrasound imaging. My wonderment at the HO theory is: what prevents one from positing a hierarchy of states that transcend human existence and religious experience? For example, the Gita discourse by Lord Krishna rises through many levels of consciousness: I am like Arjuna; Krishna is my friend; Lord Krishna is an avatar of Lord Vishnu; Lord Vishnu is the ruler of Lord Brahma's creation; and so on.

The second theory of consciousness to consider is the reflexive theory, which posits self-awareness, e.g. "I am watching a rose while situated in a garden." The self-awareness could lead to introspection, which together form the core of consciousness without the need for a meta state. This has some support in the Hindu religion, where the fetus is supposed to be ruminating over previous lives and repenting for sins. The rumination over past births ends, along with the memory of it, once the fetus is delivered into the world. The interface between the loss of memory of previous births and the acquisition of new sensory information after delivery is the rise of consciousness.

The third theory of consciousness is the representational one, which posits that experiences essentially have representational or intentional contents and that their phenomenal characters are largely determined by those contents. According to the representationalist, conscious mental states have no mental properties other than their representational properties. Thus two conscious or experiential states that share all their representational properties will not differ in any mental respect. One can explain representational consciousness by drawing examples from religious texts that preach doctrines like self-sacrifice to uphold a religious belief, or from a military boot camp where soldiers are told to lay down their lives for the good of the nation because it is noble. We can infer that both of these are similar in the sense that they lead up to the same mental state of fearlessness. There is no huge difference between phenomenal (higher order or meta) and experiential (first order) contents in these cases. However, here the intention is important. In the case of a religious boot camp the intention is to uphold a religious text by blind faith, and in the case of a military boot camp the intention is to defend a nation made of conscious beings.

The fourth theory we can look at is the narrative interpretive theory, which combines the HO and representational theories by positing content fixation as the basis for consciousness. Consider problem solving, which involves, say, the manipulation of symbols to arrive at a desired result by intense focus. While doing so, there could be any number of sensations such as a clock ticking, a car sounding its horn and a baby crying in the background. These can be grouped in the "Cartesian Theater," a term for a privileged functional or spatial location in the brain. Some are quick to say that the Cartesian theater is the consciousness. The proponents of the interpretive theory discount the Cartesian theater in favor of "Cerebral Celebrity," which in this case is the problem solving, or the degree to which a given content influences the future development of other contents throughout the brain. Another aspect of this theory is that it views the self as an emergent phenomenon as opposed to an inner observer, and denies the existence of qualia--individual instances of subjective, conscious experience such as pain, pleasure, etc.--as demarcating between conscious and unconscious states.

Global workspace theory is a prominent cognitive theory that supposes there is a central location where different processors compete for a limited-capacity resource by broadcasting information for widespread access and use. There is a parallel to this in AI's rule-based systems, where different rules have access to what is called "working memory." A working memory is where the rules' antecedents are matched against the state variables; when the rules fire, their consequents are written back into it. For example: Rule #1 might say if the sky is blue then it is a nice day; Rule #2 might say if it is a nice day then go for a run. If these rules fire sequentially, then the working memory will contain "sky is blue", "nice day" and "go for a run" (see the sketch below). So where is consciousness in this? The act of broadcasting information around the brain from this memory bank is what represents consciousness. Scientists have claimed that consciousness in both the access and the phenomenal sense occurs when information enters the global workspace involving the sensory areas as well as the pre-frontal cortex. If you can imagine being submerged in a lake with bubbles of air leaving your nose and rising to the surface, then one can be sure that you are conscious. In a similar vein, when the global workspace or working memory has been actively accessed by processors and cohered with their processing results, consciousness will be asserted.
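A toy sketch of the two-rule example just given, with a Python set standing in for the working memory (the rules and the firing loop are illustrative, not any particular production-system engine):

```python
# Forward chaining: fire rules whose antecedents are in working memory,
# writing their consequents back, until nothing new can be added.
rules = [
    ({"sky is blue"}, "nice day"),     # Rule #1
    ({"nice day"}, "go for a run"),    # Rule #2
]

working_memory = {"sky is blue"}

changed = True
while changed:
    changed = False
    for antecedents, consequent in rules:
        if antecedents <= working_memory and consequent not in working_memory:
            working_memory.add(consequent)
            changed = True

print(working_memory)   # {'sky is blue', 'nice day', 'go for a run'}
```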

In Attended Intermediate Representation theory, consciousness is the abstraction of sensory perception, viz. color, size, shape, etc., to a higher level, but not so high as to call it an object in the physical world, such as a flower or a car, as that implies a moral judgment. The intermediate representation is not something we can map to the physical world; it can only be measured as neurological activity--akin to oscillating electrical pulses racing through the neurons--in the sensory areas. This theory lends itself to a clinical diagnosis of biological consciousness because of the measurements involved, but it does not apply to non-biological entities such as robots.

In the information integration theory (IIT), consciousness is a purely information-theoretic property of a system. Consider a panpsychic theory that attributes consciousness to all objects. For example, an internal-combustion engine has lower consciousness than the automobile it runs, which has lower consciousness than the driver. In IIT the consciousness attributed to a component or a part is integrated with the whole to arrive at a measure of consciousness. The theory assumes that (a) conscious states are differentiated, meaning we can be conscious of any number of things; (b) the information in the conscious states is highly integrated; and (c) to be conscious, then, you need to be a single, integrated entity with a large repertoire of highly differentiated states. Using mathematical modeling and entropy, the theory calculates a measure of integration within the brain. One can think of it as synergy: the more integrated the brain, the greater the synergy and the higher the consciousness.

Finally, there are quantum theories that turn consciousness into a mystery at the physical-psychic interface using the models of fundamental physical particles. The wave function is commonly used in physics to predict the state, i.e. velocity and location, of a fundamental particle such as an electron. It is known that a quantum system can move from a superposition of multiple possible states to a single definite state, which is called wave function collapse. In physics such a collapse can happen in the presence of an observer, i.e. an instrument or a sensor with which the particle might or might not be interacting to reveal its state. It has been posited that a similar phenomenon might be occurring in the microtubules of cells, including neurons, without an observer being present, giving rise to consciousness. Some have even predicted the equivalent of a Bose-Einstein condensate, where the individual atoms cannot be identified, i.e. the whole subsumes the parts, but only inside the brain. Such theories, while they might make ample sense for biological consciousness, cannot apply to machine or robotic consciousness, which involves no quantum physics unless one views the tunneling of electrons in silicon substrates as something of significance.

Online References

https://en.wikipedia.org/wiki/Consciousness

https://plato.stanford.edu/entries/consciousness/

https://www.livescience.com/47096-theories-seek-to-explain-consciousness.html

https://intelligenceway.com/sanskrit-brain-neuroscience-india-vedic/

https://www.scientificamerican.com/article/a-theory-of-consciousness/

Friday, May 25, 2018

Block Chain

Have you ever wondered how to send a payment by email? Did you ever question the commission charged by banks and other financial agencies for the transfer of money? Do you want to vote electronically from your computer? If the answer to all these questions is yes, then you want to know about a block chain. A block chain is a network of computers--aka nodes--each of which maintains records of transactions--grouped into blocks--to enable the transfer of digital currency and assets over the internet. Before we dive into the subject, let us first review the current banking or financial industry, where moving money is routinely carried out. Financial institutions use a ledger to keep track of transactions and the current balance of accounts. Let us say Rob, with an initial balance of $100, sends $50 to Pete, who has an initial balance of $200. After the transaction is processed, the ledger will be updated to show Rob's current balance as $50 and Pete's as $250. However, if Rob, with an initial balance of $100, decides to send Pete $100 and Sally $50, the second transaction should be denied, which is ensured by processing the transactions serially, i.e. one at a time. This is known as the double-spend problem.
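A toy Python sketch of the serial processing just described (the names and balances follow the example above; Sally's starting balance is made up):

```python
balances = {"Rob": 100, "Pete": 200, "Sally": 50}

def transfer(sender, receiver, amount):
    """Process one transaction at a time; reject any overdraft."""
    if balances[sender] < amount:
        print(f"DENIED: {sender} -> {receiver} ${amount}")
        return False
    balances[sender] -= amount
    balances[receiver] += amount
    return True

# Rob tries to spend $150 while holding only $100; serial processing
# lets the first transaction through and denies the second.
transfer("Rob", "Pete", 100)
transfer("Rob", "Sally", 50)    # DENIED: Rob -> Sally $50
print(balances)
```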

The problem with financial institutions

By keeping the ledgers on a central server, financial institutions are vulnerable to attacks by hackers. If the ledger is corrupted, there is no easy way to fix it. Financial institutions also charge a commission on transactions to compensate for their enormous storage and processing needs. On a small transaction like $100, a 2% commission might look normal. But as such transactions happen millions of times over, one can imagine the cost incurred by the initiators of the transactions; some estimate it to be about $6 billion a year. Besides, transactions don't happen instantaneously and typically take several hours to process. Suppose one sends dollars from the USA to India by bank transfer. The bank in the USA posts the transaction to the gateway in the USA. The US gateway contacts the gateway in India, which in turn contacts the bank in India. Then a currency conversion is done from dollars to rupees. If there is a way to pay the end user, the transaction will go through. Given the time zones and transmission delays, this simple money transfer takes several hours.

[Figure: centralized banking]

Block chain efficiency

A block chain transfers money or any electronic asset almost in real time. The commission or fee charged by a block chain is also much smaller than that of financial institutions. One can add a fee to a transaction in a block chain to incentivize the miners (discussed later), but that is optional. Another important aspect of a block chain is that the ledger is distributed: every node in the block chain has its own copy of the ledger. This does away with the centralized ledger and the monopoly of the financial institutions. The advantage of a distributed ledger is that even if a node or a group of nodes is taken off-line, say by hacking, the rest of the network still has the complete ledger. It is next to impossible for all the nodes in a block chain to go off-line or lose the ledger. Thus a block chain ensures that the list of transactions, the ledger, is available for all to see and update. The key property is that the ledger can only be appended to, never truncated or edited to modify an older transaction.

[Figure: distributed ledger]

Different types of computing

Centralized: where a single server acts as the point of contact for information storage and retrieval. Examples are Facebook, Google, Amazon, etc. While this is easier to implement, the problem is that it doesn't scale: the number of transactions a single server can handle is limited, and so is its storage.

Distributed: A centralized system can be distributed. All of the tech companies mentioned above have geographically spread out their centralized systems in a distributed manner, so the server accessed from India is different from the server accessed in the USA. As long as the data on the two are kept in sync in real time, the services remain consistent.

Decentralized: the centralized and distributed models are vulnerable to denial-of-service attacks, whereas a decentralized system, like bitcoin's block chain, is spread over hundreds or thousands of computers and is very hard to attack. In a decentralized system every node acts as a peer, which means every node has the same data as its peer nodes. So when a node is hacked, the data is not lost and can be retrieved from another healthy node.

Cryptocurrency

Block chain is based on the notion of cryptocurrency, which is simply a made-up currency, just like dollars, rupees and pounds. Let's call it e-money. Each participant in a block chain starts with an electronic wallet holding some e-money, which can be purchased with physical currency, for instance. As the participants transact with each other, the balances in their wallets are recalculated. The bitcoin block chain, which went online in 2009, has bitcoin as its cryptocurrency. There is an upper limit on how many bitcoins can ever be issued, currently 21 million (changing it would require the consensus of the network's participants). It is called a cryptocurrency because it can only be created by an activity called mining, which is based on cryptography. Miners are nodes in the block chain that compete to validate transactions by solving hard cryptographic puzzles. As an incentive for this validation effort, the first miner to successfully validate a block of transactions receives a reward, 12.5 bitcoins at the time of writing. The winning miner also gets to add a block to the chain of blocks by finding a nonce (a number that is used only once in the block chain) and computing a hash, an alpha-numeric identifier that links back to the previous block in the chain. Finding the nonce and hash is a time-consuming process, especially when, as some block chains require, the first few digits of the hash have to be zeroes. Besides, the miner may have to scan back through the block chain, though not necessarily from the very beginning, to compute, say, Rob's current balance before his payment to Pete can be validated. This too involves computations that can span several minutes. Hence the miners need to be compensated for their efforts, and that compensation is paid in the cryptocurrency or e-money.
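The mining puzzle can be sketched in a few lines of Python. The block format and difficulty below are illustrative only, not bitcoin's actual encoding; real miners hash a precisely specified block header.

    import hashlib

    def mine(prev_hash, transactions, difficulty=4):
        # Search for a nonce whose block hash starts with `difficulty` zero hex digits.
        nonce = 0
        while True:
            header = f"{prev_hash}|{transactions}|{nonce}".encode()
            block_hash = hashlib.sha256(header).hexdigest()
            if block_hash.startswith("0" * difficulty):
                return nonce, block_hash  # proof of work found
            nonce += 1

    nonce, h = mine("0" * 64, "Rob pays Pete $50")
    print(nonce, h)

Raising the difficulty by one hex digit makes the search roughly 16 times longer on average, which is how a block chain tunes how hard mining is.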

[Figure: blockchainabc.png]

Consensus

What happens when someone creates a block with fake transactions and adds it to the block chain? Since blocks are replicated across the nodes, each node gets to vote on the legitimacy of a block. If a majority votes that a block is fake, it is removed from the block chain and all the nodes are updated accordingly. In this way the block chain is self-correcting. The central ledger of a financial institution, which is vulnerable to hacking, is thus replaced with a distributed digital ledger that is far harder to tamper with.
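The tamper-resistance comes from the hash links between blocks. Here is a minimal sketch, with a hypothetical block represented as a Python dictionary: editing any old block changes its hash, which breaks the link stored in the next block, so every honest node can detect the edit.

    import hashlib

    def block_hash(block):
        data = f"{block['prev_hash']}|{block['transactions']}|{block['nonce']}".encode()
        return hashlib.sha256(data).hexdigest()

    def verify_chain(chain):
        # Valid if each block's stored prev_hash matches the hash of the block before it.
        return all(curr["prev_hash"] == block_hash(prev)
                   for prev, curr in zip(chain, chain[1:]))

    genesis = {"prev_hash": "0" * 64, "transactions": "genesis", "nonce": 0}
    second = {"prev_hash": block_hash(genesis), "transactions": "Rob pays Pete $50", "nonce": 7}
    chain = [genesis, second]
    print(verify_chain(chain))                         # True
    genesis["transactions"] = "Rob pays Pete $5000"    # tamper with history
    print(verify_chain(chain))                         # False: the edit is detected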

Land titles

A typical use case for a block chain is land titles. Say the earliest known record for a land title is from 1810 and the land has changed owners several times since. Until now, such records were kept in physical repositories that are vulnerable to theft, fire and flood. If a land-title transfer record from 1900 is lost, there is no way to replace it, and the chain of ownership can no longer be validated. Someone could also introduce a fake title in 2018 and gain possession of the land. Scanning the physical documents into digital form mitigates some of these issues, but a centralized store of the digital records is still vulnerable to hacking and misuse. A block chain stores the title repository in a distributed computing model, making such fraud far harder.

Smart Contracts

One of the block chain implementations, called Ethereum, uses what are called smart contracts. A smart contract is computer code that lives on the block chain. Suppose you run a company and want to use a block chain to pay the salaries of your employees. You could write a smart contract implementing the following pseudo code (a Python rendering follows the pseudo code):

    If employee is a manager, then pay x% of the earnings plus $5000
    else if employee is a worker, then pay minimum wage plus y% of the earnings
    and so on.
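Here is that rule as ordinary Python, just to make the idea concrete. The percentages, the $5000 bonus and the minimum wage are placeholders from the pseudo code above; a real smart contract would be written in a contract language such as Solidity and executed on-chain.

    def salary(role, earnings, x=0.10, y=0.05, minimum_wage=2000):
        # Hypothetical payroll rule mirroring the pseudo code above.
        if role == "manager":
            return x * earnings + 5000
        elif role == "worker":
            return minimum_wage + y * earnings
        raise ValueError(f"no payroll rule for role: {role}")

    print(salary("manager", 100000))   # 15000.0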

Once a smart contract is created it exists forever on the block chain and is immutable. Suppose you want to change the manager's commission from x% to z%: a new smart contract has to be written. The old contract it replaces is not deleted but made inactive. This way there is a record of the history of salary payments made by your company. A tax auditor, for example, might be interested in your company's salary structure over the years, and the block chain serves as the proof.

[Figure: smart contract.png]

Why can't we send money by email?

If you send some e-money to Pete by email, you can easily copy and paste the email to multiple recipients. The question is: do you have enough e-money to pay them all? Who is validating that, and how? If you could generate e-money at will, the increased money supply would cause inflation and erode the purchasing power of your e-money. Now you can see why a block chain with mining is needed: in a block chain the generation of e-money is controlled by the mining activity, which rewards proof of work. This is like the real world, where your employer pays you in return for work done, except that in a block chain the proof of work is produced by algorithms and computer hardware.

Public and Private Keys

Each electronic wallet consists of a public key and a private key, based on public-key cryptography. The public key is what you share with others when you want to receive e-money; it is an alpha-numeric string like 0x23yu7Ia and serves as your unique name and address. Associated with each public key is a private key, which is like your password. When you want to send money, you use the private key to sign the transaction and broadcast the signed transaction to the block chain. This way only you can send money from your electronic wallet and no one else can.
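As a sketch, here is signing and verification using the third-party Python ecdsa package (pip install ecdsa). Real wallets use specific curves (bitcoin uses secp256k1) and sign a hash of a precisely encoded transaction, but the principle is the same.

    from ecdsa import SigningKey

    sk = SigningKey.generate()        # private key: kept secret in the wallet
    vk = sk.get_verifying_key()       # public key: shared with others

    tx = b"Rob pays Pete 50 e-money"
    signature = sk.sign(tx)           # only the private-key holder can produce this

    # Any node can check the signature against the public key.
    assert vk.verify(signature, tx)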

Different types of block chains

A block chain is essentially distributed storage and computing spanning the internet. Just as the web revolutionized the way we store electronic files, the block chain could do the same for electronic payments, land titles, voting in elections, etc. One can envision multiple block chains for different purposes. Some call this Web 3.0, where different computers come together to participate in a digital economy based on block chains, while Web 2.0 is the conglomeration of web services provided by Amazon EC2, Microsoft Azure, etc.

One can also envision public, private and hybrid block chains where the participants can vary based on the type of block chain. In a public block chain anyone can join and begin transacting. Such is the case with Ethereum. A private block chain can be set up by a financial institution for its members only. A hybrid block chain will have a mix of public and private participants whose main interest is to transact digital assets electronically at the speed of the internet.

The memory crunch

When a new node joins a block chain, it has to download all of the blocks ever created on that chain, which is a time-consuming process. The blocks are also usually kept in memory, rather than in an off-line database, to keep them available for transactions. This can be very expensive in both money and computation. If you are a miner, interested not in initiating transactions but in earning e-money by validating them, the expense is worthwhile provided you have an efficient algorithm and powerful computer hardware for searching hashes in the blocks. So it is not unusual to find groups of miners pooling their resources to invest in the expensive hardware and other infrastructure needed for mining.

Smart contracts, being code, have to face the same scrutiny as any program: they may fail to terminate, or may consume enormous resources to carry out their computation. In many cases it is easier to run them on a subset of nodes than on every node in the network. For instance, a smart contract to handle voting for a candidate is better run on a single node, where the tally of votes cast for each candidate can be kept in memory. Each voter can be assigned a unique hash and allowed to vote only once. What ultimately makes the voting process tamper proof is that the code in the smart contract is shared within the block chain and can be verified by all interested parties. Contrast this with the electronic voting machines (EVMs) used in India, whose software is not open for auditing.

The Haves and Have Nots

Proof of work is the central piece of the block chain, and it will decide the winners and losers. But it is so hard and consumes so much computing power that people are already suggesting other kinds of proof to substitute for it, such as proof of stake and proof of burn. In proof of stake, a miner or a consortium of miners earns the right to add a new block based on the amount of cryptocurrency they hold, i.e. their stake. A real-world analogy: if two banks are willing to finance a project at the same interest rate and you have to pick just one, which do you pick? That's where the balance sheet comes into play; the bank with the better balance sheet wins. In proof of burn, miners provably destroy (burn) some of their own cryptocurrency to earn the right to add a block. In the real world, this is like engineers spending some of their own money on models of their designs to attract investors. These alternatives to proof of work are going to make the block chain a highly contested topic in the future.

Online References

https://bitsonblocks.net/2016/02/01/a-gentle-introduction-to-smart-contracts/

https://www.youtube.com/watch?v=5Tr13l0O1Ws is a video presentation on coding proof of work and on alternatives to proof of work

Wednesday, May 16, 2018

Cluster Analysis

As children we learn to identify similarity between objects based on color, size, etc. For example, dogs and cats are grouped as animals; Elmo and Big Bird are grouped as animated characters; and so on. What makes this possible? It is probably innate: evolution equipped us to handle the diversity of creation. The Hindus divided the ancient professions into four clusters: priests, rulers, traders and workers; later this was redefined to be by birth, which led to the disarray we find in India today. In any case, the clustering or grouping of objects remains an important task in machine learning. Sometimes it is called classification, where a class is identified to represent a group of entities with common characteristics. In finance, the classes are called segments and the classification process is called segmentation. In biology it is the taxonomy (hierarchical classification) of all living things: kingdom, phylum, class, order, family, genus and species. On the web, search engines group pages using clustering algorithms that compute the similarity between the contents of pages with respect to the search string.

Where do we derive classes from?

In ancient India, Hindus identified four professions that everyone was occupied with, called varnas. It was posited that the varnas were ordained by the divine; there was no reason to think otherwise. Over time the varnas transformed into the castes that exist to this day. After attaining freedom from the British, Indian leaders also created states based on the predominant language spoken in each region, which can be called linguistic clusters. Clustering thus makes it possible for the government to allocate funds equitably and manage a billion people.

Almost all ancient cultures gazed at the starry sky and found groupings of stars. Four open clusters have been known from the earliest times: the Pleiades and Hyades in the constellation Taurus, Praesepe (the Beehive) in the constellation Cancer, and Coma Berenices. The appearance of the Coma Berenices cluster to the naked eye led to the naming of its constellation for the hair of Berenice, wife of Ptolemy Euergetes of Egypt (3rd century BCE).

Clusters are a common feature of all modern economies. According to Michael Porter of Harvard Business School, a geographic economic cluster is a group of firms, suppliers, support services, specialized infrastructure, producers of related products, and specialized institutions (e.g., training programs and business associations) that arise in particular fields. Viewed globally, economic clusters evolve into supply chains where parts and services are outsourced to various countries (usually the lowest bidder) and shipped for final assembly locally. In the US, clusters have traditionally evolved around waves of immigrants, though there are no studies to support this claim beyond the observation that recent immigrants have tended to opt for IT careers in Silicon Valley, whereas earlier immigrants established finance and manufacturing centers on the East Coast.

In economics, identifying classes of consumers who make similar buying decisions is highly sought after. For example, a franchise might want to target different advertising material at the rich, the upper middle class, the middle class, etc.; the classes here are based on income and purchasing power. Similarly, national economies are classified based on GDP. Sudhamathy and Jothi Venkateswaran proposed a Fuzzy Temporal Clustering Approach that clusters a web site's visitors and pages based on the frequency of visits and the time spent. They claim their study can provide intelligent recommendations for E-Commerce web sites that focus on specific product recommendations and behavioral targeting. There are two methods for achieving this: temporal web-log clustering (i.e. analyzing the logs of e-commerce sites) and fuzzy clustering, which involves analyzing changes in cluster compositions and cluster memberships. E-commerce sites would also like to know which ads should be directed at which users based on their previous buying decisions. For instance, a high roller who bought a top-end computer is highly likely to buy an accessory like a printer or a router. The difficult part is predicting whether the high roller is buying a gift for a friend or shopping for oneself, as companies would like to target gift buyers differently from personal shoppers.

Another natural grouping can be found in postal codes, which are based on the proximity of a building to a center. Philip Jennings described cluster analysis for assessing the risk factors that go into insurance premium calculation, wherein the clusters were defined based on Zip Codes and data from Geographic Information Systems (GIS). The GIS data allows a much finer analysis, as every building has a GIS code.

Finally, microarray technology has been widely applied in biological and clinical studies for the simultaneous monitoring of the expression of thousands of genes. For a complete discussion of microarrays see https://www.genome.gov/10000533/dna-microarray-technology/. The idea behind a microarray is to mix DNA from a patient who may carry a mutation with control DNA (without the mutation in the gene of interest) and pass the two over a microchip bearing synthetically made normal and mutated DNA. The control DNA, as expected, binds to the normal DNA on the chip. If the patient's DNA has the mutation, it binds to the portions of the chip carrying the mutated DNA. The interesting thing is that this process can be automated: a robot can be programmed to run microarray experiments repeatedly over samples of DNA.

But how does one know whether a gene mutation can cause a particular disease? Gene clustering analysis has proved useful for discovering groups of correlated genes that are potentially co-regulated or associated with the disease or condition under investigation.

The unending quest to differentiate oneself and group participation

Life can be described as a quest to differentiate oneself from others while at the same time belonging to a group. The various ethnic societies in the USA are groups of people who follow the same religion, speak the same language (possibly other than English), or emigrated from the same region. They tend to derive their initial identity from the group while pining to differentiate themselves as unique Americans, in line with the legendary American exceptionalism.

Groups of religious minorities within a larger democratic milieu either get absorbed into the majority religion or stay on the fringe. Pollsters track these trends in every election to identify potential independent voters and gauge the appeal of their political platform to them.

The suburbs clustered around a city center tend to maintain their individual identities, even having their own mayors and city governments, while for all practical purposes being identified with the city center (e.g. sharing an airport).

Cluster Analysis

Studying clusters of entities is necessary in fields like economics, statistics and biology. So computer scientists study algorithms that create clusters that are internally homogeneous with respect to a similarity metric, while the clusters themselves are heterogeneous with respect to each other. The foremost algorithm for cluster analysis is K-means, where K is the number of clusters the data should be fitted to.

The algorithm starts by selecting K items from the observations to serve as the initial centroids of the clusters; the selection can be random or based on a domain-specific method. Note that if there are $N$ observations, there are ${N \choose K} = {{N!} \over {(N-K)!\,K!}}$ possible choices, where $N \ge K$. Next, the algorithm computes the Euclidean distance from each remaining observation to each centroid and assigns each observation to its closest centroid. This step is called "cluster assignment". After the assignment, the algorithm recalculates the mean of each cluster; this step is called the "centroid update". Now that the centers have been recalculated, every observation is checked again to see if it is closer to a different cluster, and all the objects are reassigned using the updated cluster means. The cluster assignment and centroid update steps are repeated until the cluster assignments stop changing, i.e. until convergence is achieved: the clusters formed in the current iteration are the same as those obtained in the previous iteration. In practice, the algorithm is often terminated once less than 1% of the assignments change. The convergence achieved may only be a local minimum, especially when the initial K observations are selected randomly; there is no guarantee that the global minimum is found.

The K-means algorithm can be summarized as follows:

    Select K points as initial centroids.
    repeat
        Form K clusters by assigning each point to its closest centroid.
        Recompute the centroid of each cluster.
    until centroids do not change.
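Here is a minimal NumPy sketch of these steps. It assumes numeric data in a 2-D array and that no cluster goes empty during the iterations; it is meant for illustration rather than production use.

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]  # initial centroids
        for _ in range(iters):
            # Cluster assignment: index of the nearest centroid for each point.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Centroid update: mean of the points assigned to each cluster.
            new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centroids, centroids):
                break  # converged: assignments no longer change
            centroids = new_centroids
        return labels, centroids

    labels, centers = kmeans(np.random.default_rng(1).random((200, 2)), k=3)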

The Euclidean distance between two points $(x_1, y_1)$ and $(x_2, y_2)$ is given as

$\sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}$

The centroid is typically the mean of the points in a cluster. The objective is to minimize the distance of each point from its centroid, which is sometimes called the prototype representing the cluster.

The algorithm terminates when all of the observations have been accounted for and the centroids don't change significantly with each iteration.

The problem of identifying K, that is, the number of clusters needed to account for all the observations, is handled in a couple of ways. First, we can try different values of K, say from 1 to 40, and compute the sum of squared errors (SSE) for each. When we plot K vs. SSE, if the plot looks like a hockey stick, i.e. increasing K no longer reduces the SSE appreciably, we choose the K at the elbow. Other methods include the silhouette method and the gap statistic.
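A sketch of the elbow computation, assuming scikit-learn is available; the data here is a random stand-in.

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.default_rng(0).random((200, 2))   # stand-in data
    sse = []
    for k in range(1, 11):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        sse.append(km.inertia_)   # inertia_ is the within-cluster SSE

    # Plot range(1, 11) vs. sse and pick K at the elbow, where adding
    # clusters stops reducing the SSE appreciably.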

One could use probabilities instead of hard assignments. For instance, a person at a company can be a student intern and a full-time employee at the same time. In that case, the person's membership in each cluster could be expressed as a probability, with the probabilities summing to 1. Fuzzy sets and other soft-membership methods work similarly.
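One common way to obtain such probabilistic memberships is a Gaussian mixture model; here is a sketch with scikit-learn, again on stand-in data.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    X = np.random.default_rng(0).random((200, 2))
    gm = GaussianMixture(n_components=3, random_state=0).fit(X)
    probs = gm.predict_proba(X)   # one probability per cluster; each row sums to 1
    print(probs[0])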

Online References

http://www.econ.upf.edu/~michael/stanford/maeb7.pdf for hierarchical clustering

https://www-users.cs.umn.edu/~kumar001/dmbook/ch8.pdf clustering using k-means

https://uc-r.github.io/kmeans_clustering

http://www.hbs.edu/faculty/Publication%20Files/Clusters_and_Economic_Policy_White_Paper_8e844243-aa23-449d-a7c1-5ef76c74236f.pdf for clustering in economics

http://www.enggjournals.com/ijet/docs/IJET12-04-03-045.pdf fuzzy approach to e-commerce clustering

https://www.casact.org/pubs/dpp/dpp08/08dpp34.pdf zip code based clustering

https://www.cse.buffalo.edu/DBGROUP/bioinformatics/papers/survey.pdf

https://www.genome.gov/10000533/dna-microarray-technology/