## Stats

**Authors@Google: Nate Silver**

http://www.youtube.com/watch?v=mYIgSq-ZWE0&feature=em-uploademail

**Probabilistic Programming & Bayesian Methods for Hackers**

http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/

## Model Thinking Part 3 – Coursera

**Tipping Point**

- Direct tip: a small action or event (a small change in the variable itself) has a large impact on the end state; how large also depends on context
- Contextual tip (the percolation model is an example): something in the environment changes and permits the tip, so a slight change in context can have a big impact on the final state
- Within class vs. between class: a within-class tip moves the system from one state to another of the same type; a between-class tip moves it to a different type of system

Note: 4 types of systems: equilibrium, periodic, random, complex

**Percolation Model** – physics model; water percolating down into the ground is an example

Use a checkerboard model with randomly filled squares to show the tipping point: once about 59.27% of the squares are filled, a path from start to finish (one side to the other) becomes likely. Used in information flow, forest fire prediction, innovation flow, and social models (the context changes as more connections are made).
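The checkerboard idea can be tried out with a small simulation. This is a minimal sketch, not the course's own code: it assumes squares are filled independently with probability `p`, that a path may step only up, down, left, or right, and the names `percolates` and `percolation_rate` are made up for illustration. On a finite board the jump near 59% is steep but not perfectly sharp.

```python
import random
from collections import deque


def percolates(grid):
    """Return True if filled cells connect the top row to the bottom row."""
    n = len(grid)
    queue = deque((0, c) for c in range(n) if grid[0][c])
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if r == n - 1:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def percolation_rate(p, n=30, trials=200, seed=0):
    """Fraction of random n-by-n boards (fill probability p) that percolate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        grid = [[rng.random() < p for _ in range(n)] for _ in range(n)]
        hits += percolates(grid)
    return hits / trials


if __name__ == "__main__":
    for p in (0.40, 0.55, 0.59, 0.63, 0.80):
        print(f"p={p:.2f}  fraction percolating={percolation_rate(p):.2f}")
```

Running it for a range of `p` values should show the fraction of percolating boards rising quickly around the threshold.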

**Diffusion Model** – natural diffusion but no tipping point

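$$W_{t+1} = W_t + N \, c \, \tau \, \frac{W_t}{N} \cdot \frac{N - W_t}{N}$$

where $W_t$ is the number of people with the disease at time $t$, $N$ is the number of people, $c$ is the contact rate, and $\tau$ (tau) is the transmission rate. $W_t / N$ is the share of people with the disease and $(N - W_t)/N$ is the share without it.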

**SIS** – S (for susceptible), I (for infectious), and S (for susceptible). Epidemiology; non-linear. A diffusion model in which a person can be cured and then reinfected. It has a tipping point because people can be cured. This can also apply to information.

$$W_{t+1} = W_t + N \, c \, \tau \, \frac{W_t}{N} \cdot \frac{N - W_t}{N} - a W_t$$

where $a$ is the cure rate, so $a W_t$ is the number of people cured. Standard algebra simplifies this to $W_{t+1} = W_t + W_t \left( c \tau \, \frac{N - W_t}{N} - a \right)$.

The basic reproduction number is $R_0 = c\tau / a$. If $R_0 > 1$ the disease spreads, hence a tipping point.

Let $V$ be the fraction vaccinated. The effective reproduction number is then $r_0 = R_0 (1 - V)$, and the disease stops spreading when $r_0 \le 1$, hence $V \ge 1 - 1/R_0$ is the fraction to vaccinate. Vaccination must reach this percentage to create the tipping point needed to protect the people who are not vaccinated. If $R_0 \le 1$ there is no spread of disease, but if $R_0 > 1$ the disease will spread.
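A short sketch of the SIS update and the vaccination threshold may make the tipping point concrete. It assumes the update is W + N·c·τ·(W/N)·((N − W)/N) − a·W, with the susceptible share taken as (N − W)/N; the function names and the parameter values in the usage lines are illustrative, not from the course.

```python
def sis_step(w, n, c, tau, a):
    """One SIS update: current infected, plus new infections, minus cures."""
    return w + n * c * tau * (w / n) * ((n - w) / n) - a * w


def simulate_sis(w0, n, c, tau, a, steps):
    """Apply the SIS update `steps` times, starting from w0 infected people."""
    w = w0
    for _ in range(steps):
        w = sis_step(w, n, c, tau, a)
    return w


def basic_reproduction_number(c, tau, a):
    """R0 = c * tau / a."""
    return c * tau / a


def vaccination_threshold(r0):
    """Smallest fraction V with R0 * (1 - V) <= 1 (zero if R0 is already <= 1)."""
    return max(0.0, 1.0 - 1.0 / r0)


if __name__ == "__main__":
    # R0 = 2: the disease spreads and settles at a positive level.
    print(simulate_sis(w0=1, n=1000, c=3, tau=0.2, a=0.3, steps=500))
    # R0 < 1: the disease dies out.
    print(simulate_sis(w0=100, n=1000, c=1, tau=0.2, a=0.3, steps=500))
    print(vaccination_threshold(2.0))
```

With R0 above 1 the infected count grows from a single case; with R0 below 1 it shrinks toward zero, which is the tipping behavior the notes describe.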

**Economic Models**

Real growth is growth – inflation

## Model Thinking Part 2 – Coursera

**Model Thinking – Scott E. Page – U. of Michigan**

**Modeling People**

3 types of models: rational actor, behavioral, and rule based

- Rational actor: maximizes some objective function (optimizes)
- Behavioral: observe what people actually do; neuroscience supports this view
- Rule based: people follow rule(s), e.g., the Schelling model

Rational Actor

- An objective
- Optimize the objective

The value is in establishing a benchmark (it can also show how far people are from the rational choice), because:

- Objective decision – reduced bias
- Easy to mathematically solve for
- People learn by repeating, hence closer to good result
- Mistakes cancel out, eliminating bias

**Behavior Model**

Daniel Kahneman says people have a slow (rational) thinking process and a fast (emotional) thinking process

Bias examples

- Prospect theory – a bird in the hand is worth two in the bush
- Hyperbolic discounting – take the immediate reward and discount future pain
- Status quo – stick with the current situation
- Base rate – a first estimate anchors the second estimate, which ends up similar to it

Some people say the above biases are WEIRD (Western, Educated, Industrialized, Rich, Democratic): they essentially only appear in developed countries

Approach to take in modeling: look at rational then introduce bias, then look at potential rules.

**Rule Based**

Follow a rule (or rules), e.g., Schelling. Rules can be easy to understand and capture the main effects, but they can be ad hoc, can be exploitable, and may not be optimal

Two types (fixed and adaptive) in two contexts (Decision and Game)

Fixed: might be random, tit for tat, or grim trigger

Adaptive: in a Decision context it might be gradient (take something that works and extend it); in a Game context it might be best response or mimicry (copy others' rules that work)

**Impact of 3 models**

- Market: the type of model does not matter much since market forces drive toward a mean. A zero-intelligence agent bids randomly lower (if buyer) and randomly higher (if seller), and the bids then average out to the actual clearing price.
- Race to the Bottom (game; the goal is to guess 2/3 of the mean of all guesses): the type of model can matter as follows. (1) Rational: the bid would be zero. A rational person assumes some mean and quotes 2/3 of it, then keeps iterating the quote down, since everyone else is assumed to be rational and iterating down too. (2) Biased (behavioral model): would probably just pick 50. (3) Rule (best response): guess 50, then some take 2/3 of 50 (33), then some take 2/3 of 33 (22), and over the long run down to zero; the rule is a mix of rational and biased. If everyone in the game is rational and a new irrational person enters, the rational players will be influenced by what they perceive the irrational person will do.
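The best-response path in the Race to the Bottom game is simple arithmetic, sketched here under the assumption that every round's guess is exactly 2/3 of the previous one, starting from 50. The function name `best_response_path` is made up for illustration.

```python
def best_response_path(start=50.0, rounds=10, fraction=2 / 3):
    """Successive guesses when each round best-responds to the previous guess."""
    guesses = [start]
    for _ in range(rounds):
        guesses.append(guesses[-1] * fraction)
    return guesses


if __name__ == "__main__":
    for round_number, guess in enumerate(best_response_path()):
        print(round_number, round(guess, 1))
```

The first values are 50, about 33, and about 22, and the sequence keeps shrinking toward zero, the rational answer.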

**Categorical and Linear Models**

- Categorical models: group the sample and analyze the sub-groups to see how well the grouping covers the data (reduces variance); see r_squared.py for a detailed explanation. R-squared indicates how much of the variance in the data is explained by the model. Correlation indicates a relationship between variables, BUT causation means that one variable actually depends on another, or X (independent) entails Y (dependent). (Check this last statement.)

- Linear model – draw a line through the data; Y depends on X, Y = F(X), Y is a function of X
- Non-linear model – a curve; some similarities to the linear model
- Big coefficient – shows how important each x is in Y = a1 x1 + a2 x2; using the big coefficient as a guide only makes sense in a world where there is lots of data
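For the categorical model, R-squared can be computed as one minus the variation left inside the groups divided by the total variation. The sketch below assumes that definition; it is not the r_squared.py file mentioned above, and the calorie figures are invented sample data.

```python
def mean(values):
    return sum(values) / len(values)


def total_variation(values):
    """Sum of squared differences from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def r_squared_categorical(groups):
    """Share of total variation explained by splitting the data into groups."""
    all_values = [v for members in groups.values() for v in members]
    within = sum(total_variation(members) for members in groups.values())
    return 1 - within / total_variation(all_values)


if __name__ == "__main__":
    # Hypothetical calorie counts, grouped into two categories.
    calories = {"fruit": [90, 60, 130], "dessert": [350, 280, 400]}
    print(round(r_squared_categorical(calories), 3))
```

A value near 1 means the categories account for most of the variance; a value near 0 means the grouping explains almost nothing.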

## Model Thinking Part 1 – Coursera

**Model Thinking – Scott E. Page – U. of Michigan**

**Reasons to Model**

- Predict points
- Understand data
- Understand patterns — class of outcome
- Design solutions

**Schelling Segregation Model**

- Sorting (be with people you like) and peer influence (homophily) (act like people you are with) on behavior
- The model is agent based, consisting of agents, behavior, and aggregation
- Schelling is threshold based; tipping comes in two kinds: exodus tip (move out) and genesis tip (move in)
- Micro behavior is not the same as macro aggregation: people may individually be tolerant (micro), but the resulting population movement (macro) may produce segregation.

**Granovetter’s Model**

The model has N individuals. Each person j has a threshold T(j) and joins if at least T(j) others join. E.g., with thresholds 0, 1, 2, 3, 4 all will join, since the person with threshold 0 joins for sure and each new joiner tips the next person.

This tells us that lower thresholds and more variation in thresholds can have large effects.
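The threshold cascade can be sketched in a few lines. This assumes a simplified reading of the model in which a person joins as soon as the number of people already in is at least their threshold; the function name `cascade` is made up for illustration.

```python
def cascade(thresholds):
    """Return how many people end up joining, given each person's threshold."""
    joined = 0
    while True:
        # Everyone whose threshold is met by the current count is in.
        new_total = sum(1 for t in thresholds if t <= joined)
        if new_total == joined:
            return joined
        joined = new_total


if __name__ == "__main__":
    print(cascade([0, 1, 2, 3, 4]))  # everyone joins
    print(cascade([1, 1, 1, 1, 1]))  # nobody starts, so nobody joins
    print(cascade([0, 1, 2, 4, 4]))  # the chain breaks after three people
```

The second and third cases show why variation in thresholds matters: the average threshold in the second group is lower than in the first, yet nobody joins.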

**Standing Ovation Model**

Extension to **Granovetter’s Model**

partially caused by peer effect (homophily) and information

Model rules:

- Threshold to stand: T
- Quality: Q
- Signal: S = Q + E (E is error or diversity)
- Initial rule: stand if S > T; subsequent rule: stand up if more than X% of the others stand up

Why standing ovation:

- higher quality
- lower threshold
- larger peer effect
- more variation (E)
- Celebrity
- group size

This model (and others) is a way of determining outcomes in such situations.

Note that it is difficult to tell from a snapshot of the state whether a sorting effect or a peer effect is at work. Additional dynamic data is needed in the model to show whether movement (sorting) or neighbor influence (peer) caused the change.

**Aggregation**

Philip Anderson – “More is Different” – emergent properties at macro level

Aggregation of data

Independent decisions

Central Limit Theorem

Assumes independent events and finite (limited range) variance

The probability distribution is normal (Gaussian; looks like a bell curve) with the most likely outcome at the mean

Plus/minus 1 standard deviation (sigma) covers 68% of a normal curve; plus/minus 2 standard deviations cover 95%; plus/minus 3 standard deviations cover 99.7% of possible outcomes

Binomial distribution example: with p = 1/2 the mean is N/2 and the SD is $\sqrt{N}/2$. So for N = 100 the mean is 100/2 = 50 and the SD is $\sqrt{100}/2 = 10/2 = 5$.
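The N = 100 example can be checked by simulation. This sketch assumes fair coin flips and a fixed random seed; the helper names are made up. Because head counts are whole numbers, the simulated shares within 1 and 2 standard deviations come out somewhat above the 68% and 95% of the smooth normal curve.

```python
import math
import random
import statistics


def binomial_mean_sd(n, p=0.5):
    """Mean and standard deviation of the number of successes in n trials."""
    return n * p, math.sqrt(n * p * (1 - p))


def simulate_heads(n=100, trials=5000, seed=1):
    """Count heads in n fair coin flips, repeated `trials` times."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n)) for _ in range(trials)]


def share_within(samples, mean, sd, k):
    """Fraction of samples within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in samples) / len(samples)


if __name__ == "__main__":
    mean, sd = binomial_mean_sd(100)
    samples = simulate_heads()
    print("theory:", mean, sd)
    print("simulated:", statistics.mean(samples), statistics.pstdev(samples))
    for k in (1, 2, 3):
        print(k, "SD share:", share_within(samples, mean, sd, k))
```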

Six Sigma = 3.4 errors out of 1,000,000 events

Cellular Automaton – self-organization, emergence; some level of functionality such as a glider; get logic right

Game of Life rules: if a cell is off, turn it on if exactly 3 neighbors are on; if a cell is on, it stays on if 2 or 3 neighbors are on; otherwise it is off
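The Game of Life rules fit in one small function. This is a sketch that assumes an unbounded board stored as a set of (row, column) pairs for the cells that are on, with each cell having 8 neighbors; `life_step` is a made-up name.

```python
from collections import Counter


def life_step(live):
    """Advance one generation; `live` is a set of (row, col) cells that are on."""
    neighbor_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # On next turn: exactly 3 neighbors, or 2 neighbors and already on.
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in live)
    }


if __name__ == "__main__":
    glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
    cells = glider
    for _ in range(4):
        cells = life_step(cells)
    print(sorted(cells))
```

After four generations the glider has the same shape again, moved one cell down and one cell to the right, which is the kind of emergent functionality the notes mention.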

**Preferences**

Transitive preference order: rational preferences; collective preferences that are transitive are called rational in the social sense

All individuals can be rational while the aggregate is non-transitive; this is called *Condorcet’s paradox*
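A three-voter example shows the paradox. The sketch assumes each voter ranks options A, B, and C from best to worst and that the group prefers x to y when a majority of voters do; the ballots and the function name are invented for illustration.

```python
def majority_prefers(ballots, x, y):
    """True if more than half of the ballots rank x above y."""
    wins = sum(ballot.index(x) < ballot.index(y) for ballot in ballots)
    return wins > len(ballots) / 2


if __name__ == "__main__":
    # Each ballot is one voter's transitive ranking, best first.
    ballots = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]
    for x, y in (("A", "B"), ("B", "C"), ("C", "A")):
        print(f"majority prefers {x} over {y}:", majority_prefers(ballots, x, y))
```

Every individual ranking is transitive, yet the majority prefers A to B, B to C, and C to A, so the collective preference is a cycle.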

**Decision Making**

For multidimensional decision making:

- Qualitative – use matrix
- Quantitative – use matrix plus weights

For spatial choice models: select an ideal point (single or multidimensional) for a good, then determine the values of each decision point possibility on the same dimensions, take the difference from the ideal point on each dimension, and add up the differences. The possibility with the smallest total difference is the closest to the ideal point. Come back to this and clarify.
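One way to read the spatial choice procedure is as a distance calculation. The sketch below assumes the differences are absolute values summed across dimensions; the option names and numbers are hypothetical.

```python
def total_difference(ideal, option):
    """Sum of absolute differences between an option and the ideal point."""
    return sum(abs(i - o) for i, o in zip(ideal, option))


def closest_option(ideal, options):
    """Name of the option with the smallest total difference from the ideal."""
    return min(options, key=lambda name: total_difference(ideal, options[name]))


if __name__ == "__main__":
    # Hypothetical two-dimensional good: (patties, slices of cheese).
    ideal = (2, 1)
    options = {"burger_a": (1, 1), "burger_b": (3, 3)}
    for name, point in options.items():
        print(name, total_difference(ideal, point))
    print("choose:", closest_option(ideal, options))
```

Adding per-dimension weights to `total_difference` would turn this into the quantitative matrix-plus-weights approach listed above.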

**Probability**

- outcome is a set of things that can happen
- event is a subset of outcomes.

3 axioms

- The probability of an event is between 0 and 1 (it could also be exactly 0 or 1)
- The probabilities of all possible outcomes sum to 1
- If A is a subset of B, then p(A) is less than or equal to p(B), or p(B) is greater than or equal to p(A)

3 types of probability:

- Classical (e.g., dice and coin flips; can be proven mathematically)
- Frequency (estimate frequency by counting data: how often has something happened? Assumes stationarity, i.e., nothing has changed)
- Subjective (an estimate, but based on a model)

## Statistics and Probability Background

Stats: Probability Rules

“I like to use what’s called a **joint probability distribution**. (Since **disjoint** means nothing in common, **joint** is what they have in common, so the values that go on the inside portion of the table are the **intersections or “and”s of each pair of events**.) **“Marginal” is another word for totals**; *it’s called marginal because they appear in the margins.*”

The UBC Calculus Online Homepage — short intro

**probability density**

n. Statistics In both senses also called probability distribution.

1. A function whose integral over a given interval gives the probability that the values of a random variable will fall within the interval.

2. The calculated value of a probability density.

John Sowa – mathematical background

paskin.org — some useful presentations on probability and structured representation

[Standard (Z) Score](http://en.wikipedia.org/wiki/Standard_score)

## Stanford, Coursera and Udacity Classes

**Technology Entrepreneurship Course**

http://eesley.blogspot.com/

Remember: intersection is ‘and’, and ‘and’ is multiply in stats; going down a probability tree is ‘and’

Remember: ‘or’ is add in stats

INSERT multiplication and addition rules

**Links**

**AIqus Wiki**

http://www.aiqus.com/wiki/Main

**AI Class Index**

https://github.com/lorenzo-stoakes/stanford-ai/blob/master/index.md

**NLP Class**

**Introduction to Information Retrieval**

http://i.stanford.edu/~ullman/mmds.html

**Calculus**

**derivative** (differential calculus) is a way of measuring instantaneous change, such as finding the speed of a car when you only know its position. The slope of the tangent line to a point on a curve corresponds to the derivative. [subtract starting position from ending position] We can take the derivative of the position function, a process of **subtraction and division**, to find the corresponding velocity function, which we can use to determine our instantaneous speed at any given point.

**integral** (integral calculus) describes the accumulation of an infinite number of tiny pieces that add up to a whole and can be used, for instance, to determine the distance a car has traveled when only its speed is known. The area under a curve corresponds to the integral. [add measurements together of movement (speed?)] Remember that the derivative and integral are opposite processes: each undoes the work of the other. The integral is a process of **multiplication and addition**.

More important, functions are connected to each other in valuable ways: velocity [speed] is the derivative of position, and acceleration [rate of change] is the derivative of velocity. We integrate acceleration [rate of change] over time to find the velocity function [integration], and we integrate velocity over time to find our position function [integral]. These connections let us make inferences based on what we do know, to figure out what we don’t know.

**Acceleration** is a vector quantity that is defined as the rate at which an object changes its velocity. An object is accelerating if it is changing its velocity.

**Speed** is a scalar quantity that refers to “how fast an object is moving.” Speed can be thought of as the rate at which an object covers distance.
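The "subtraction and division" and "multiplication and addition" descriptions map directly onto numerical approximations. This is a sketch with an assumed position function of 3t² (so velocity is 6t); the step sizes and function names are illustrative choices.

```python
def derivative(f, t, h=1e-5):
    """Subtraction and division: change in f over a small interval around t."""
    return (f(t + h) - f(t - h)) / (2 * h)


def integral(f, a, b, steps=1000):
    """Multiplication and addition: sum of (height * width) over thin slices."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) * width for i in range(steps))


def position(t):
    """Assumed example: position after t seconds."""
    return 3 * t ** 2


def velocity(t):
    """Derivative of the assumed position function."""
    return 6 * t


if __name__ == "__main__":
    print("speed at t=2 from position:", derivative(position, 2.0))
    print("distance from t=0 to t=2 from velocity:", integral(velocity, 0.0, 2.0))
    print("position(2) - position(0):", position(2.0) - position(0.0))
```

Differentiating position recovers the velocity, and integrating velocity recovers the distance covered, which illustrates that each process undoes the other.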

**fundamental theorem of calculus**

- The first part of the theorem, sometimes called the first fundamental theorem of calculus, shows that an indefinite integration can be reversed by a differentiation. The first part is also important because it guarantees the existence of antiderivatives for continuous functions.
- The second part, sometimes called the second fundamental theorem of calculus, allows one to compute the definite integral of a function by using any one of its infinitely many antiderivatives. This part of the theorem has invaluable practical applications, because it markedly simplifies the computation of definite integrals.
- The first published statement and proof of a restricted version of the fundamental theorem was by James Gregory (1638–1675). Isaac Barrow (1630–1677) proved a more generalized version of the theorem, while Barrow’s student Isaac Newton (1643–1727) completed the development of the surrounding mathematical theory. Gottfried Leibniz (1646–1716) systematized the knowledge into a calculus for infinitesimal quantities.

UTC – [Charlotte is -4 hours from UTC](http://www.time.gov/timezone.cgi?UTC/s/0/java)

**For sets A and B**

- union = distinct elements from A or B or both A and B
- difference or complement = elements in A that are not in B; the complement of an event is everything in the sample space that is not that event, e.g., if A = (numbers > 0) then ~A = (numbers less than or equal to 0)
- intersection = shared elements: elements that are in both A and B, hence each appears 2 times (once in each set)

**Probability**

- independent event — the outcome of one event has no relationship to another event
- probability (statistical definition): probability tells us how often something is likely to occur when an experiment is repeated. Probability is concerned with the outcome of trials. (1) The probability of an event is always between 0 and 1. (2) The probability of the sample space is always 1. (3) The probabilities of an event and its complement always sum to 1; this follows from (1) and (2)
- sample space: the set of all elementary outcomes of a trial.
- mutually exclusive: there are no elements (points) in common
- permutation: all possible ways the elements in a set can be arranged. Note that the order of elements is important in a permutation: (a, b, c) is a different permutation than (a, c, b). Calculate the number of permutations of any set of distinct elements (no elements repeat) by using factorials (n!). The number of permutations of subsets of size k drawn from a set of size n is calculated as: nPk = n!/(n-k)!
- combinations: similar to permutations, with the difference that the order of the elements is not significant; (a, b, c) is the same combination as (b, a, c). For this reason there is only one combination of the set (a, b, c). nCk = nPk/k!
- In technical terms, the set of outcomes from rolling one or more dice has a **discrete uniform distribution** because the possible outcomes can be enumerated and each outcome is equally likely. The results of two or more dice thrown at once (or multiple throws of the same die) are assumed to be independent of each other, so the probabilities of each combination of numbers are calculated by multiplying the probability of each separate result.
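The counting formulas can be checked against Python's standard library. This sketch assumes Python 3.8 or later (for `math.perm` and `math.comb`); the helper names `n_p_k` and `n_c_k` are made up to mirror the nPk and nCk notation.

```python
import math
from fractions import Fraction
from itertools import combinations, permutations


def n_p_k(n, k):
    """Number of ordered arrangements of k items from n: n! / (n - k)!"""
    return math.factorial(n) // math.factorial(n - k)


def n_c_k(n, k):
    """Number of unordered selections of k items from n: nPk / k!"""
    return n_p_k(n, k) // math.factorial(k)


if __name__ == "__main__":
    print(list(permutations("abc")))       # order matters: 6 arrangements
    print(list(combinations("abc", 3)))    # order ignored: 1 combination
    print(n_p_k(5, 2), math.perm(5, 2))    # same count two ways
    print(n_c_k(5, 2), math.comb(5, 2))
    # Two independent dice: multiply the separate probabilities.
    print(Fraction(1, 6) * Fraction(1, 6))
```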

**Conditional Probability**

- P(E|F) is read as the probability of E given F; F is known as the condition.
- Two events are independent if the following relationship holds: P(E|F) = P(E)

- To calculate the probability of **any** of several events occurring (the union of several events), add the probabilities of the individual events.

- Union of mutually exclusive events: P(E U F) = P(E) + P(F)
- Union of non-mutually exclusive events: P(E U F) = P(E) + P(F) – P(E intersection F)

- To calculate the probability of **all** of several events occurring (the intersection of several events), multiply the probabilities of the individual events (this holds when the events are independent).
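These rules can be verified exactly on a single fair die. The sketch assumes a six-outcome sample space with equally likely outcomes; the events chosen (an even roll, a roll of at most four) are illustrative.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))  # one fair six-sided die


def prob(event):
    """Probability of an event (a set of outcomes) in the sample space."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def conditional(e, f):
    """P(E|F) = P(E and F) / P(F)."""
    return prob(e & f) / prob(f)


if __name__ == "__main__":
    even = {2, 4, 6}
    at_most_four = {1, 2, 3, 4}
    # Union of non-mutually exclusive events: add, then subtract the overlap.
    print(prob(even | at_most_four),
          prob(even) + prob(at_most_four) - prob(even & at_most_four))
    # These two events happen to be independent: P(E|F) equals P(E).
    print(conditional(even, at_most_four), prob(even))
    # So the intersection is the product of the individual probabilities.
    print(prob(even & at_most_four), prob(even) * prob(at_most_four))
```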

**Bayes Formula**

- Use this formula when you know P(B|A) but want to know P(A|B)
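Bayes' formula is P(A|B) = P(B|A) · P(A) / P(B), where P(B) can be expanded as P(B|A) · P(A) + P(B|not A) · P(not A). The sketch below applies it to an invented screening example; the numbers and the function name are assumptions for illustration only.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) from P(B|A), P(A), and P(B|not A)."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b


if __name__ == "__main__":
    # Hypothetical test: 1% of people have the condition (A), the test is
    # positive (B) for 90% of those who have it and 10% of those who do not.
    posterior = bayes(Fraction(9, 10), Fraction(1, 100), Fraction(1, 10))
    print(posterior, float(posterior))
```

Even with a fairly accurate test, the probability of having the condition given a positive result stays low here because the condition itself is rare.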

**relational algebra operators**

- select (Sigma): pick (select) rows; Sigma operator, condition(s) on expression
- project (Pi): pick (select) columns; Pi operator, list of attributes to keep
- cross-product (X): combine two relations; the result has (rows in A) times (rows in B) rows
- natural join (bow-tie): cross-product that enforces equality on all attributes with the same name and drops the duplicate columns
- theta join (bow-tie with subscript Theta): cross-product with condition(s) Theta applied; what db people call a join
- union (U) = distinct elements from both sets A and B
- difference (-) = the difference of A and B is the relation that contains all the tuples that are in A but not in B; so A-B gives elements in A that are not in B, and B-A gives elements in B that are not in A
- intersection (inverted U or &) = shared elements: elements that are in both sets A and B. It adds no expressive power, since it can be expressed as (A – (A – B)); intersection can also be expressed as (A natural join (bow-tie) B)
- rename (Rho): applies a new relation name and new attribute names to an existing relation, or just a new relation name, or just new attribute names; needed because natural joins rely on matching column names
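A few of the operators can be mimicked with plain Python lists of dictionaries, one dictionary per row. This is a toy sketch, not how a database implements them; the relation contents and function names are invented, and `cross` assumes the two relations share no attribute names.

```python
def select(relation, condition):
    """Sigma: keep the rows that satisfy the condition."""
    return [row for row in relation if condition(row)]


def project(relation, attributes):
    """Pi: keep only the named columns and drop duplicate rows."""
    result = []
    for row in relation:
        slim = {name: row[name] for name in attributes}
        if slim not in result:
            result.append(slim)
    return result


def cross(a, b):
    """Cross-product: every row of a paired with every row of b."""
    return [{**x, **y} for x in a for y in b]


def natural_join(a, b):
    """Pair rows that agree on every attribute name the two relations share."""
    joined = []
    for x in a:
        for y in b:
            if all(x[name] == y[name] for name in x.keys() & y.keys()):
                joined.append({**x, **y})
    return joined


if __name__ == "__main__":
    students = [{"sid": 1, "name": "Amy"}, {"sid": 2, "name": "Bob"}]
    applications = [
        {"sid": 1, "college": "MIT"},
        {"sid": 1, "college": "Stanford"},
        {"sid": 2, "college": "MIT"},
    ]
    joined = natural_join(students, applications)
    mit = select(joined, lambda row: row["college"] == "MIT")
    print(project(mit, ["name"]))
```

Composing the functions, as in the last three lines, mirrors how relational algebra expressions nest one operator inside another.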

**Stanford AI Class Circle. If you want to get added, leave a comment below. I also have a circle for ML and DB** https://plus.google.com/100129275726588145876/posts/KnPjU8oQM2z

**Overview of AIMA Lisp Code** http://aima.cs.berkeley.edu/lisp/doc/overview.html

**Lisp User Guide** http://aima.cs.berkeley.edu/lisp/doc/user.html

**AI Twits** https://twitter.com/#!/aiclass

**DB Twits** https://twitter.com/#!dbclass

**ML Twits** https://twitter.com/#!/ml_class/

**Terms**

**The formal definition of inverse proportion**:

- Two quantities, A and B, are in inverse proportion if by whatever factor A changes, B changes by the multiplicative inverse, or reciprocal, of that factor. E.g., 2/4 would be 2 * **3** and 4 * **1/3**; 1/3 is the reciprocal of 3

**Closure:**

- In mathematics, a set is said to be closed under some operation if performance of that operation on members of the set always produces a unique member of the same set. For example, the real numbers are closed under subtraction, but the natural numbers are not: 3 and 8 are both natural numbers, but the result of 3 − 8 is not. Similarly, a set is said to be closed under a collection of operations if it is closed under each of the operations individually. see https://secure.wikimedia.org/wikipedia/en/wiki/Closure_%28mathematics%29

## NLP

**ANTLR**

http://www.antlr.org/

**Dr Seuss Quotes = for search**

http://www.earlymoments.com/Dr-Seuss–His-Friends-Club/Favorite-Dr-Seuss-Quotes/