Archive for the ‘Statistics’ Category

Stats

Authors@Google: Nate Silver
http://www.youtube.com/watch?v=mYIgSq-ZWE0&feature=em-uploademail

Probabilistic Programming & Bayesian Methods for Hackers
http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/

Model Thinking Part 3 – Coursera

Tipping Point

  • Direct: a small action or event (a small change in a variable) has a large impact on the end state; how large also depends on the context
  • Contextual (the percolation model is an example): a slight change in the context (environment) permits the tip and can have a big impact on the final state
  • Between and within class: the system can tip within a class, moving from one state to another of the same kind, or between classes, moving from one type of system to another

Note: 4 types of systems: equilibrium, periodic, random, complex

Percolation Model – a physics model; water percolating down into the ground is an example

Use a checkerboard model with filled-in squares to show the tipping point (59.27% of squares filled) at which a process goes from start to finish. Used for information flow, forest fire prediction, innovation flow, and social models (the context changes if more connections are made).
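The checkerboard idea can be tried out with a small simulation. This is only a sketch under my own assumptions: each square is filled independently with probability p, flow moves between filled squares that share an edge, and "percolates" means a filled path joins the top row to the bottom row. The function names and grid sizes are made up for illustration.

```python
import random
from collections import deque


def percolates(grid):
    """Return True if filled squares connect the top row to the bottom row."""
    n = len(grid)
    seen = [[False] * n for _ in range(n)]
    queue = deque()
    # Start the search from every filled square in the top row.
    for col in range(n):
        if grid[0][col]:
            seen[0][col] = True
            queue.append((0, col))
    while queue:
        row, col = queue.popleft()
        if row == n - 1:
            return True
        # Move to edge-sharing neighbours that are filled and not yet visited.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = row + dr, col + dc
            if 0 <= r < n and 0 <= c < n and grid[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue.append((r, c))
    return False


def percolation_rate(n, p, trials, seed=0):
    """Fraction of random n-by-n boards (fill probability p) that percolate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        grid = [[rng.random() < p for _ in range(n)] for _ in range(n)]
        hits += percolates(grid)
    return hits / trials


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.55, 0.6, 0.65, 0.8):
        print(p, percolation_rate(30, p, 200))
```

On a finite board the jump is gradual rather than sharp, but the rate should climb quickly as p passes roughly 0.59, which is the tipping point noted above.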

Diffusion Model –  natural diffusion but no tipping point

$W_{t+1} = W_t + N c \tau \cdot \frac{W_t}{N} \cdot \frac{N - W_t}{N}$

where $W_t$ is the number of people with the disease at time $t$ (the current state), $N$ is the number of people, $c$ is the contact rate, and $\tau$ (tau) is the transmission rate.
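A minimal sketch of iterating the diffusion update, assuming the usual form of the rule: new cases per step = N * c * tau * (W/N) * ((N - W)/N). The parameter values in the example are hypothetical and chosen only so that c * tau is small.

```python
def diffusion(n, contact_rate, transmission_rate, w0, steps):
    """Iterate the diffusion update and return the number infected at each step."""
    w = float(w0)
    history = [w]
    for _ in range(steps):
        # Meetings between an infected and an uninfected person spread the disease.
        new_cases = n * contact_rate * transmission_rate * (w / n) * ((n - w) / n)
        w = w + new_cases
        history.append(w)
    return history


if __name__ == "__main__":
    # Hypothetical values: 1000 people, 1 initially infected.
    path = diffusion(n=1000, contact_rate=2, transmission_rate=0.1, w0=1, steps=100)
    print([round(x) for x in path[::10]])
```

With no cure term the curve only rises: slowly at first, then fast, then levelling off as everyone gets it. That is why there is no tipping point here, just an S-shaped curve.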

SIS: S (susceptible), I (infectious), S (susceptible). From epidemiology: a non-linear diffusion model in which a person can be cured and then reinfected. It has a tipping point because people can be cured. This can also apply to information.

$W_{t+1} = W_t + N c \tau \cdot \frac{W_t}{N} \cdot \frac{N - W_t}{N} - a W_t$

where $a$ is the rate at which people with the disease are cured. This can be simplified using standard algebra.

Basic Reproduction Number: $R_0 = c \tau / a$. If $R_0 > 1$ the disease spreads, hence a tipping point.

V = fraction vaccinated, so the effective reproduction number is $r_0 = R_0 (1 - V)$. The disease stops spreading when $r_0 \le 1$, hence $V \ge 1 - 1/R_0$ is the fraction to vaccinate. Vaccination must reach this percentage to create the tipping point needed to protect the people who are not vaccinated. If $r_0 \le 1$, the disease does not spread; if $r_0 > 1$, it spreads.
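A small sketch of the SIS update with a cure term, to see the tipping point at R_0 = 1. It assumes the diffusion rule with a * W subtracted each step (a = cure rate); the function names and numbers are hypothetical.

```python
def basic_reproduction_number(contact_rate, transmission_rate, cure_rate):
    """R_0 = c * tau / a."""
    return contact_rate * transmission_rate / cure_rate


def sis(n, contact_rate, transmission_rate, cure_rate, w0, steps):
    """Iterate the SIS update and return the number infected after `steps` steps."""
    w = float(w0)
    for _ in range(steps):
        new_cases = n * contact_rate * transmission_rate * (w / n) * ((n - w) / n)
        cured = cure_rate * w
        w = w + new_cases - cured
    return w


if __name__ == "__main__":
    for cure_rate in (0.4, 0.1):
        r0 = basic_reproduction_number(2, 0.1, cure_rate)
        final = sis(n=1000, contact_rate=2, transmission_rate=0.1,
                    cure_rate=cure_rate, w0=10, steps=500)
        print("R0 =", r0, "infected after 500 steps:", round(final, 2))
```

With these numbers the infection dies out when R_0 is below 1 and settles at a steady positive level when R_0 is above 1.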
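The vaccination threshold is simple arithmetic, so a short sketch may help. It assumes the relation effective reproduction number = R_0 * (1 - V); the R_0 values used are hypothetical, not figures for any real disease.

```python
def effective_reproduction_number(r0, vaccinated_fraction):
    """Reproduction number once a fraction V of people is vaccinated."""
    return r0 * (1.0 - vaccinated_fraction)


def vaccination_threshold(r0):
    """Smallest fraction V with R_0 * (1 - V) <= 1, i.e. V = 1 - 1/R_0."""
    if r0 <= 1:
        return 0.0  # the disease does not spread even without vaccination
    return 1.0 - 1.0 / r0


if __name__ == "__main__":
    for r0 in (0.8, 2, 4, 10):  # hypothetical values
        print("R0 =", r0, "vaccinate at least", vaccination_threshold(r0))
```

The threshold rises quickly with R_0: doubling R_0 from 2 to 4 moves the required coverage from one half to three quarters.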

Economic Models

Real growth = nominal growth – inflation

Model Thinking Part 2 – Coursera

Model Thinking – Scott E. Page – U. of Michigan

Modeling People

3 types of models: rational actor, behavioral, and rule based

  • Rational Actor: maximize some objective function, i.e. optimize
  • Behavioral: observe what people actually do; neuroscience supports this view
  • Rule Based: people follow rule(s), e.g. the Schelling model

Rational Actor

  • An objective
  • Optimize the objective

The value is in establishing a benchmark (which also shows how far people are from the rational choice), because:

  • Objective decision – reduced bias
  • Easy to mathematically solve for
  • People learn by repetition, hence get closer to a good result
  • Mistakes cancel out, eliminating bias

Behavior Model

Daniel Kahneman says people have a slow (rational) and a fast (emotional) thinking process

Bias examples

  • Prospect theory – a bird in the hand is worth two in the bush
  • Hyperbolic discounting – take the immediate reward and discount future pain
  • Status quo – stick with the current situation
  • Base rate – a first estimate pulls a second estimate toward it

Some people say the above biases are WEIRD (Western, Educated, Industrialized, Rich, Democratic): they essentially only show up in developed countries

Approach to take in modeling: look at the rational model, then introduce bias, then look at potential rules.

Rule Based

Follow a rule or rules, e.g. Schelling. Rules can be easy to understand and capture the main effect, but can be ad hoc, can be exploitable, and may not be optimal.

Two types (fixed and adaptive) in two contexts (Decision and Game)

Fixed: might be random, tit for tat, or grim trigger

Adaptive: in a Decision context might be gradient (take something that works and extend it); in a Game context might be best response or mimicry (copy others’ rules that work)
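The two fixed game rules named above are easy to write down. A sketch, assuming a repeated game where each player picks cooperate ("C") or defect ("D") every round and sees only the other player's past moves; the helper names and the `defect_once` opponent are made up for illustration.

```python
def tit_for_tat(opponent_history):
    """Cooperate first, then copy whatever the opponent did last round."""
    return "C" if not opponent_history else opponent_history[-1]


def grim_trigger(opponent_history):
    """Cooperate until the opponent defects once, then defect forever."""
    return "D" if "D" in opponent_history else "C"


def defect_once(opponent_history):
    """Hypothetical opponent: defects in round 2 only."""
    return "D" if len(opponent_history) == 1 else "C"


def play(strategy_a, strategy_b, rounds):
    """Play both rules against each other and return their move sequences."""
    moves_a, moves_b = [], []
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
    return "".join(moves_a), "".join(moves_b)


if __name__ == "__main__":
    print(play(tit_for_tat, defect_once, 4))
    print(play(grim_trigger, defect_once, 4))
```

Against the same single defection, tit for tat punishes once and then goes back to cooperating, while grim trigger never forgives.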

Impact of 3 models

  • Market: the type of model does not matter much, since market forces drive towards a mean. Zero-intelligence agents bid randomly below their value (if buyers) or ask randomly above their cost (if sellers), and the resulting prices still average out close to the actual clearing price.
  • Race to the Bottom (game): the type of model can matter. The goal is to guess the number closest to 2/3 of the mean of all guesses. (1) Rational: the guess would be zero. A rational person assumes some mean and quotes 2/3 of it, then, assuming everyone else is rational and doing the same, keeps iterating the quote down until it reaches zero. (2) Biased (behavioral model): would probably just pick 50. (3) Rule (best response): people first guess 50, then some take 2/3 of 50 (33), then some take 2/3 of 33 (22), and over the long run the guesses go down to zero; the rule is a mix of rational and biased. If everyone in the game is rational and an irrational person then enters, the rational players' guesses will be influenced by what they expect the irrational person to do.
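The best-response rule in the guessing game can be sketched in a few lines. This assumes every round's guess is 2/3 of the previous round's guess, starting from 50 as in the notes; the function name is made up.

```python
def best_response_guesses(start, rounds, fraction=2 / 3):
    """Repeatedly take `fraction` of the previous guess."""
    guesses = [float(start)]
    for _ in range(rounds):
        guesses.append(fraction * guesses[-1])
    return guesses


if __name__ == "__main__":
    print([round(g) for g in best_response_guesses(50, 10)])
```

The first steps are 50, 33, 22, and the sequence keeps shrinking towards zero, which is the fully rational guess.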

Categorical and Linear Models

  • Categorical Models: group the sample and analyze the sub-groups to see how well the grouping explains (reduces the variance of) the data; see r_squared.py for a detailed explanation. R-squared indicates how much of the variance in the data is explained by the model. Correlation indicates a relationship between variables, BUT causation means that one variable actually depends on another, i.e. X (independent) entails Y (dependent) (check this last statement)
  • Linear model – draw a line through the data: Y depends on X, Y = F(X), Y is a function of X
  • Non-linear model – a curve; some similarities to the linear model
  • Big Coefficient – shows how important each X is in Y = a1 x1 + a2 x2; using the biggest coefficient as a guide only makes sense in a world where there is lots of data
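A sketch of R-squared for a categorical model. This is not the r_squared.py mentioned above, just an illustration under my own assumptions: variation is the sum of squared differences from a mean, and the calorie numbers are invented.

```python
def total_variation(values):
    """Sum of squared differences from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def r_squared(groups):
    """Share of the total variation that is explained by the grouping.

    `groups` maps a category name to its list of values. Assumes the values
    are not all identical (otherwise the total variation is zero).
    """
    all_values = [v for values in groups.values() for v in values]
    total = total_variation(all_values)
    # What the categories leave unexplained is the variation inside each group.
    unexplained = sum(total_variation(values) for values in groups.values())
    return 1.0 - unexplained / total


if __name__ == "__main__":
    # Invented calorie counts, split into two categories.
    calories = {"fruit": [90, 100, 110], "dessert": [340, 350, 360]}
    print(r_squared(calories))
```

When the categories separate the data well, almost all of the variation is between groups and R-squared is close to 1; lumping everything into one group gives 0.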

Statistics and Probability Background


Statistics: Lecture Notes

Stats: Probability Rules
“I like to use what’s called a joint probability distribution. (Since disjoint means nothing in common, joint is what they have in common, so the values that go on the inside portion of the table are the intersections or “and”s of each pair of events.) “Marginal” is another word for totals; it’s called marginal because they appear in the margins.”
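A small sketch of a joint probability table and its marginals. The events and probabilities are invented for illustration; the inside cells are the "and" probabilities and the marginals are the row and column totals.

```python
# Invented joint distribution over (weather, arrival).
joint = {
    ("rain", "late"): 0.15,
    ("rain", "on time"): 0.10,
    ("dry", "late"): 0.05,
    ("dry", "on time"): 0.70,
}


def marginal(joint, axis):
    """Total the joint probabilities over one variable (axis 0 or 1)."""
    totals = {}
    for outcome, p in joint.items():
        key = outcome[axis]
        totals[key] = totals.get(key, 0.0) + p
    return totals


if __name__ == "__main__":
    print(marginal(joint, 0))  # totals for rain / dry
    print(marginal(joint, 1))  # totals for late / on time
```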

The UBC Calculus Online Homepage  — short intro

probability density
n. Statistics In both senses also called probability distribution.
1. A function whose integral over a given interval gives the probability that the values of a random variable will fall within the interval.
2. The calculated value of a probability density.
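Definition 1 can be checked numerically. A sketch, assuming a standard normal density and a simple trapezoidal rule for the integral; the function names are made up.

```python
import math


def normal_pdf(x, mean=0.0, std=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def probability_between(a, b, pdf, steps=10_000):
    """Approximate the integral of `pdf` over [a, b] with the trapezoidal rule."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width


if __name__ == "__main__":
    # Probability that a standard normal value falls within one
    # standard deviation of the mean.
    print(probability_between(-1.0, 1.0, normal_pdf))
```

The result should be about 0.68, the familiar share of a normal distribution within one standard deviation of the mean.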

Chain Rule

John Sowa – mathematical background

paskin.org — some useful presentations on probability and structured representation

Normalization

Normal Distribution

Standard (Z) Score
http://en.wikipedia.org/wiki/Standard_score

Probability Density Function

Linked Data Graph Orientated Frameworks

January 9, 2012

Glossary of Semantic Technology Terms
http://www.mkbergman.com/1017/glossary-of-semantic-technology-terms/

The Rationale for Semantic Technologies
http://www.mkbergman.com/1015/the-rationale-for-semantic-technologies/

Ontologies as Conceptual Models
http://www.cambridgesemantics.com/blog/-/blogs/ontologies-as-conceptual-models?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+EnterpriseSemantics+%28Enterprise+Semantics+Blog%29

JSON-LD by Manu Sporny

https://www.youtube.com/watch?feature=player_embedded&v=vioCbTo3C-4#!

Semantic Web Intro by Manu Sporny
http://bit.ly/Kw6Asm

Linked Data Intro by Manu Sporny
http://www.youtube.com/watch?v=4x_xzT5eF5Q

Linked Data FAQ
http://structureddynamics.com/linked_data.html

Linked Data – Welcome to the Data Network (may be behind a paywall)
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6062547

Pregel
http://www.michaelnielsen.org/ddi/pregel/

Google Pregel Graph Processing
http://horicky.blogspot.com/2010/07/google-pregel-graph-processing.html

Pregel: a system for large-scale graph processing – “ABSTRACT”
http://delivery.acm.org/10.1145/1590000/1582723/p6-malewicz-2.pdf?ip=99.5.73.174&acc=ACTIVE%20SERVICE&CFID=61362319&CFTOKEN=25119961&__acm__=1326116961_4c5dbb95b82d3aaedcc9b134c8fd4318

Pregel: a system for large-scale graph processing (slides)

http://www.slideshare.net/shatteredNirvana/pregel-a-system-for-largescale-graph-processing

GoldenOrb
http://goldenorbos.org/

Signal/Collect
http://code.google.com/p/signal-collect/

Data Sources

December 18, 2011

Welcome to NYC Open Data
http://nycopendata.socrata.com/

Common Crawl
http://www.commoncrawl.org/

Public Sector Information – Raw Data for New Services and Products
http://ec.europa.eu/information_society/policy/psi/index_en.htm