## Stats

**Authors@Google: Nate Silver**

http://www.youtube.com/watch?v=mYIgSq-ZWE0&feature=em-uploademail

**Probabilistic Programming & Bayesian Methods for Hackers**

http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/

## Hadoop

**Comparison of Hadoop Frameworks**

http://blog.samibadawi.com/2012/03/hive-pig-scalding-scoobi-scrunch-and.html

## Model Thinking Part 3 – Coursera

**Tipping Point**

- Direct tip: an action or event causes the tip itself. A small change in a variable has a large impact on the end state, although this also depends on context.
- Contextual tip (the percolation model is an example): something in the context changes to permit the tip. A slight change in the context (environment) can have a big impact on the final state.
- Within class and between class: a system can move from one state to another within the same class, or it can move from one class of system to another.

Note: there are 4 types of systems: equilibrium, periodic, random, and complex.

**Percolation Model** – a physics model; water percolating down into the ground is the example

Use a checkerboard model with filled-in squares to show the tipping point (about 59.27% of squares filled) at which a process goes from start to finish (end). The model is used in information flow, forest fire prediction, innovation flow, and social models (the context changes if more connections are made).
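
A rough way to see this tipping point is to fill squares at random and test whether filled squares connect the top row to the bottom row. The sketch below assumes a square board and 4-neighbour connections; the names `percolates` and `percolation_rate` are invented here for illustration and do not come from the course.

```python
import random
from collections import deque


def percolates(grid):
    """Return True if filled cells link the top row to the bottom row."""
    n = len(grid)
    queue = deque((0, c) for c in range(n) if grid[0][c])
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if r == n - 1:
            return True
        # Spread to the four orthogonal neighbours that are filled.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def percolation_rate(n, p, trials, rng):
    """Share of random n-by-n boards (fill probability p) that percolate."""
    hits = 0
    for _ in range(trials):
        grid = [[rng.random() < p for _ in range(n)] for _ in range(n)]
        hits += percolates(grid)
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.45, 0.55, 0.59, 0.63, 0.75):
        print(p, percolation_rate(30, p, 200, rng))
```

On a finite board the jump is smeared out rather than sharp, so the rate should climb steeply somewhere near a fill probability of 0.59 instead of switching at exactly that value.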

**Diffusion Model** – natural diffusion but no tipping point

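$$W_{t+1} = W_t + N c \tau \cdot \frac{W_t}{N} \cdot \frac{N - W_t}{N}$$

where $W_t$ is the number of people with the disease at time $t$, $N$ is the number of people, $c$ is the contact rate, and $\tau$ is the transmission rate. $W_t / N$ is the share of people with the disease and $(N - W_t)/N$ is the share without it.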
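
The update can be iterated directly. This is a minimal sketch that reads the formula as: next period's count equals the current count plus `N * c * tau * (W/N) * ((N - W)/N)`. The function names and the parameter values in the example are assumptions made for illustration.

```python
def diffusion_step(w, n, c, tau):
    """One period of the diffusion model.

    w: people currently infected, n: population size,
    c: contact rate, tau: transmission rate.
    """
    return w + n * c * tau * (w / n) * ((n - w) / n)


def simulate_diffusion(w0, n, c, tau, periods):
    """Return the list [W_0, W_1, ..., W_periods]."""
    path = [w0]
    for _ in range(periods):
        path.append(diffusion_step(path[-1], n, c, tau))
    return path


if __name__ == "__main__":
    # Hypothetical values: 1 initial case in a population of 1000.
    for t, w in enumerate(simulate_diffusion(1, 1000, 3, 0.1, 40)):
        if t % 5 == 0:
            print(t, round(w, 1))
```

With these values the count follows an S-shaped curve toward the whole population: nobody recovers, so there is no threshold below which the spread stops.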

**SIS** – S (for susceptible), I (for infectious), and S (for susceptible). An epidemiology model: non-linear, like the diffusion model, but a person can be cured and then reinfected. It has a tipping point because people can be cured. This can also apply to information.

$$W_{t+1} = W_t + N c \tau \cdot \frac{W_t}{N} \cdot \frac{N - W_t}{N} - a W_t$$

where $a W_t$ is the number of people cured. This can be simplified using standard algebra:

$$W_{t+1} = W_t + W_t \left( c \tau \, \frac{N - W_t}{N} - a \right)$$

When $W_t$ is small, $(N - W_t)/N \approx 1$, so the disease grows only if $c\tau > a$.
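
A small simulation shows the tipping point. This sketch reads the SIS update as the diffusion update minus `a * W` recoveries per period; the function names and the numbers in the example are assumptions for illustration only.

```python
def sis_step(w, n, c, tau, a):
    """One period of the SIS model: new infections minus recoveries."""
    new_infections = n * c * tau * (w / n) * ((n - w) / n)
    recoveries = a * w
    return w + new_infections - recoveries


def simulate_sis(w0, n, c, tau, a, periods):
    """Return the infected count after the given number of periods."""
    w = w0
    for _ in range(periods):
        w = sis_step(w, n, c, tau, a)
    return w


if __name__ == "__main__":
    # Hypothetical parameters: c * tau / a is 0.5 in the first run, 2 in the second.
    print(simulate_sis(10, 1000, 1, 0.1, 0.2, 500))  # ratio below 1
    print(simulate_sis(10, 1000, 2, 0.2, 0.2, 500))  # ratio above 1
```

In the first run the infection fades toward zero; in the second it settles at a positive level. Setting next period's count equal to the current count in the update gives that level as `N * (1 - a / (c * tau))`.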

The basic reproduction number is $R_0 = c\tau / a$. If $R_0 > 1$ the disease spreads, hence a tipping point.

Let $V$ be the share of people vaccinated. Then $R_0 (1 - V) = r_0$, the reproduction number after vaccination, and the disease stops spreading when $r_0 \le 1$; hence $V \ge 1 - 1/R_0$ is the share to vaccinate. Vaccination must reach this share to create the tipping point needed to protect the people who are not vaccinated. If $R_0 \le 1$ there is no spread of disease, but if $R_0 > 1$ the disease will spread.
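
The vaccination share follows from one line of arithmetic. A minimal sketch, assuming the threshold is `1 - 1/R0` and that nothing needs to be vaccinated when `R0` is at most 1; the `R0` values are made-up inputs, not figures for real diseases.

```python
def vaccination_threshold(r0):
    """Smallest vaccinated share V such that r0 * (1 - V) <= 1."""
    if r0 <= 1:
        return 0.0  # the disease does not spread even without vaccination
    return 1 - 1 / r0


if __name__ == "__main__":
    for r0 in (0.8, 2, 4, 10):  # hypothetical reproduction numbers
        print(r0, vaccination_threshold(r0))
```

The share rises quickly with `R0`: a value of 2 needs half the population vaccinated, while a value of 4 needs three quarters.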

**Economic Models**

Real growth is nominal growth minus inflation

## Model Thinking Part 1 – Coursera

**Model Thinking – Scott E. Page – U. of Michigan**

**Reasons to Model**

- Predict points
- Understand data
- Understand patterns — class of outcome
- Design solutions

**Schelling Segregation Model**

- Sorting (being with people you like) and peer influence (homophily; acting like the people you are with) both affect behavior
- The model is agent-based, consisting of agents, behaviors, and aggregation
- Schelling's model is threshold-based, with two kinds of tipping: an exodus tip (people move out) and a genesis tip (people move in)
- Micro behavior is not the same as macro aggregation: people may individually be tolerant (micro), yet the resulting population movement (macro) may produce segregation.
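
The points above can be sketched as a small agent-based simulation. This is one possible reading rather than the course's implementation: it assumes a square grid with two agent types and empty cells (`None`), an 8-cell neighbourhood, and unhappy agents jumping to a random empty cell. All names are invented for illustration.

```python
import random


def unhappy(grid, r, c, threshold):
    """True if the share of like neighbours of the agent at (r, c) is below threshold."""
    n = len(grid)
    me = grid[r][c]
    same = other = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] is not None:
                if grid[nr][nc] == me:
                    same += 1
                else:
                    other += 1
    total = same + other
    return total > 0 and same / total < threshold


def schelling_round(grid, threshold, rng):
    """Move agents that are unhappy at the start of the round; return how many moved."""
    n = len(grid)
    cells = [(r, c) for r in range(n) for c in range(n)]
    movers = [(r, c) for r, c in cells
              if grid[r][c] is not None and unhappy(grid, r, c, threshold)]
    empties = [(r, c) for r, c in cells if grid[r][c] is None]
    rng.shuffle(movers)
    moved = 0
    for r, c in movers:
        if not empties:
            break
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec], grid[r][c] = grid[r][c], None
        empties[i] = (r, c)  # the vacated cell becomes available
        moved += 1
    return moved
```

Calling `schelling_round` repeatedly until it returns 0 on a randomly filled grid, with a threshold such as 0.3, is a simple way to compare the individual tolerance level with the clustering that emerges in the grid.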

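**Granovetter’s Model**

The model has N individuals. Each person j has a threshold T(j) and joins if at least T(j) others join. Example: with thresholds 0, 1, 2, 3, 4 everyone joins, because the person with threshold 0 joins for sure, which brings in the person with threshold 1, and so on.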

This tells us that lower thresholds and more variation in thresholds can have large effects.
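
The cascade is easy to trace in code. A minimal sketch, assuming a person joins as soon as the number of people who have already joined reaches their threshold; `granovetter_cascade` is a name made up for this example.

```python
def granovetter_cascade(thresholds):
    """Return how many people end up joining, given each person's threshold."""
    joined = 0
    while True:
        # Everyone whose threshold is met by the current crowd is in.
        now = sum(1 for t in thresholds if t <= joined)
        if now == joined:
            return joined
        joined = now


if __name__ == "__main__":
    print(granovetter_cascade([0, 1, 2, 3, 4]))  # 5: each joiner triggers the next
    print(granovetter_cascade([1, 1, 1, 1, 1]))  # 0: nobody goes first
    print(granovetter_cascade([0, 2, 2, 2, 2]))  # 1: the cascade stalls
```

The second group has a lower average threshold than the first yet nobody joins, which is the sense in which variation in thresholds matters as much as their level.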

**Standing Ovation Model**

An extension of **Granovetter’s Model**

partially caused by peer effect (homophily) and information

Model rules:

- Threshold to stand: T
- Quality: Q
- Signal: S = Q + E (E is error or diversity)
- Initial rule: stand if S > T. Subsequent rule: stand up if more than X% of the audience stands up
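
The two rules can be sketched as follows. The sketch assumes every audience member sees the whole audience, so the subsequent rule is applied once to the overall standing share, and it draws the error E from a normal distribution; both are assumptions for illustration, as are the function names.

```python
import random


def draw_signals(quality, noise_sd, size, rng):
    """Each person's signal S = Q + E, with E drawn from a normal distribution."""
    return [quality + rng.gauss(0, noise_sd) for _ in range(size)]


def standing_ovation(signals, threshold, peer_share):
    """Return (initial, final) counts of people standing."""
    # Initial rule: stand if S > T.
    initial = sum(1 for s in signals if s > threshold)
    # Subsequent rule: everyone stands if more than peer_share already stand.
    if initial / len(signals) > peer_share:
        return initial, len(signals)
    return initial, initial


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical show: quality 5 is below the threshold 6.
    for noise_sd in (0.5, 2.0):
        signals = draw_signals(5, noise_sd, 1000, rng)
        print(noise_sd, standing_ovation(signals, 6, 0.2))
```

With quality below the threshold, a wider spread in E puts more people over the threshold at the start, which is why more variation makes an ovation more likely in this setup.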

Why standing ovation:

- higher quality
- lower threshold / larger peer effect
- more variation (E)
- Celebrity
- group size

This model (and others) is a way of determining the results to expect in a given situation.

Note that it is difficult to tell from a snapshot of the state whether a sorting effect or a peer effect produced it. Additional dynamic data is needed in the model to explain whether movement (sorting) or neighbor influence (peer effect) caused the change.

**Aggregation**

Philip Anderson – “More is Different” – emergent properties at macro level

Aggregation of data

Independent decisions

Central Limit Theorem

Assumes independent events and finite (limited range) variance

The resulting probability distribution is normal (Gaussian), a bell curve with the most likely outcome at the mean

Plus/minus 1 standard deviation (sigma) covers 68% of a normal curve; plus/minus 2 standard deviations cover 95%; plus/minus 3 standard deviations cover 99.73% of possible outcomes

Binomial example with $p = 1/2$: the mean is $N/2$ and the standard deviation is $\sqrt{N}/2$. For $N = 100$, the mean is $100/2 = 50$ and the SD is $\sqrt{100}/2 = 10/2 = 5$
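
Both sets of numbers can be checked with the standard library. This sketch uses the general binomial formulas (mean `N * p`, standard deviation `sqrt(N * p * (1 - p))`) and the normal coverage formula `erf(k / sqrt(2))`; the function names are chosen here for illustration.

```python
import math


def binomial_mean_sd(n, p=0.5):
    """Mean and standard deviation of the number of successes in n trials."""
    return n * p, math.sqrt(n * p * (1 - p))


def normal_coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


if __name__ == "__main__":
    print(binomial_mean_sd(100))  # (50.0, 5.0)
    for k in (1, 2, 3):
        print(k, round(normal_coverage(k), 4))
```

The coverage values come out at roughly 68%, 95%, and 99.7% for 1, 2, and 3 standard deviations.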

Six Sigma = 3.4 errors out of 1,000,000 events

Cellular Automaton – self organization, emergence; some level of functionality such as a glider; get the logic right

Game of Life rules: if a cell is off, it turns on if exactly 3 neighbors are on; if a cell is on, it stays on if 2 or 3 neighbors are on, otherwise it turns off
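
The rules fit in a few lines. A minimal sketch, assuming an unbounded board represented as a set of live `(row, col)` cells and the usual 8-cell neighbourhood; `life_step` is a name chosen for this example.

```python
from collections import Counter


def life_step(live):
    """Apply one generation of the Game of Life to a set of live cells."""
    # Count, for every cell, how many live neighbours it has.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Off cells turn on with exactly 3; on cells stay on with 2 or 3.
    return {cell for cell, k in counts.items()
            if k == 3 or (k == 2 and cell in live)}


if __name__ == "__main__":
    glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
    cells = glider
    for _ in range(4):
        cells = life_step(cells)
    # After 4 generations the glider has the same shape, shifted one cell diagonally.
    print(cells == {(r + 1, c + 1) for r, c in glider})
```

The glider is the standard example of emergent functionality: no rule mentions movement, yet the pattern travels across the board.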

**Preferences**

Transitive preference order: if A is preferred to B and B to C, then A is preferred to C. Preferences with this property are called rational, and the same test can be applied to collective (social) preferences.

All individuals can be rational while the aggregate is non-transitive; this is called *Condorcet’s paradox*
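
A three-voter example makes the paradox concrete. The ballots below are the usual textbook illustration, not data from the course, and `majority_prefers` is a helper name invented here.

```python
def majority_prefers(ballots, x, y):
    """True if a strict majority of ballots rank x above y."""
    wins = sum(1 for ballot in ballots if ballot.index(x) < ballot.index(y))
    return wins > len(ballots) / 2


# Each ballot is one voter's transitive ranking, best option first.
ballots = [
    ("A", "B", "C"),
    ("B", "C", "A"),
    ("C", "A", "B"),
]

if __name__ == "__main__":
    for x, y in (("A", "B"), ("B", "C"), ("C", "A")):
        print(x, "beats", y, ":", majority_prefers(ballots, x, y))
```

Every line prints `True`: the group prefers A to B, B to C, and C to A, so the majority preference cycles even though each individual ranking is transitive.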

**Decision Making**

For multidimensional decision making:

- Qualitative – use matrix
- Quantitative – use matrix plus weights
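
The quantitative version can be sketched as a weighted sum over the matrix. The criteria, weights, and ratings below are hypothetical, chosen only to show the arithmetic; setting every weight to 1 gives an unweighted comparison.

```python
def weighted_scores(weights, ratings):
    """Return {option: weighted total} for a decision matrix.

    weights: {criterion: weight}
    ratings: {option: {criterion: rating}}
    """
    return {
        option: sum(weights[criterion] * value for criterion, value in row.items())
        for option, row in ratings.items()
    }


if __name__ == "__main__":
    weights = {"price": 3, "size": 1}  # hypothetical importance of each criterion
    ratings = {
        "house A": {"price": 2, "size": 5},
        "house B": {"price": 4, "size": 1},
    }
    scores = weighted_scores(weights, ratings)
    print(scores)                       # {'house A': 11, 'house B': 13}
    print(max(scores, key=scores.get))  # house B
```

House A has the higher unweighted total here, but house B wins once price is weighted three times as heavily as size.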

For spatial choice models: select an ideal point (single or multidimensional) for a good, then determine the values of each possible decision point on the same dimensions, take the difference from the ideal point on each dimension, and add up the differences. The decision point with the smallest total difference is the closest to the ideal point. Come back to this and clarify.
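
One way to make this concrete is shown below. It reads "add up the differences" as the sum of absolute differences on each dimension, which is an assumption (a straight-line distance would be another reasonable reading). The ideal point and options are hypothetical.

```python
def total_difference(ideal, point):
    """Sum of absolute differences between an option and the ideal point."""
    return sum(abs(i - x) for i, x in zip(ideal, point))


def closest_option(ideal, options):
    """Return the name of the option nearest to the ideal point."""
    return min(options, key=lambda name: total_difference(ideal, options[name]))


if __name__ == "__main__":
    ideal = (3, 2, 5)  # hypothetical ideal values on three dimensions
    options = {"X": (4, 2, 3), "Y": (3, 3, 4)}
    for name, point in options.items():
        print(name, total_difference(ideal, point))  # X 3, then Y 2
    print(closest_option(ideal, options))  # Y
```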

**Probability**

- Outcomes are the set of things that can happen
- An event is a subset of outcomes.

3 axioms

- The probability of an event is between 0 and 1, inclusive
- The probabilities of all possible outcomes sum to 1
- If A is a subset of B, then p(A) is less than or equal to p(B)
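
The three axioms can be checked on a small example. This sketch uses a fair six-sided die as the set of outcomes, with exact fractions to avoid rounding; the `prob` helper is a name made up here.

```python
from fractions import Fraction

# A fair die: six outcomes, each with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(event, dist=die):
    """Probability of an event, i.e. of a subset of the outcomes in dist."""
    return sum((dist[outcome] for outcome in event), Fraction(0))


even = {2, 4, 6}

if __name__ == "__main__":
    print(0 <= prob(even) <= 1)     # axiom 1
    print(prob(set(die)) == 1)      # axiom 2
    print(prob({2}) <= prob(even))  # axiom 3: {2} is a subset of even
    print(prob(even))               # 1/2
```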

3 types of probability:

- Classical (e.g., dice and coin flips): can be proven mathematically
- Frequency: estimating frequency by counting data (how often has something happened?); assumes stationarity (nothing has changed)
- Subjective: an estimate, but based on a model

## Linked Data Graph Orientated Frameworks

**Glossary of Semantic Technology Terms**

http://www.mkbergman.com/1017/glossary-of-semantic-technology-terms/

**The Rationale for Semantic Technologies**

http://www.mkbergman.com/1015/the-rationale-for-semantic-technologies/

**Ontologies as Conceptual Models**

http://www.cambridgesemantics.com/blog/-/blogs/ontologies-as-conceptual-models?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+EnterpriseSemantics+%28Enterprise+Semantics+Blog%29

**JSON-LD** by Manu Sporny

https://www.youtube.com/watch?feature=player_embedded&v=vioCbTo3C-4#!

**Semantic Web Intro** by Manu Sporny

http://bit.ly/Kw6Asm

**Linked Data Intro** by Manu Sporny

http://www.youtube.com/watch?v=4x_xzT5eF5Q

**Linked Data FAQ**

http://structureddynamics.com/linked_data.html

**Linked Data – Welcome to the Data Network** (may be behind a paywall)

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6062547

**Pregel**

http://www.michaelnielsen.org/ddi/pregel/

**Google Pregel Graph Processing**

http://horicky.blogspot.com/2010/07/google-pregel-graph-processing.html

**Pregel: a system for large-scale graph processing – “ABSTRACT”**

http://delivery.acm.org/10.1145/1590000/1582723/p6-malewicz-2.pdf?ip=99.5.73.174&acc=ACTIVE%20SERVICE&CFID=61362319&CFTOKEN=25119961&__acm__=1326116961_4c5dbb95b82d3aaedcc9b134c8fd4318

**Pregel: a system for large-scale graph processing – slides**

http://www.slideshare.net/shatteredNirvana/pregel-a-system-for-largescale-graph-processing

**GoldenOrb**

http://goldenorbos.org/

**Signal/Collect**

http://code.google.com/p/signal-collect/

## Data Science

**What is data science?**

http://radar.oreilly.com/2010/06/what-is-data-science.html

**Poll Results: Top Languages for Data Mining/Analytics**

http://www.kdnuggets.com/2011/08/poll-languages-for-data-mining-analytics.html?k11n20