Information Theory
Shannon, entropy, coding, and the mathematics of signal, from Boltzmann to deep learning
A mind map of information theory: the thermodynamic precursors; Shannon's 1948 foundations; coding and compression; channel capacity and error correction; Kolmogorov complexity and algorithmic information; and the modern applications across machine learning, biology, and physics. Named theorists, theorems, codes, and applications with dates across six branches.
Thermodynamic Precursors
  Statistical mechanics
    Rudolf Clausius: entropy concept, 1865
    Ludwig Boltzmann: S = k log W, 1877
    James Clerk Maxwell: Maxwell's demon thought experiment, 1867
    J. Willard Gibbs: statistical mechanics formalization, 1902
  Demon arguments and Landauer
    Leo Szilárd: engine thought experiment, 1929 (info ↔ entropy)
    Léon Brillouin: Science and Information Theory, 1956
    Rolf Landauer: Landauer's principle, IBM 1961 (kT ln 2 per erasure)
    Charles Bennett: reversible computation, 1973
  Pre-Shannon transmission theory
    Harry Nyquist: Certain Factors Affecting Telegraph Speed, BSTJ 1924
    Ralph Hartley: Transmission of Information, BSTJ 1928
    Hartley's formula H = n log s
    Vladimir Kotelnikov: sampling theorem, USSR 1933 (priority dispute)
  Bell Labs context
    Bell Telephone Laboratories founded, 1925
    Mervin Kelly: research VP fostering basic science, 1940s
    Pickard, Vail: telegraph + telephone empirical foundations
    John R. Pierce: Bell Labs colleague who championed Shannon

Shannon's Foundations
  The 1948 paper
    Claude Shannon: A Mathematical Theory of Communication, BSTJ Jul/Oct 1948
    Two-part paper, ~80 pages
    "Bit" coined formally, credited to John Tukey by Shannon
    Schematic diagram: source → encoder → channel → decoder → destination
    Reissued as book: Shannon + Warren Weaver, 1949
    Weaver's expository preface popularizes the framework
  Entropy and mutual information
    H(X) = -Σ p(x) log p(x): Shannon entropy
    Joint entropy H(X,Y); conditional entropy H(X|Y)
    Mutual information I(X;Y) = H(X) - H(X|Y)
    Differential entropy for continuous random variables
    Entropy rate H(X) for stochastic processes
    Kullback-Leibler divergence: Solomon Kullback + Richard Leibler, 1951
  Two foundational theorems
    Source coding theorem: L ≥ H(X) bits/symbol
    Noisy channel coding theorem: R < C achievable, R > C unachievable
    Channel capacity C = max_p(x) I(X;Y)
    Shannon-Hartley formula: C = B log₂(1 + S/N)
    Fano's inequality: H(X|Y) ≤ H(P_e) + P_e log(|X|-1)
  Cryptography paper
    Communication Theory of Secrecy Systems, BSTJ Oct 1949
    Perfect secrecy proof: one-time pad
    Confusion and diffusion: design principles
    Unicity distance
  Shannon's other work
    Shannon's 1937 MIT thesis: Boolean algebra in switching circuits
    Shannon's mouse Theseus: early adaptive maze-solver, 1950
    Shannon's 1950 paper on chess-playing programs
    Shannon as juggler, unicyclist, financial investor (with Edward Thorp)

Source Coding & Compression
  Lossless coding foundations
    Shannon-Fano coding: Fano notes, 1949
    David Huffman: optimal prefix code, MIT 1952
    Kraft's inequality: Leon Kraft, MIT thesis 1949
    McMillan's inequality: extended Kraft to uniquely decodable codes, 1956
  Adaptive and arithmetic coding
    Peter Elias: arithmetic coding concept, 1963 (undercredited)
    Rissanen + Langdon: practical arithmetic coding, 1979
    Range coding variant
    Asymmetric numeral systems (ANS): Jarosław Duda, 2009
  Dictionary methods
    Lempel-Ziv 77: Abraham Lempel + Jacob Ziv, 1977
    Lempel-Ziv 78, 1978
    LZW: Terry Welch, 1984 (GIF, Compress)
    Deflate: Phil Katz, 1993 (PKZIP, gzip)
    Burrows-Wheeler Transform: bzip2, 1994
    Brotli: Google, 2013; Zstandard: Facebook, 2015
  Lossy compression
    Rate-distortion theory R(D): Shannon 1959
    JPEG (Joint Photographic Experts Group), 1992
    MPEG-1, MPEG-2, MPEG-4 video
    MP3: Karlheinz Brandenburg, Fraunhofer 1993
    AAC: 1997; Opus: 2012
    AV1: open video codec, 2018
    Neural compression: 2017–present (NN-based image, audio)
  Universal compression
    Jorma Rissanen: Minimum Description Length (MDL), 1978
    Universal compressor: converges to entropy without source statistics
    Context Tree Weighting: Willems, Shtarkov, Tjalkens, 1995

Channel Capacity & Error Correction
  Block codes
    Hamming codes: Richard Hamming, Bell Labs 1950
    Hamming distance and minimum distance
    Singleton bound; Plotkin bound; Gilbert-Varshamov bound
    Reed-Muller codes: Reed + Muller, 1954
    Reed-Solomon codes: Reed + Solomon, 1960 (CDs, DVDs, QR codes)
    BCH codes: Bose, Chaudhuri, Hocquenghem 1959–1960
  Convolutional and trellis codes
    Peter Elias: convolutional codes, 1955
    Andrew Viterbi: Viterbi algorithm, 1967
    Trellis-coded modulation: Ungerboeck, 1982
  Modern capacity-approaching codes
    Robert Gallager: LDPC codes, MIT thesis 1960 (rediscovered 1996)
    David MacKay rediscovers LDPC, 1996
    Turbo codes: Berrou, Glavieux, Thitimajshima, ICC 1993
    Polar codes: Erdal Arıkan, 2008 (5G control channel)
    Iterative decoding: belief propagation
  Channel models and theorems
    Binary symmetric channel (BSC): flip with probability p
    Binary erasure channel (BEC)
    Additive white Gaussian noise (AWGN)
    Wireless fading channels (Rayleigh, Rician)
    MIMO capacity: Telatar 1995, Foschini 1996
  Network information theory
    Multiple-access channel: Ahlswede, 1971; Liao, 1972
    Slepian-Wolf: distributed source coding, 1973
    Wyner-Ziv coding, 1976
    Broadcast channel: Cover, 1972
    Network coding: Ahlswede, Cai, Li, Yeung, 2000
    Interference channel capacity: open problem

Algorithmic Information Theory
  Kolmogorov complexity
    Ray Solomonoff: algorithmic probability, 1960 (priority disputed)
    Andrey Kolmogorov: independent formulation, 1963–1965
    Gregory Chaitin: independent formulation, 1966 (age 19)
    K(x): length of shortest program outputting x
    Incompressibility of random strings
    Invariance theorem: K is universal up to additive constant
  Universal probability and induction
    Solomonoff induction: universal prior over computable sequences
    Connection to Bayesian inference
    Marcus Hutter: AIXI universal agent, 2005
    Levin's coding theorem: relates K to algorithmic probability
  Resource-bounded variants
    Logical depth: Bennett, 1988
    Sophistication: Koppel, 1987
    Bennett's "deep" vs. random vs. trivial sequences
  Computability bounds
    K(x) is uncomputable in general
    Chaitin's Ω: halting probability, transcendental, uncomputable
    Berry paradox connection

Modern Applications
  Information theory in ML
    MDL principle: Jorma Rissanen, 1978
    Cross-entropy loss = KL divergence + entropy of true distribution
    Maximum likelihood estimation as cross-entropy minimization
    Variational lower bound (ELBO): VAEs, 2013
    Information bottleneck principle: Tishby, Pereira, Bialek, 2000
    Tishby + Shwartz-Ziv: IB in deep learning, 2017 (controversial)
    Mutual information neural estimation (MINE), 2018
    Contrastive learning objectives: InfoNCE
  Information geometry
    Shun-ichi Amari: Information Geometry, 1985
    Fisher information matrix as Riemannian metric
    Natural gradient descent
    α-divergences and dual connections
  Statistical and causal
    Akaike Information Criterion (AIC): Akaike, 1973
    Bayesian Information Criterion (BIC): Schwarz, 1978
    Transfer entropy: Schreiber, 2000
    Granger causality (closely related)
  Biology and neuroscience
    Genome compression: Lempel-Ziv on DNA
    Mutual information in genetic regulatory networks
    Bialek + Berry + Tishby: efficient coding in retina
    Friston: free-energy principle
    Predictive coding in neuroscience
  Economics and language
    Akerlof: The Market for Lemons, QJE 1970 (info asymmetry)
    Stiglitz, Spence, Akerlof: Nobel 2001
    Zipf's law and entropy of language
    Cross-language entropy comparisons
    Communication-theoretic semantics: semantic information research
  Quantum information
    Von Neumann entropy S(ρ) = -Tr(ρ log ρ), 1932
    Holevo bound: classical info from quantum state
    Quantum mutual information
    Schumacher source coding theorem (qubit term coined), 1995
    Entanglement entropy in many-body physics
    Quantum channel capacities: Holevo-Schumacher-Westmoreland

Brian Tighe · Mind Maps. Orbital mind map.

How to read this
The center holds the topic. The six branches fan out bilaterally, three on each side, each in its own color. Sub-branches nest three levels deep under each top-level branch.
This is the shape the topic has when you try to hold the whole field in your head at once. It is not an argument; it is a scaffold. The essays argue against or within scaffolds like this one.
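A few of the formulas named on the map can be checked numerically. The sketch below covers Shannon entropy, mutual information as H(X) - H(X|Y), cross-entropy as KL divergence plus entropy, and the Shannon-Hartley capacity. It is a minimal illustration, not part of the map: the function names, the example distributions, and the 3 kHz / 30 dB channel are all assumptions chosen for convenience.

```python
import math


def entropy(p):
    """Shannon entropy H(X) = -sum p(x) log2 p(x), in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)


def kl_divergence(p, q):
    """D(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)


def cross_entropy(p, q):
    """Cross-entropy -sum p(x) log2 q(x), in bits."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)


def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y), with H(X|Y) = H(X,Y) - H(Y).

    `joint` is a list of rows; joint[i][j] = P(X = i, Y = j).
    """
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_xy = entropy([v for row in joint for v in row])
    h_x_given_y = h_xy - entropy(p_y)
    return entropy(p_x) - h_x_given_y


def shannon_hartley(bandwidth_hz, snr_linear):
    """Capacity C = B log2(1 + S/N) in bits per second (S/N as a ratio, not dB)."""
    return bandwidth_hz * math.log2(1 + snr_linear)


if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]            # illustrative "true" distribution
    q = [1 / 3, 1 / 3, 1 / 3]        # illustrative model distribution
    print("H(p) =", entropy(p))      # 1.5 bits
    print("cross-entropy =", cross_entropy(p, q))
    print("KL + H        =", kl_divergence(p, q) + entropy(p))

    # Binary symmetric channel, flip probability 0.1, uniform input.
    bsc_joint = [[0.45, 0.05], [0.05, 0.45]]
    print("I(X;Y) =", mutual_information(bsc_joint))

    # Hypothetical 3 kHz channel at 30 dB SNR (linear ratio 1000).
    print("C =", shannon_hartley(3000, 1000), "bit/s")
```

For the binary symmetric channel with a uniform input, the mutual information computed this way equals 1 minus the binary entropy of the flip probability, which is that channel's capacity.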