← Mind Maps ·

Information Theory

Shannon, entropy, coding, and the mathematics of signal — from Boltzmann to deep learning

A mind map of information theory: the thermodynamic precursors; Shannon's 1948 foundations; coding and compression; channel capacity and error correction; Kolmogorov complexity and algorithmic information; and the modern applications across machine learning, biology, and physics. Named theorists, theorems, codes, and applications with dates across six branches.

Thermodynamic PrecursorsShannon's FoundationsSource Coding & CompressionChannel Capacity & Error CorrectionAlgorithmic Information TheoryModern ApplicationsStatistical mechanicsDemon arguments and LandauerPre-Shannon transmission theoryBell Labs contextThe 1948 paperEntropy and mutual informationTwo foundational theoremsCryptography paperShannon's other workLossless coding foundationsAdaptive and arithmetic codingDictionary methodsLossy compressionUniversal compressionBlock codesConvolutional and trellis codesModern capacity-approaching codesChannel models and theoremsNetwork information theoryKolmogorov complexityUniversal probability and inductionResource-bounded variantsComputability boundsInformation theory in MLInformation geometryStatistical and causalBiology and neuroscienceEconomics and languageQuantum informationRudolf Clausius — entropy concept, 1865Ludwig Boltzmann — S = k log W, 1872James Clerk Maxwell — Maxwell's demon thought experiment, 1867J. Willard Gibbs — statistical mechanics formalization, 1902Leo Szilárd — engine thought experiment, 1929 (info ↔ entropy)Léon Brillouin — Science and Information Theory, 1956Rolf Landauer — Landauer's principle, IBM 1961 (kT ln 2 per erasure)Charles Bennett — reversible computation, 1973Harry Nyquist — Certain Factors Affecting Telegraph Speed, BSTJ 1924Ralph Hartley — Transmission of Information, BSTJ 1928Hartley's formula H = n log sVladimir Kotelnikov — sampling theorem, USSR 1933 (priority dispute)Bell Telephone Laboratories founded, 1925Mervin Kelly — research VP fostering basic science, 1940sPickard, Vail — telegraph + telephone empirical foundationsJohn R. Pierce — Bell Labs colleague who championed ShannonClaude Shannon — A Mathematical Theory of Communication, BSTJ Jul/Oct 1948Two-part paper, ~80 pages"Bit" coined formally — credited to John Tukey by ShannonSchematic diagram: source → encoder → channel → decoder → destinationReissued as book — Shannon + Warren Weaver, 1949Weaver's expository preface popularizes the frameworkH(X) = -Σ p(x) log p(x) — Shannon entropyJoint entropy H(X,Y); conditional entropy H(X|Y)Mutual information I(X;Y) = H(X) - H(X|Y)Differential entropy for continuous random variablesEntropy rate H(X) for stochastic processesKullback-Leibler divergence — Solomon Kullback + Richard Leibler, 1951Source coding theorem — L ≥ H(X) bits/symbolNoisy channel coding theorem — R < C achievable, R > C unachievableChannel capacity C = max_p(x) I(X;Y)Shannon-Hartley formula — C = B log₂(1 + S/N)Fano's inequality — H(X|Y) ≤ H(P_e) + P_e log(|X|-1)Communication Theory of Secrecy Systems, BSTJ Oct 1949Perfect secrecy proof — one-time padConfusion and diffusion — design principlesUnicity distanceShannon's 1937 MIT thesis — Boolean algebra in switching circuitsShannon's mouse Theseus — early adaptive maze-solver, 1950Shannon's 1956 paper on chess-playing programsShannon as juggler, unicyclist, financial-investor (with Edward Thorp)Shannon-Fano coding — Fano notes, 1949David Huffman — optimal prefix code, MIT 1952Kraft's inequality — Leon Kraft, MIT thesis 1949McMillan's inequality — extended Kraft to uniquely decodable codes, 1956Peter Elias — arithmetic coding concept, 1963 (undercredited)Rissanen + Langdon — practical arithmetic coding, 1979Range coding variantAsymmetric numeral systems (ANS) — Jarosław Duda, 2009Lempel-Ziv 77 — Abraham Lempel + Jacob Ziv, 1977Lempel-Ziv 78, 1978LZW — Terry Welch, 1984 (GIF, Compress)Deflate — Phil Katz, 1993 (PKZIP, gzip)Burrows-Wheeler Transform — bzip2, 1994Brotli — Google, 2013; Zstandard — Facebook 2015Rate-distortion theory R(D) — Shannon 1959JPEG (Joint Photographic Experts Group), 1992MPEG-1, MPEG-2, MPEG-4 videoMP3 — Karlheinz Brandenburg, Fraunhofer 1993AAC — 1997; Opus — 2012AV1 — open video codec, 2018Neural compression — 2017–present (NN-based image, audio)Jorma Rissanen — Minimum Description Length (MDL), 1978Universal compressor — converges to entropy without source statisticsContext Tree Weighting — Willems, Shtarkov, Tjalkens, 1995Hamming codes — Richard Hamming, Bell Labs 1950Hamming distance and minimum distanceSingleton bound; Plotkin bound; Gilbert-Varshamov boundReed-Muller codes — Reed + Muller, 1954Reed-Solomon codes — Reed + Solomon, 1960 (CDs, DVDs, QR codes)BCH codes — Bose, Chaudhuri, Hocquenghem 1959–1960Peter Elias — convolutional codes, 1955Andrew Viterbi — Viterbi algorithm, 1967Trellis-coded modulation — Ungerboeck, 1982Robert Gallager — LDPC codes, MIT thesis 1960 (rediscovered 1996)David MacKay rediscovers LDPC, 1996Turbo codes — Berrou, Glavieux, Thitimajshima, ICC 1993Polar codes — Erdal Arıkan, 2008 (5G control channel)Iterative decoding — belief propagationBinary symmetric channel (BSC) — flip with probability pBinary erasure channel (BEC)Additive white Gaussian noise (AWGN)Wireless fading channels (Rayleigh, Rician)MIMO capacity — Telatar 1995, Foschini 1996Multiple-access channel — Cover, 1972Slepian-Wolf — distributed source coding, 1973Wyner-Ziv coding, 1976Broadcast channel — Cover, 1972Network coding — Ahlswede, Cai, Li, Yeung, 2000Interference channel capacity — open problemRay Solomonoff — algorithmic probability, 1960 (priority disputed)Andrey Kolmogorov — independent formulation, 1963–1965Gregory Chaitin — independent formulation, 1966 (age 19)K(x) — length of shortest program outputting xIncompressibility of random stringsInvariance theorem — K is universal up to additive constantSolomonoff induction — universal prior over computable sequencesConnection to Bayesian inferenceMarcus Hutter — AIXI universal agent, 2005Levin's coding theorem — relates K to algorithmic probabilityLogical depth — Bennett, 1988Sophistication — Koppel, 1987Bennett's "deep" vs. random vs. trivial sequencesK(x) is uncomputable in generalChaitin's Ω — halting probability, transcendental, uncomputableBerry paradox connectionMDL principle — Jorma Rissanen, 1978Cross-entropy loss = KL divergence + entropy of true distributionMaximum likelihood estimation as cross-entropy minimizationVariational lower bound (ELBO) — VAEs, 2013Information bottleneck principle — Tishby, Pereira, Bialek, 2000Tishby + Schwartz-Ziv — IB in deep learning, 2017 (controversial)Mutual information neural estimation (MINE), 2018Contrastive learning objectives — InfoNCEShun-ichi Amari — Information Geometry, 1985Fisher information matrix as Riemannian metricNatural gradient descentα-divergences and dual connectionsAkaike Information Criterion (AIC) — Akaike, 1973Bayesian Information Criterion (BIC) — Schwarz, 1978Transfer entropy — Schreiber, 2000Granger causality (closely related)Genome compression — Lempel-Ziv on DNAMutual information in genetic regulatory networksBialek + Berry + Tishby — efficient coding in retinaFriston — free-energy principlePredictive coding in neuroscienceAkerlof — The Market for Lemons, QJE 1970 (info asymmetry)Stiglitz, Spence, Akerlof — Nobel 2001Zipf's law and entropy of languageCross-language entropy comparisonsCommunication-theoretic semantics — semantic information researchVon Neumann entropy S(ρ) = -Tr(ρ log ρ), 1932Holevo bound — classical info from quantum stateQuantum mutual informationSchumacher source coding theorem (qubit term coined), 1995Entanglement entropy in many-body physicsQuantum channel capacities — Holevo-Schumacher-WestmorelandInformation TheoryBrian Tighe · Mind Maps
Orbital mind map. Scroll to zoom, drag to pan, or use the buttons above (+ / − / 0 keys also work). Hover a node to highlight its path to the center and the subtree beneath it.

How to read this

The center holds the topic. The six branches fan out bilaterally — three on each side — each in its own color. Sub-branches nest three levels deep under each top-level branch. Hover a leaf to trace the path back to the center; hover a branch to see everything it contains.

This is the shape the topic has when you try to hold the whole field in your head at once. It is not an argument; it is a scaffold. The essays argue against or within scaffolds like this one.

More in the series