He also created a definition to calculate when perfect security for a cryptographic scheme had been achieved. (Most often demonstrated with the one-time pad).
The question is, for a designed steganographic channel, how much data can be passed through the channel while maintaining some equivalent of "perfect secrecy"? Many mathematicians have danced around this topic but only recently has it been addressed.
Ross Anderson and Fabian Petitcolas pointed out in their '96 and '98 papers some of the limits of steganography, notably that the party attempting to protect their communication is bound by their ability to model the channel. They make the claim that it's impossible to model the channel capacity, since you never know if your adversary has a better model of the channel then you do. (If they did, they could compress the data more effectively rendering the effective entropy zero and destroying either the channel or your ability to use it stealthily.) Other papers exist in the area discussing the topic in less detail or more narrowly that I have not included here as well.
The first paper that really broke ground in solving this problem was done by C Cachin in his 1998 workshop “An information-theoretic model for steganography,” and really rounded out in the follow up paper of the same name in 2004. Both papers are available at his site. Rather then focusing on what was hard, he showed how to calculate an equivalent for perfect secrecy given a channel model using a model-independent mathematical approach.
Pierre Moulin wrote some great papers with his student Ying Wang which are available here. His papers are useful to build upon Cachin's and establish an information theoretic general foundation. They also include bounded examples where it is impossible to detect the steganographic implemenation.
Building on the work by Cachin, Moulin, and other contributors, Harmsen and Pearlman have tied it all together with a draft published in 2008 and the final paper released in IEEE Transactions on Information Theory (updated in 2010 to point directly to their home page hosting it) providing a general model for capacity as a function of secrecy. Rather then trying to explain it myself, I'll include an excerpt for why their paper is so useful:
This work differs from previous work in a number ofFortunately we didn't listen to the naysayers and give up trying to model covert channel capacity.
aspects. Most notable is the use of information-spectrum methods that allow for the analysis of arbitrary detection algorithms and channels. This eliminates the need to restrict interest to detection algorithms that operate on sample averages or behave consistently. Instead, the detection functions may be instantaneous, meaning the properties of a detector for n samples need not have any relation to the same detector for n + 1 samples. Additionally, the typical restriction that the channel under consideration be consistent, ergodic or stationary is also lifted.
From now on, if anyone tell you they've built the solution to stop or detect all covert communications you can prove why they are wrong. It's all about capacity, as I've held for a long time. You can't stop covert communications, only limit capacity. You can limit a lot more if you use an active warden (randomize as much of the entropy as you can to make it difficult for the adversary to utilize the channel), but you cannot eliminate it. Case closed.
Update (February 22nd, 2011): Just found a paper extending the mathematical proof involving types of steganographic channels (active wardens, various statistical distributions, etc.) more comprehensively. It appears this will be massaged a few more times until every case is hammered down and people will move on to more complex scenarios. Nicholas Hopper, Luis von Ahn, and John Langford. "Provably Secure Steganography," IEEE Transactions on Computers 58(5): 662-676, May 2009.. (©2009 IEEE)