This thesis explores three applications of information theory to machine learning, each involving the optimization of information flow in a learning problem. In Chapter 2, we introduce a method for extracting the most informative bits that one signal contains about another. Our method, the deterministic information bottleneck (DIB), is an alternative formulation of the information bottleneck (IB). In Chapter 3, we adapt the DIB to the problem of finding the most informative clusterings of geometric data. We also introduce an approach to model selection that emerges naturally within the (D)IB framework. In Chapter 4, we introduce an approach for encouraging or discouraging information sharing among agents in a multi-agent reinforcement learning setting. We conclude in Chapter 5 by discussing ongoing and future work in these directions.
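For orientation, the standard IB objective and the deterministic variant can be sketched as follows; the notation (compressed representation $T$, trade-off parameter $\beta \ge 0$) follows the usual IB convention, which this overview does not itself fix:

```latex
% Information bottleneck: compress X into T while retaining
% information about Y, with trade-off parameter beta >= 0.
\min_{p(t \mid x)} \; I(X;T) - \beta \, I(T;Y)

% Deterministic IB: replace the compression cost I(X;T) with
% the entropy H(T), which favors deterministic encoders.
\min_{p(t \mid x)} \; H(T) - \beta \, I(T;Y)
```

In both cases, larger $\beta$ places more weight on preserving information about $Y$ relative to compressing $X$.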