David Heckerman

Distinguished Scientist, Amazon

E-mail: heckerma@hotmail.com

Research activities

I am developing machine learning and statistical approaches for a variety of applications including genomics and vaccine design.  In my early work, I demonstrated the importance of probability theory in Artificial Intelligence, developed methods to build what are now called AI chatbots, and developed methods to learn graphical models from data including methods for causal discovery.


While at Microsoft, I developed numerous applications including machine-learning tools in SQL Server and Commerce Server, the junk-mail filters in Outlook, Exchange, and Hotmail, handwriting recognition in the Tablet PC, text mining software in SharePoint Portal Server, troubleshooters in Windows, and the Answer Wizard in Office.

Selected publications

D. Heckerman.  Probabilistic interpretations for MYCIN's certainty factors.  In Proceedings of the Workshop on Uncertainty and Probability in Artificial Intelligence, Los Angeles, CA, pages 9-20. Association for Uncertainty in Artificial Intelligence, Mountain View, CA, August 1985.  Also in L. Kanal and J. Lemmer, editors, Uncertainty in Artificial Intelligence, pages 167-196. North-Holland, New York, 1986.


D. Heckerman.  Probabilistic Similarity Networks.  MIT Press, Cambridge, MA, 1991.


D. Heckerman, R. Shachter. Decision-Theoretic Foundations for Causal Reasoning.  Journal of Artificial Intelligence Research, 3:405-430, 1995. 


J. Breese and D. Heckerman. Decision-theoretic troubleshooting: A framework for repair and experiment.  In Proceedings of Twelfth Conference on Uncertainty in Artificial Intelligence, Portland, OR, pages 124-132. Morgan Kaufmann, August 1996.


D. Geiger, D. Heckerman. A Characterization of the Dirichlet Distribution Through Global and Local Independence.  The Annals of Statistics, 25:1344-1369, 1997. 


D. Heckerman and E. Horvitz. Inferring Informational Goals from Free-Text Queries. In Proceedings of Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, Morgan Kaufmann, July 1998.


D. Heckerman.  A Tutorial on Learning with Bayesian Networks.  In Learning in Graphical Models, M. Jordan, ed. MIT Press, Cambridge, MA, 1999.


D. Heckerman, C. Meek, and G. Cooper.  A Bayesian Approach to Causal Discovery.  In C. Glymour and G. Cooper, editors, Computation, Causation, and Discovery, pages 141-165.  MIT Press, Cambridge, MA, 1999.


J. Breese, D. Heckerman, C. Kadie.  Empirical Analysis of Predictive Algorithms for Collaborative Filtering.  In Proceedings of Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, Morgan Kaufmann, July 1998.


S. T. Dumais, J. Platt, D. Heckerman and M. Sahami. Inductive Learning Algorithms and Representations for Text Categorization.  Proceedings of ACM-CIKM98, November, 1998.  Winner of CIKM Test of Time Award 2017.


D. Heckerman, D. Chickering, C. Meek, R. Rounthwaite, C. Kadie. Dependency Networks for Density Estimation, Collaborative Filtering, and Data Visualization.  Journal of Machine Learning Research, 1:49-75, 2000.


D. Geiger and D. Heckerman. Parameter Priors for Directed Acyclic Graphical Models and the Characterization of Several Probability Distributions.  The Annals of Statistics, 30: 1412-1440, 2002. 


I. Cadez, D. Heckerman, C. Meek, P. Smyth, and S. White.  Visualization of Navigation Patterns on a Web Site Using Model Based Clustering.  Data Mining and Knowledge Discovery, 7:399-424, 2003.


J. Goodman, D. Heckerman, and R. Rounthwaite.  Stopping Spam.  Scientific American, April, 2005.  Microsoft copy.


J. Carlson, Z. Brumme, C. Rousseau, C. Brumme, P. Matthews, C. Kadie, J. Mullins, B. Walker, P. Harrigan, P. Goulder, D. Heckerman.  Phylogenetic dependency networks: Inferring patterns of CTL escape and codon covariation in HIV-1 Gag. PLoS Computational Biology, 4(11): e1000225, November 2008.


H. Kang, N. Zaitlen, C. Wade, A. Kirby, D. Heckerman, M. Daly, and E. Eskin, Efficient Control of Population Structure in Model Organism Association Mapping, Genetics, 178:1709-1723, March, 2008 (doi: 10.1534/genetics.107.080101).


C. Lippert, J. Listgarten, Y. Liu, C.M. Kadie, R.I. Davidson, and D. Heckerman.  FaST linear mixed models for genome-wide association studies.  Nature Methods, 8: 833-835, Oct 2011 (doi:10.1038/nmeth.1681).  Preprint.


F. Pereyra, D. Heckerman, J. Carlson, C. Kadie, D. Soghoian, D. Karel, A. Goldenthal, O. Davis, C. DeZiel, T. Lin, J. Peng, A. Piechocka, M. Carrington, and B. Walker. HIV Control Is Mediated in Part by CD8+ T-Cell Targeting of Specific Epitopes. J. Virol., 88:12937-12948, Aug 2014.


R. Rubsamen, C. Herst, P. Lloyd, D. Heckerman. Eliciting cytotoxic T-lymphocyte responses from synthetic vectors containing one or two epitopes in a C57BL/6 mouse model using peptide-containing biodegradable microspheres and adjuvants. Vaccine, 32:4111-4116, June 2014.


C. Widmer, C. Lippert, O. Weissbrod, N. Fusi, C.M. Kadie, R.I. Davidson, J. Listgarten, and D. Heckerman. Further Improvements to Linear Mixed Models for Genome-Wide Association Studies. Scientific Reports, 4:6874, Nov 2014 (doi:10.1038/srep06874).


O. Weissbrod, C. Lippert, D. Geiger, and D. Heckerman.  Accurate liability estimation improves power in ascertained case-control studies.  Nature Methods, Feb 2015 (doi:10.1038/nmeth.3285).  Preprint.


D. Heckerman, D. Gurdasani, C. Kadie, C. Pomilla, T. Carstensen, H. Martin, K. Ekoru, R.N. Nsubuga, G. Ssenyomo, A. Kamali, P. Kaleebu, C. Widmer, and M.S. Sandhu.  Linear mixed model for heritability estimation that explicitly addresses environmental variation.  PNAS, 113:7377-7382, July 2016 (doi: 10.1073/pnas.1510497113).


G.M. Souza, M.A. Van Sluys, C.G. Lembke, H. Lee, G.R.A. Margarido, C.T. Hotta, J.W. Gaiarsa, A.L. Diniz, M. de Medeiros Oliveira, S. de Siqueira Ferreira, M.Y. Nishiyama Jr, F. ten-Caten, G.T. Ragagnin, P. de Morais Andrade, R.F. de Souza, G.G. Nicastro, R. Pandya, C. Kim, H. Guo, A.M. Durham, M.S. Carneiro, J. Zhang, X. Zhang, Q. Zhang, R. Ming, M.C. Schatz, R. Davidson, A.H. Paterson, and D. Heckerman.  Assembly of the 373k gene space of the polyploid sugarcane genome reveals reservoirs of functional diversity in the world's leading biomass crop.  GigaScience, 8(12), Dec 2019 (doi:10.1093/gigascience/giz129).



All publications with annotations and links to talks

Publications by category (a bit out of date)

· Genomics

· FaST-LMM and other mixed models

· Computational biology

· HIV and HCV vaccine design

· Machine learning

· Graphical models

· Causality and causal inference

· Artificial intelligence

· Probability

· Spam filtering

· Collaborative filtering

· Visualization

· Education

· Physics

· Abstracts of early papers


Last Updated: 7/4/2021
More about the Heckermans here.