Article

Model compression

Published: 20 August 2006

Abstract

Often the best performing supervised learning models are ensembles of hundreds or thousands of base-level classifiers. Unfortunately, the space required to store this many classifiers, and the time required to execute them at run-time, prohibit their use in applications where test sets are large (e.g. Google), where storage space is at a premium (e.g. PDAs), and where computational power is limited (e.g. hearing aids). We present a method for "compressing" large, complex ensembles into smaller, faster models, usually without significant loss in performance.
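
The core recipe is simple: train a large "teacher" ensemble, use it to label additional (real or synthetic) unlabeled examples, and fit a small, fast "student" model to those labels. The sketch below illustrates this idea only; it is not the paper's exact procedure, which selects the ensemble from a library of models and labels synthetic pseudo-examples before training a neural network. The random-forest teacher, depth-limited decision-tree student, and synthetic dataset here are stand-ins chosen for brevity.

```python
# Minimal sketch of model compression: a small "student" is trained to
# reproduce the predictions of a large "teacher" ensemble on extra data.
# All model and dataset choices are illustrative assumptions, not the
# paper's configuration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Labeled training data, a pool standing in for unlabeled data, and a test set.
X, y = make_classification(n_samples=6000, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=2000, random_state=0)
X_pool, X_test, _, y_test = train_test_split(
    X_rest, y_rest, train_size=3000, random_state=0)

# Teacher: a large ensemble that is expensive to store and execute.
teacher = RandomForestClassifier(n_estimators=500, random_state=0)
teacher.fit(X_train, y_train)

# Label the unlabeled pool with the teacher's predictions; the student then
# learns to mimic the teacher's function, not just the original labels.
pool_labels = teacher.predict(X_pool)

# Student: a single shallow tree in place of 500 trees.
student = DecisionTreeClassifier(max_depth=8, random_state=0)
student.fit(np.vstack([X_train, X_pool]),
            np.concatenate([y_train, pool_labels]))

print("teacher accuracy:", teacher.score(X_test, y_test))
print("student accuracy:", student.score(X_test, y_test))
```

The practical point is the size and speed gap: here one depth-8 tree stands in for 500 full trees, and the extra teacher-labeled data is what lets such a small model approximate the ensemble's decision function rather than underfit the original training set.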




Information

Published In

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2006
986 pages
ISBN:1595933395
DOI:10.1145/1150402
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 August 2006


Author Tags

  1. model compression
  2. supervised learning

Qualifiers

  • Article

Conference

KDD06

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months)789
  • Downloads (Last 6 weeks)74
Reflects downloads up to 14 Sep 2024


Cited By

  • (2025) Ensemble Knowledge Distillation for Federated Semi-Supervised Image Classification. Tsinghua Science and Technology, 30(1):112-123. DOI: 10.26599/TST.2023.9010156. Online publication date: Feb 2025.
  • (2024) ERKT-Net: Implementing Efficient and Robust Knowledge Distillation for Remote Sensing Image Classification. EAI Endorsed Transactions on Industrial Networks and Intelligent Systems, 11(3). DOI: 10.4108/eetinis.v11i3.4748. Online publication date: 3 Jul 2024.
  • (2024) Can a Student Large Language Model Perform as Well as Its Teacher? In Innovations, Securities, and Case Studies Across Healthcare, Business, and Technology, 122-139. DOI: 10.4018/979-8-3693-1906-2.ch007. Online publication date: 12 Apr 2024.
  • (2024) Policy Compression for Intelligent Continuous Control on Low-Power Edge Devices. Sensors, 24(15):4876. DOI: 10.3390/s24154876. Online publication date: 27 Jul 2024.
  • (2024) Machine Learning in Short-Reach Optical Systems: A Comprehensive Survey. Photonics, 11(7):613. DOI: 10.3390/photonics11070613. Online publication date: 28 Jun 2024.
  • (2024) A Pruning and Distillation Based Compression Method for Sonar Image Detection Models. Journal of Marine Science and Engineering, 12(6):1033. DOI: 10.3390/jmse12061033. Online publication date: 20 Jun 2024.
  • (2024) Edge Federated Optimization for Heterogeneous Data. Future Internet, 16(4):142. DOI: 10.3390/fi16040142. Online publication date: 22 Apr 2024.
  • (2024) Multimodal Machine Translation Based on Enhanced Knowledge Distillation and Feature Fusion. Electronics, 13(15):3084. DOI: 10.3390/electronics13153084. Online publication date: 4 Aug 2024.
  • (2024) Improving Training Dataset Balance with ChatGPT Prompt Engineering. Electronics, 13(12):2255. DOI: 10.3390/electronics13122255. Online publication date: 8 Jun 2024.
  • (2024) Training Acceleration Method Based on Parameter Freezing. Electronics, 13(11):2140. DOI: 10.3390/electronics13112140. Online publication date: 30 May 2024.
