Article

Model compression

Published: 20 August 2006

Abstract

Often the best performing supervised learning models are ensembles of hundreds or thousands of base-level classifiers. Unfortunately, the space required to store this many classifiers, and the time required to execute them at run-time, prohibit their use in applications where test sets are large (e.g. Google), where storage space is at a premium (e.g. PDAs), and where computational power is limited (e.g. hearing aids). We present a method for "compressing" large, complex ensembles into smaller, faster models, usually without significant loss in performance.
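
The core recipe is simple: train a large "teacher" ensemble, use it to label additional (real or synthetic) unlabeled examples, and fit a small, fast "student" model to those labels. The sketch below illustrates this idea only; it is not the paper's exact procedure, which selects the ensemble from a library of models and labels synthetic pseudo-examples before training a neural network. The random-forest teacher, depth-limited decision-tree student, and synthetic dataset here are stand-ins chosen for brevity.

```python
# Minimal sketch of model compression: a small "student" is trained to
# reproduce the predictions of a large "teacher" ensemble on extra data.
# All model and dataset choices are illustrative assumptions, not the
# paper's configuration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Labeled training data, a pool standing in for unlabeled data, and a test set.
X, y = make_classification(n_samples=6000, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=2000, random_state=0)
X_pool, X_test, _, y_test = train_test_split(
    X_rest, y_rest, train_size=3000, random_state=0)

# Teacher: a large ensemble that is expensive to store and execute.
teacher = RandomForestClassifier(n_estimators=500, random_state=0)
teacher.fit(X_train, y_train)

# Label the unlabeled pool with the teacher's predictions; the student then
# learns to mimic the teacher's function, not just the original labels.
pool_labels = teacher.predict(X_pool)

# Student: a single shallow tree in place of 500 trees.
student = DecisionTreeClassifier(max_depth=8, random_state=0)
student.fit(np.vstack([X_train, X_pool]),
            np.concatenate([y_train, pool_labels]))

print("teacher accuracy:", teacher.score(X_test, y_test))
print("student accuracy:", student.score(X_test, y_test))
```

The practical point is the size and speed gap: here one depth-8 tree stands in for 500 full trees, and the extra teacher-labeled data is what lets such a small model approximate the ensemble's decision function rather than underfit the original training set.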




Information

Published In

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2006
986 pages
ISBN:1595933395
DOI:10.1145/1150402
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 August 2006


Author Tags

  1. model compression
  2. supervised learning

Qualifiers

  • Article

Conference

KDD06

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months)789
  • Downloads (Last 6 weeks)74
Reflects downloads up to 14 Sep 2024


Cited By

  • (2025) Ensemble Knowledge Distillation for Federated Semi-Supervised Image Classification. Tsinghua Science and Technology, 30(1):112-123. DOI: 10.26599/TST.2023.9010156. Online publication date: Feb 2025.
  • (2024) ERKT-Net: Implementing Efficient and Robust Knowledge Distillation for Remote Sensing Image Classification. EAI Endorsed Transactions on Industrial Networks and Intelligent Systems, 11(3). DOI: 10.4108/eetinis.v11i3.4748. Online publication date: 3 Jul 2024.
  • (2024) Can a Student Large Language Model Perform as Well as Its Teacher? In Innovations, Securities, and Case Studies Across Healthcare, Business, and Technology, 122-139. DOI: 10.4018/979-8-3693-1906-2.ch007. Online publication date: 12 Apr 2024.
  • (2024) Policy Compression for Intelligent Continuous Control on Low-Power Edge Devices. Sensors, 24(15):4876. DOI: 10.3390/s24154876. Online publication date: 27 Jul 2024.
  • (2024) Machine Learning in Short-Reach Optical Systems: A Comprehensive Survey. Photonics, 11(7):613. DOI: 10.3390/photonics11070613. Online publication date: 28 Jun 2024.
  • (2024) A Pruning and Distillation Based Compression Method for Sonar Image Detection Models. Journal of Marine Science and Engineering, 12(6):1033. DOI: 10.3390/jmse12061033. Online publication date: 20 Jun 2024.
  • (2024) Edge Federated Optimization for Heterogeneous Data. Future Internet, 16(4):142. DOI: 10.3390/fi16040142. Online publication date: 22 Apr 2024.
  • (2024) Multimodal Machine Translation Based on Enhanced Knowledge Distillation and Feature Fusion. Electronics, 13(15):3084. DOI: 10.3390/electronics13153084. Online publication date: 4 Aug 2024.
  • (2024) Improving Training Dataset Balance with ChatGPT Prompt Engineering. Electronics, 13(12):2255. DOI: 10.3390/electronics13122255. Online publication date: 8 Jun 2024.
  • (2024) Training Acceleration Method Based on Parameter Freezing. Electronics, 13(11):2140. DOI: 10.3390/electronics13112140. Online publication date: 30 May 2024.
