Data sets for multi-instance learning:
Musk data: benchmark test data for multi-instance learning [Refer: Dietterich T G, Lathrop R H, Lozano-Pérez T. Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence, 1997, 89(1-2): 31-71.]
Artificial multi-instance data: benchmark test data for multi-instance regression [Refer: Amar R A, Dooly D R, Goldman S A, Zhang Q. Multiple-instance learning of real-valued data. In: Proceedings of the 18th International Conference on Machine Learning (ICML'01), Williamstown, MA, 2001, 3-10.]
Data for MIL based web index recommendation: data set used for MIL based web index recommendation [Refer: Z.-H. Zhou, K. Jiang, and M. Li. Multi-instance learning based web mining. Applied Intelligence, 22(2): 135-147, 2005.]
Data sets for multi-label learning:
Yeast data: test data for multi-label learning given by A. Elisseeff and J. Weston [Refer: Elisseeff A, Weston J. A kernel method for multi-labelled classification. In: Advances in Neural Information Processing Systems 14, Cambridge, MA: MIT Press, 2002, 681-687.]
Image data: test data for multi-label learning given by M.-L. Zhang and Z.-H. Zhou [Refer: M.-L. Zhang, Z.-H. Zhou. ML-kNN: a lazy learning approach to multi-label learning. Pattern Recognition, 2007, 40(7): 2038-2048.]
Web page data: test data for multi-label learning given by N. Ueda and K. Saito [Refer: Ueda N, Saito K. Parametric mixture models for multi-label text. In: Advances in Neural Information Processing Systems 15, Cambridge, MA: MIT Press, 2003, 721-728.]
Reuters corpus: benchmark test data for multi-label text categorization
More multi-label data sets are available at the Mulan Library and the Sourceforge Network
Data sets for partial label learning:
Notice: The following partial label learning data sets were collected and pre-processed by me, with credit and copyright belonging to the authors of the referenced literature. The pre-processed data sets may be used at your own risk and for academic purposes only.
After unzipping and loading each data set in the Matlab environment, you can find three variables named "data", "partial_target" and "target", organized in the following way:
"data": an Mxd matrix w.r.t. the feature representations, where M is the number of instances and d is the number of features. Here, data(i,:) stores the feature vector of the ith instance.
"partial_target": a QxM matrix w.r.t. the candidate labeling information, where Q is the number of possible class labels. Here, partial_target(j,i)=1 if the jth class label is among the candidate label set of the ith instance; otherwise, partial_target(j,i)=0.
"target": a QxM matrix w.r.t. the ground-truth labeling information. Here, target(j,i)=1 if the jth class label is the ground-truth label of the ith instance; otherwise, target(j,i)=0.
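As a quick sanity check, a minimal Matlab sketch for loading one of these data sets and inspecting the three variables might look as follows (the file name "lost.mat" is illustrative; substitute whichever data set you download):

```matlab
% Minimal sketch: load a partial label data set and inspect its format.
load('lost.mat');                          % provides data, partial_target, target

[M, d] = size(data);                       % M instances, d features
Q = size(partial_target, 1);               % Q possible class labels

x1 = data(1, :);                           % feature vector of the 1st instance
candidates1 = find(partial_target(:, 1));  % candidate label set of the 1st instance
truth1 = find(target(:, 1));               % ground-truth label of the 1st instance

% The ground-truth label should always lie within the candidate label set.
assert(all(target(:) <= partial_target(:)));
```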
FG-NET data: facial age estimation from crowd-sourced annotations [Refer: G. Panis, A. Lanitis. An overview of research activities in facial age estimation using the FG-NET aging database. Lecture Notes in Computer Science 8926, Berlin: Springer, 2015, 737-750.] (1.98Mb)
Lost data: automatic face naming from videos [Refer: T. Cour, B. Sapp, B. Taskar. Learning from partial labels. Journal of Machine Learning Research, 12(May): 1501–1536, 2011.] (914Kb)
MSRCv2 data: object classification [Refer: L. Liu, T. Dietterich. A conditional multinomial mixture model for superset label learning. In: Advances in Neural Information Processing Systems 25, Cambridge, MA: MIT Press, 2012, 557–565.] (373Kb)
BirdSong data: bird song classification [Refer: F. Briggs, X. Z. Fern, R. Raich. Rank-loss support instance machines for MIML instance annotation. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 2012, 534–542.] (1.00Mb)
Soccer Player data: automatic face naming from images [Refer: Z. Zeng, S. Xiao, K. Jia, T.-H. Chan, S. Gao, D. Xu, Y. Ma. Learning by associating ambiguously labeled images. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Portland, OR, 2013, 708–715.] (35.18Mb)
Yahoo! News: automatic face naming from images [Refer: M. Guillaumin, J. Verbeek, C. Schmid. Multiple instance metric learning from automatically labeled bags of faces. In: Lecture Notes in Computer Science 6311, Berlin: Springer, 2010, 634–647.] (28.04Mb)
Mirflickr data: web image classification [Refer: M. J. Huiskes, M. S. Lew. The MIR Flickr retrieval evaluation. In: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, Vancouver, Canada, 2008, 39–43.] (30.40Mb)
Data sets for partial multi-label learning:
Notice: The following partial multi-label learning data sets were collected and pre-processed by me, with credit and copyright belonging to the authors of the referenced literature. The pre-processed data sets may be used at your own risk and for academic purposes only.
In the zipped file for each dataset, there are two files named "DATANAME.mat" and "DATANAME.txt". "DATANAME.mat" corresponds to the data file and "DATANAME.txt" includes a brief description of the data set. After loading "DATANAME.mat" in the Matlab environment, you can find three variables named "data", "candidate_labels" and "target", organized in the following way:
"data":
an Mxd matrix
w.r.t. the feature representations, where M is
the number of instances and d is
the number of features. Here, data(i,:)
stores the feature vector of the ith
instance.
"candidate_labels":
a QxM matrix
w.r.t. the candidate labeling information, where Q is
the number of possible class labels. Here, candidate_labels(j,i)=1
if the jth
class label is among the candidate label set of the ith
instance; Otherwise, candidate_labels(j,i)=0.
"target":
a QxM matrix
w.r.t. the ground-truth labeling information. Here, target(j,i)=1
if the jth
class label is the ground-truth label of the ith
instance; Otherwise, target(j,i)=0.
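As with the partial label data sets above, a minimal Matlab sketch can verify this format (the file name "Music_emotion.mat" is illustrative; substitute the DATANAME.mat of the data set you download):

```matlab
% Minimal sketch: load a partial multi-label data set and inspect its format.
load('Music_emotion.mat');                   % provides data, candidate_labels, target

[M, d] = size(data);                         % M instances, d features
Q = size(candidate_labels, 1);               % Q possible class labels

i = 1;                                       % inspect the 1st instance
candidates_i = find(candidate_labels(:, i)); % its candidate label set
truths_i = find(target(:, i));               % its ground-truth label set

% Every ground-truth label should appear among the candidate labels.
assert(all(target(:) <= candidate_labels(:)));
```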
Music_emotion data: music classification from the emotion perspective [Refer: M.-L. Zhang and J.-P. Fang. Partial multi-label learning via credible label elicitation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3587–3599.] (4.79Mb)
Music_style data: music classification from the style perspective [Refer: M.-L. Zhang and J.-P. Fang. Partial multi-label learning via credible label elicitation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3587–3599.] (4.79Mb)
Mirflickr data: image retrieval task [Refer: M. J. Huiskes, M. S. Lew. The MIR Flickr retrieval evaluation. In: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, Vancouver, Canada, 2008, 39–43.] (7.67Mb)
YeastBP data, YeastCC data, YeastMF data: protein-protein interaction [Refer: G. Yu, X. Chen, C. Domeniconi, J. Wang, Z. Li, Z. Zhang, and X. Wu. Feature-induced partial multi-label learning. In: Proceedings of the 18th IEEE International Conference on Data Mining, Singapore, 2018, 1398–1403.] (1.09Mb, 1.02Mb, 1.02Mb)
Data sets for multi-instance partial-label (MIPL) learning:
Notice: The following multi-instance partial-label learning data sets were collected and pre-processed by me, with credit and copyright belonging to the authors of the referenced literature. The pre-processed data sets may be used at your own risk and for academic purposes only.
For the benchmark datasets (MNIST-MIPL, FMNIST-MIPL, Newsgroups-MIPL, Birdsong-MIPL, SIVAL-MIPL), each zipped file consists of three files, "DATANAME_r1.mat", "DATANAME_r2.mat", and "DATANAME_r3.mat", plus a folder called "index". For the CRC-MIPL-Row, CRC-MIPL-SBN, CRC-MIPL-KMeansSeg, and CRC-MIPL-SIFT datasets, each zipped file contains a "DATANAME.mat" file and a folder called "index".
After unzipping and loading each data set and its index in the Matlab environment, you can find three variables named "data", "trainIndex" and "testIndex", organized in the following way:
"data": an Mx4 cell array for the benchmark datasets or an Mx3 cell array for the CRC-MIPL datasets, where M is the number of multi-instance bags. Here, data(i,1) stores the feature representation of the ith bag and can be converted to an Nixd matrix, where Ni is the number of instances in the ith bag and d is the number of features per instance. data(i,2) and data(i,3) store the candidate label set and the ground-truth label of the ith bag, respectively. data(i,4), which exists only for the benchmark datasets, stores the ground-truth labels of the instances in the ith bag.
"trainIndex": a 1xTr matrix, where Tr is the number of training bags.
"testIndex": a 1xTe matrix, where Te is the number of test bags.
The folder "index" contains ten random 50%/50% or 70%/30% train/test partitions, which are provided for reference only. Please feel free to use your own partitions for experimental evaluation.
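For concreteness, a minimal Matlab sketch for loading a benchmark MIPL data set together with one reference partition might look as follows (the file names "MNIST_MIPL_r1.mat" and "index1.mat" are illustrative; adapt them to the actual files in each archive):

```matlab
% Minimal sketch: load a MIPL benchmark data set and one train/test partition.
load('MNIST_MIPL_r1.mat');              % provides data, an Mx4 cell array
load(fullfile('index', 'index1.mat'));  % assumed to provide trainIndex, testIndex

M = size(data, 1);                      % number of multi-instance bags

i = trainIndex(1);                      % inspect the first training bag
bag_i = data{i, 1};                     % convertible to an Ni x d instance matrix
candidates_i = data{i, 2};              % candidate label set of the ith bag
truth_i = data{i, 3};                   % ground-truth label of the ith bag
inst_labels_i = data{i, 4};             % instance-level labels (benchmark data only)
```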
Data Set | #bags | #instances | max. #instances | min. #instances | avg. #instances | # dimensions | # classes | avg. #CLs |
---|---|---|---|---|---|---|---|---|
MNIST-MIPL (33MB) | 500 | 20664 | 48 | 35 | 41.33 | 784 | 5 | 2, 3, 4 |
FMNIST-MIPL (85MB) | 500 | 20810 | 48 | 36 | 41.62 | 784 | 5 | 2, 3, 4 |
Newsgroups-MIPL (3.16MB) | 1000 | 43122 | 86 | 11 | 43.12 | 200 | 10 | 2, 3, 4 |
Birdsong-MIPL (38MB) | 1300 | 48425 | 76 | 25 | 37.25 | 38 | 13 | 2, 3, 4 |
SIVAL-MIPL (29.7MB) | 1500 | 47414 | 32 | 31 | 31.61 | 30 | 25 | 2, 3, 4 |
CRC-MIPL-Row (3.3MB) | 7000 | 56000 | 8 | 8 | 8.00 | 9 | 7 | 2.08 |
CRC-MIPL-SBN (2.0MB) | 7000 | 63000 | 9 | 9 | 9.00 | 15 | 7 | 2.08 |
CRC-MIPL-KMeansSeg (1.4MB) | 7000 | 30178 | 8 | 3 | 4.31 | 6 | 7 | 2.08 |
CRC-MIPL-SIFT (17.3MB) | 7000 | 175000 | 25 | 25 | 25 | 128 | 7 | 2.08 |
NOTE: #bags, #instances, max. #instances, min. #instances, avg. #instances, #dimensions, #classes, and avg. #CLs denote the number of bags, the number of instances, the maximum number of instances in a bag, the minimum number of instances in a bag, the average number of instances per bag, the dimension of each instance, the number of targeted class labels, and the average size of the candidate label set in each dataset, respectively.
More details can be found in the references [W. Tang, W. Zhang, M.-L. Zhang. Multi-instance partial-label learning: Towards exploiting dual inexact supervision. Science China Information Sciences, in press.] and [W. Tang, W. Zhang, M.-L. Zhang. Disambiguated attention embedding for multi-instance partial-label learning. In: Advances in Neural Information Processing Systems 36 (NeurIPS'23), New Orleans, LA, 2023, in press.]
Data sets for multi-dimensional classification:
Notice: The following multi-dimensional classification data sets were collected and pre-processed by me, with credit and copyright belonging to the authors of the referenced literature. The pre-processed data sets may be used at your own risk and for academic purposes only.
In the zipped file for each dataset, there are a total of four files: "DATANAME.mat", "DATANAME.txt", "dataset_statistics.m", and "demo.m". Here, "DATANAME.mat" corresponds to the data file, "DATANAME.txt" includes a detailed description of the data set, running "dataset_statistics.m" in Matlab outputs the characteristics of the data set, and "demo.m" is a demo of the MDC baselines binary relevance (BR) and class powerset (CP) with SVM and decision tree as base classifiers.
After unzipping and loading "DATANAME.mat" in the Matlab environment, you can find five variables named "data", "data_type", "target", "data_name" and "idx_folds", organized in the following way:
"data": A struct w.r.t. the input attribute representations, where
-data.orig is an mxd matrix and stores the original input attribute representations, where m is the number of instances and d is the number of features. Here, data.orig(i,:) stores the feature vector of the ith instance. If your learning algorithm is sensitive to the type of input attributes like decision tree, naive Bayes classifier, etc., then you should use data.orig;
-data.norm is an mxd' matrix and stores the preprocessed version of data.orig where discrete-valued attributes are transformed into their one-hot form and continuous-valued attributes are normalized into [0,1]. If your learning algorithm can only accept continuous-valued input attributes like support vector machine, logistic regression, etc., then you should use data.norm;
NOTE: If all input attributes are continuous-valued, then data.orig is empty and data.norm is a [0,1]-normalized matrix.
"data_type": A struct w.r.t. the input attributes where
-data_type.d_wo_o stores the indexes of all input attributes whose type is discrete-valued without ordinal relationship (a.k.a. categorical/nominal);
-data_type.d_w_o stores the indexes of all input attributes whose type is discrete-valued with ordinal relationship;
-data_type.b stores the indexes of all input attributes whose type is binary-valued;
-data_type.c stores the indexes of all input attributes whose type is continuous-valued (a.k.a. numeric).
NOTE: The corresponding field is empty when no such type of input attributes exist.
"target": An mxq matrix w.r.t. the labeling information, where q is the number of dimensions. Here, target(i,:) stores the class vector associated with the ith instance.
"data_name": A string which stores the name of this data set.
"idx_folds": A 10x1 cell w.r.t. the data partition in ten-fold cross validation, where
-idx_folds{i}.train stores the indexes of training examples in the ith cross validation,
-idx_folds{i}.test stores the indexes of testing examples in the ith cross validation.
NOTE: These ten-fold cross validation partitions are only given for reference purpose. Please feel free to use your own partitions for experimental evaluation.
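Putting this together, a minimal Matlab sketch for loading an MDC data set and splitting out one of the provided cross-validation folds might look as follows (the file name "Edm.mat" is illustrative):

```matlab
% Minimal sketch: load an MDC data set and take one cross-validation fold.
load('Edm.mat');   % provides data, data_type, target, data_name, idx_folds

[m, q] = size(target);                  % m examples, q class spaces (dimensions)

% Use data.norm for learners that require continuous inputs (e.g., SVM);
% use data.orig (when non-empty) for type-sensitive learners (e.g., decision tree).
X = data.norm;

% First of the ten reference cross-validation folds.
tr = idx_folds{1}.train;
te = idx_folds{1}.test;
Xtr = X(tr, :);  Ytr = target(tr, :);
Xte = X(te, :);  Yte = target(te, :);
```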
Dataset | # Examples | # Dimensions | # Labels/Dimension | # Features |
---|---|---|---|---|
Edm (8KB) | 154 | 2 | 3 | 16n |
Flare1 (5KB) | 323 | 3 | 3,4,2 | 10x |
Oes97 (255KB) | 334 | 16 | 3 | 263n |
Jura (22KB) | 359 | 2 | 4,5 | 9n |
Oes10 (325KB) | 403 | 16 | 3 | 298n |
Enb (14KB) | 768 | 2 | 2,4 | 6n |
Song (567KB) | 785 | 3 | 3 | 98n |
WQplants (49KB) | 1060 | 7 | 4 | 16n |
WQanimals (49KB) | 1060 | 7 | 4 | 16n |
WaterQuality (51KB) | 1060 | 14 | 4 | 16n |
BeLaE (94KB) | 1930 | 5 | 5 | 1n,44x |
Voice (409KB) | 3136 | 2 | 4,2 | 19n |
Scm20d (2.57MB) | 8966 | 16 | 4 | 61n |
Rf1 (1.07MB) | 8987 | 8 | 4,4,3,4,4,3,4,3 | 64n |
Thyroid (445KB) | 9172 | 7 | 5,5,3,2,4,4,3 | 7n,22x |
Pain (9.53MB) | 9734 | 10 | 2,5,4,2,2,5,2,5,2,2 | 136n |
Scm1d (15.36MB) | 9803 | 16 | 4 | 280n |
CoIL2000 (715KB) | 9822 | 5 | 6,10,10,4,2 | 81x |
TIC2000 (729KB) | 9822 | 3 | 6,4,2 | 83x |
Flickr (135MB) | 12198 | 5 | 3,4,3,4,4 | 1536n |
Disfa (12.86MB) | 13095 | 12 | 5,5,6,3,4,4,5,4,4,4,6,4 | 136n |
Fera (13.82MB) | 14052 | 5 | 6 | 136n |
Adult (852KB) | 18419 | 4 | 7,7,5,2 | 5n,5x |
Default (3.54MB) | 28779 | 4 | 2,7,4,2 | 14n,6x |
NOTE1: If the number of class labels in each class space is identical, then only this number is recorded; otherwise, the number of class labels in each class space is recorded in turn.
NOTE2: In the last column, n and x denote numeric and nominal features, respectively. Here, we refer to all three non-numeric types of features (i.e., discrete-valued without/with ordinal relationship and binary-valued) as nominal features.
ATTENTION: The following packages were developed by me; feel free to use them (for academic purposes only). The Matlab environment is required to run these programs. For help on using the main functions (e.g., main_func.m) of each package, type "help main_func" at the Matlab prompt. For any problem concerning the code, please feel free to contact me.
Codes for multi-instance learning:
MIL learners and their ensemble versions
Description: This toolbox contains programs for four different multi-instance learners, i.e., Diverse Density, Citation-kNN, Iterated-discrim APR, and EM-DD. Ensemble versions of these individual MIL learners are also included in the package. There is a ReadMe file roughly explaining the codes.
Reference: Z.-H. Zhou, M.-L. Zhang. Ensembles of multi-instance learners. In: Proceedings of the 14th European Conference on Machine Learning (ECML'03), Cavtat-Dubrovnik, Croatia, LNAI 2837, 2003, 492-502.
Download: [code] (3.86Mb)
RBF neural networks for MIL
Description: This toolbox contains programs for the multi-instance learner adapted from traditional RBF neural networks.
Reference: M.-L. Zhang, Z.-H. Zhou. Adapting RBF neural networks to multi-instance learning. Neural Processing Letters, 2006, 23(1): 1-26.
Download: [code] (4 Kb)
BP neural networks for MIL
Description: This toolbox contains programs for the multi-instance learner adapted from traditional BP neural networks.
Reference: Z.-H. Zhou, M.-L. Zhang. Neural networks for multi-instance learning. Technical Report, AI Lab, Computer Science & Technology Department, Nanjing University, Nanjing, China, Aug. 2002.
Download: [code] (3 Kb)
Constructive clustering ensemble for MIL
Description: This toolbox contains programs for the multi-instance learner based on constructive clustering ensemble.
Reference: Z.-H. Zhou, M.-L. Zhang. Solving multi-instance problems with classifier ensemble based on constructive clustering. Knowledge and Information Systems, 2007, 11(2): 155-170.
Download: [code] (1.47Mb)
Codes for multi-label learning:
Multi-label lazy learning approach
Description: This toolbox contains programs for the multi-label lazy learner adapted from the traditional k-nearest neighbor algorithm.
Reference: M.-L. Zhang, Z.-H. Zhou. ML-kNN: A lazy learning approach to multi-label learning. Pattern Recognition, 2007, 40(7): 2038-2048.
Download: [code] (1.28 Mb)
Multi-label support vector machines
Description: This toolbox contains programs for the multi-label kernel learner proposed by A. Elisseeff and J. Weston.
Reference: A. Elisseeff, J. Weston. A kernel method for multi-labelled classification. In: Advances in Neural Information Processing Systems 14, Cambridge, MA: MIT Press, 2002, 681-687.
Download: [code] (8 Kb)
Multi-label BP neural networks
Description: This toolbox contains programs for the multi-label neural networks adapted from traditional BP neural networks.
Reference: M.-L. Zhang, Z.-H. Zhou. Multi-label neural networks with applications to functional genomics and text categorization. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(10): 1338-1351.
Requirement: The "Neural Network Toolbox" of Matlab must be available.
Download: [code] (582 Kb)
Multi-label RBF neural networks
Description: This toolbox contains programs for the multi-label neural networks adapted from traditional RBF neural networks.
Reference: M.-L. Zhang. ML-RBF: RBF neural networks for multi-label learning. Neural Processing Letters, 2009, 29(2): 61-74.
Requirement: The "Statistics Toolbox" of Matlab must be available.
Download: [code] (1.28 Mb)
Multi-label naive Bayes classifier (with feature selection)
Description: This toolbox contains programs for the multi-label naive Bayes classifier (with feature selection).
Reference: M.-L. Zhang, J. M. Peña, V. Robles. Feature selection for multi-label naive Bayes classification. Information Sciences, 2009, 179(19): 3218-3229.
Requirement: The "Genetic Algorithm and Direct Search Toolbox" of Matlab must be available.
Download: [code] (1.28 Mb)
Multi-label classifier by incorporating Bayesian network structure
Description: This toolbox contains programs for the multi-label classifier which explicitly exploits label dependency with Bayesian network structure.
Reference: M.-L. Zhang, K. Zhang. Multi-label learning by exploiting label dependency. In: Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'10), Washington D. C., 2010, 999-1007.
Requirement: The Matlab package for Libsvm should be used in conjunction with this toolbox.
Download: [code] (1.35 Mb)
Multi-label classifier with label-specific features
Description: This toolbox contains programs for the multi-label classifier which utilizes label-specific features.
Reference: M.-L. Zhang. LIFT: Multi-label learning with label-specific features. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI'11), Barcelona, Spain, 2011, 1609-1614.
Requirement: The Matlab package for Libsvm should be used in conjunction with this toolbox.
Download: [code] (1.39 Mb)
Multi-label class-imbalance learning
Description: The package includes the Java code of COCOA, which is designed for learning from multi-label data by addressing the class-imbalance problem. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, Y.-K. Li, H. Yang, X.-Y. Liu. Towards class-imbalance aware multi-label learning. IEEE Transactions on Cybernetics, in press.
Download: [code] (6.25 Mb)
Multi-label classifier by exploiting implicit RLI information
Description: The package includes the Java code of RELIAB, which is designed for learning from multi-label data by exploiting the implicit relative labeling-importance (RLI) information. Source code as well as a running demo are included in the package.
Reference: M.-L. Zhang, Q.-W. Zhang, J.-P. Fang, Y.-K. Li, X. Geng. Leveraging implicit relative labeling-importance information for effective multi-label learning. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(5): 2057-2070.
Download: [code] (22.4 Mb)
Inductive semi-supervised multi-label learning with co-training
Description: The package includes the Matlab code of COINS, which is designed for learning from multi-label data under the inductive semi-supervised setting by adapting co-training techniques. Source code as well as a running demo are included in the package.
Reference: W. Zhan, M.-L. Zhang. Inductive semi-supervised multi-label learning with co-training. In: Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'17), Halifax, Canada, 2017, 1305-1314.
Download: [code] (2.20 Mb)
Multi-label learning with feature-induced labeling information enrichment
Description: The package includes the Matlab code of MLFE, which learns from multi-label data with labeling information enriched by feature-induced manipulation. Source code as well as a running demo are included in the package.
Reference: Q.-W. Zhang, Y. Zhong, M.-L. Zhang. Feature-induced labeling information enrichment for multi-label learning. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI'18), New Orleans, LA, 2018, 4446-4453.
Download: [code] (273 Kb)
Multi-view multi-label learning with view-specific information extraction
Description: The package includes the Python code of SIMM, which learns from multi-label data with multiple views based on view-specific information extraction. Source code as well as a running demo are included in the package.
Reference: X. Wu, Q.-G. Chen, Y. Hu, D. Wang, X. Chang, X. Wang, M.-L. Zhang. Multi-view multi-label learning with view-specific information extraction. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI'19), Macau, China, 2019, 3884-3890.
Download: [code] (1.84 Mb)
Multi-label classification with compositional metric learning
Description: The package includes the Matlab code of COMMU, which performs multi-label classification with compositional metric learning. Source code as well as a running demo are included in the package.
Reference: Y.-P. Sun, M.-L. Zhang. Multi-label classification with compositional metric learning. Frontiers of Computer Science, 2021, 15(5): Article 155320.
Download: [code] (1.73 Mb)
BiLabel-specific features for multi-label classification
Description: The package includes the Matlab code of BiLAS, which performs multi-label classification with bilabel-specific features. Source code as well as a running demo are included in the package.
Reference: M.-L. Zhang, J.-P. Fang, Y.-B. Wang. BiLabel-specific features for multi-label classification. ACM Transactions on Knowledge Discovery from Data, 2021, 16(1): Article 18.
Download: [code] (1.13 Mb)
Wrapped label-specific features for multi-label classification
Description: The package includes the Python code of WRAP, which performs multi-label classification with label-specific features in wrapped mode. Source code as well as a running demo are included in the package.
Reference: Z.-B. Yu, M.-L. Zhang. Multi-label classification with label-specific feature generation: A wrapped approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press.
Download: [code] (708 Kb)
Deep label-specific features for multi-label classification
Description: The package includes the Python code of CLIF, which performs multi-label classification with collaborative learning of label semantics and deep label-specific features. Source code as well as a running demo are included in the package.
Reference: J.-Y. Hang, M.-L. Zhang. Collaborative learning of label semantics and deep label-specific features for multi-label classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press.
Download: [code] (1.85 Mb)
Correlation-guided representation for multi-label text classification
Description: The package includes the source code of CORE, which solves the multi-label text classification problem via correlation-guided representation. A Readme file is included in the package.
Reference: Q.-W. Zhang, X. Zhang, Z. Yan, R. Liu, Y. Cao, M.-L. Zhang. Correlation-guided representation for multi-label text classification. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI'21), Virtual Conference, 2021, 3363-3369.
Download: [code] (144Kb)
End-to-end probabilistic label-specific feature learning for multi-label classification
Description: The package includes the source code of PACA, which solves the multi-label classification problem via end-to-end probabilistic label-specific feature learning. A Readme file is included in the package.
Reference: J.-Y. Hang, M.-L. Zhang, Y. Feng, X. Song. End-to-end probabilistic label-specific feature learning for multi-label classification. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI'22), Vancouver, Canada, 2022, in press.
Download: [code] (2011KB)
Codes for multi-instance multi-label learning:
Multi-instance multi-label boosting & multi-instance multi-label SVM
Description: The package includes the Matlab code of the algorithms MIMLBOOST and MIMLSVM, both of which are designed to deal with multi-instance multi-label learning. It is in particular useful when a real-world object is associated with multiple instances as well as multiple labels simultaneously. A Readme file and some sample files are included in the package.
Reference: Z.-H. Zhou, M.-L. Zhang. Multi-instance multi-label learning with applications to scene classification. In: Advances in Neural Information Processing Systems 19 (NIPS'06), Vancouver, Canada, 2007, 1609-1616.
Download: [code] (95 Kb)
Maximum margin method for multi-instance multi-label learning
Description: The package includes the Matlab code of M3MIML, which learns from multi-instance multi-label examples by the maximum margin strategy.
Reference: M.-L. Zhang, Z.-H. Zhou. M3MIML: A maximum margin method for multi-instance multi-label learning. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM'08), Pisa, Italy, 2008, 688-697.
Download: [code] (7.2 Kb)
Multi-instance multi-label RBF neural networks
Description: This toolbox contains programs for the multi-instance multi-label neural networks adapted from traditional RBF neural networks.
Reference: M.-L. Zhang, Z.-J. Wang. MIMLRBF: RBF neural networks for multi-instance multi-label learning. Neurocomputing, 2009, 72(16-18): 3951-3956.
Download: [code] (616Kb)
Multi-instance multi-label lazy learner
Description: This toolbox contains programs for the multi-instance multi-label learner based on k-nearest neighbor techniques.
Reference: M.-L. Zhang. A k-nearest neighbor based multi-instance multi-label learning algorithm. In: Proceedings of the 22nd International Conference on Tools with Artificial Intelligence (ICTAI'10), Arras, France, 2010, 207-212.
Download: [code] (615Kb)
Other codes:
Ensemble learning with unlabeled data
Description: The package includes the Matlab code of UDEED, which is designed for ensemble learning with unlabeled data. Specifically, UDEED works by maximizing accuracies of base learners on labeled data while maximizing diversity among them on unlabeled data. A Readme file and sample data are included in the package.
Reference: M.-L. Zhang, Z.-H. Zhou. Exploiting unlabeled data to enhance ensemble diversity. In: Proceedings of the 10th IEEE International Conference on Data Mining (ICDM'10), Sydney, Australia, 2010, 619-628.
Download: [code] (65Kb)
Co-training algorithm with data editing
Description: The package includes the Matlab code of CoTrade, which is designed for enhancing the traditional co-training algorithm with data editing techniques. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, Z.-H. Zhou. CoTrade: Confident co-training with data editing. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 2011, 41(6): 1612-1626.
Download: [code] (111Kb)
Partial label learning without disambiguation
Description: The package includes the Matlab code of PL-ECOC, which is designed for learning from partial label data by adapting the ECOC techniques. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, F. Yu, C.-Z. Tang. Disambiguation-free partial label learning. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(10): 2155-2167. [conference version]
Download: [code] (951Kb)
Instance-based partial label learning
Description: The package includes the Matlab code of IPAL, which is designed for learning from partial label data via instance-based techniques. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, F. Yu. Solving the partial label learning problem: An instance-based approach. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI'15), Buenos Aires, Argentina, 2015, 4048-4054.
Download: [code] (918Kb)
Maximum margin partial label learning
Description: The package includes the Matlab code of M3PL, which is designed for learning from partial label data via maximum margin techniques. A Readme file and some sample files are included in the package.
Reference: F. Yu, M.-L. Zhang. Maximum margin partial label learning. In: Proceedings of the 7th Asian Conference on Machine Learning (ACML'15), Hong Kong, China, 2015, 96-111.
Requirement: Before running the M3PL algorithm, please ensure that the LibLinear package is put under the Matlab path and that the cvx toolbox containing the Mosek solver is pre-installed.
Download: [code] (919Kb)
Feature-aware disambiguation for partial label learning
Description: The package includes the Matlab code of PL-LEAF, which learns from partial label data by conducting feature-aware disambiguation. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, B.-B. Zhou, X.-Y. Liu. Partial label learning via feature-aware disambiguation. In: Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'16), San Francisco, CA, 2016, 1335-1344.
Download: [code] (357Kb)
Confidence-rated discriminative partial label learning
Description: The package includes the source code of CORD, which learns from partial label data by rating the ground-truth labeling confidences of candidate labels. A Readme file and some sample files are included in the package.
Reference: C.-Z. Tang, M.-L. Zhang. Confidence-rated discriminative partial label learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI'17), San Francisco, CA, 2017, 2611-2617.
Download: [code] (4.91Mb)
Binary decomposition for partial label learning
Description: The package includes the source code of PALOC, which learns from partial label data by adapting the one-vs-one decomposition strategy. A Readme file and some sample files are included in the package.
Reference: X. Wu, M.-L. Zhang. Towards enabling binary decomposition for partial label learning. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI'18), Stockholm, Sweden, 2018, 2868-2874.
Download: [code] (898Kb)
Class-imbalance aware partial label learning
Description: The package includes the source code of CIMAP, which learns from partial label data by addressing the inherent class-imbalance problem. A Readme file and some sample files are included in the package.
Reference: J. Wang, M.-L. Zhang. Towards mitigating the class-imbalance problem for partial label learning. In: Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'18), London, UK, 2018, 2427-2436.
Download: [code] (360Kb)
Partial multi-label learning with credible label elicitation
Description: The package includes the source code of PARTICLE, which deals with the partial multi-label learning problem by eliciting credible labels from the candidate label set. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, J.-P. Fang. Partial multi-label learning via credible label elicitation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3587-3599. [conference version]
Download: [code] (314Kb)
Multi-dimensional classification with kNN augmented features
Description: The package includes the source code of KRAM, which facilitates multi-dimensional classification by enriching the input space with kNN augmented features. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Multi-dimensional classification via kNN feature augmentation. Pattern Recognition, 2020, 106: Article 107423. [conference version]
Download: [code] (1.16Mb)
Multi-dimensional classification via selective feature augmentation
Description: The package includes the source code of SFAM, which facilitates multi-dimensional classification by enriching the input space with selective feature augmentation. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Multi-dimensional classification via selective feature augmentation. Machine Intelligence Research, in press.
Download: [code] (342Kb)
Multi-dimensional classification via decomposed label encoding
Description: The package includes the source code of DLEM, which solves the multi-dimensional classification problem via decomposed label encoding. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Multi-dimensional classification via decomposed label encoding. IEEE Transactions on Knowledge and Data Engineering, in press.
Download: [code] (295Kb)
Maximum margin multi-dimensional classification
Description: The package includes the source code of M3MDC, which solves the multi-dimensional classification problem based on the maximum margin criterion. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Maximum margin multi-dimensional classification. IEEE Transactions on Neural Networks and Learning Systems, in press. [conference version]
Download: [code] (295Kb)
Multi-dimensional classification via decomposition-based classifier chains
Description: The package includes the source code of DCC, which solves the multi-dimensional classification problem via decomposition-based classifier chains. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Decomposition-based classifier chains for multi-dimensional classification. IEEE Transactions on Artificial Intelligence, in press.
Download: [code] (335Kb)
Multi-dimensional classification via stacked dependency exploitation
Description: The package includes the source code of SEEM, which solves the multi-dimensional classification problem based on a deterministic strategy of stacked dependency exploitation. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Multi-dimensional classification via stacked dependency exploitation. Science China Information Sciences, 2020, 63(12): Article 222102.
Download: [code] (343Kb)
Instance-based multi-dimensional classification
Description: The package includes the source code of MD-kNN, which solves the multi-dimensional classification problem based on instance-based techniques. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. MD-kNN: An instance-based approach for multi-dimensional classification. In: Proceedings of the 25th International Conference on Pattern Recognition (ICPR'20), Milan, Italy, 126-133.
Download: [code] (342Kb)
Sparse label encoding for multi-dimensional classification
Description: The package includes the source code of SLEM, which solves the multi-dimensional classification problem via sparse label encoding. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Multi-dimensional classification via sparse label encoding. In: Proceedings of the 38th International Conference on Machine Learning (ICML'21), Virtual Conference, 2021, 4917-4926.
Download: [code] (296Kb)
Consistency regularization for deep partial label learning
Description: The package includes the source code of a regularized training framework for deep partial label learning, which utilizes an effective regularization term by involving a conformal label distribution for each instance adaptively inferred via bi-level optimization. A Readme file and some sample files are included in the package.
Reference: D.-D. Wu, D.-B. Wang, M.-L. Zhang. Revisiting consistency regularization for deep partial label learning. In: Proceedings of the 39th International Conference on Machine Learning (ICML'22), Baltimore, MD, 2022, in press.
Download: [code] (25Kb)
Dual perspective of label-specific feature learning
Description: The package includes the source code of DELA, which enables label-specific feature learning for multi-label classification by exploiting non-informative features. A Readme file and some sample files are included in the package.
Reference: J.-Y. Hang, M.-L. Zhang. Dual perspective of label-specific feature learning for multi-label classification. In: Proceedings of the 39th International Conference on Machine Learning (ICML'22), Baltimore, MD, 2022, in press.
Download: [code] (157Kb)
Linear discriminant analysis for partial label learning
Description: The package includes the source code of DELIN, which performs dimensionality reduction for partial label learning by adapting linear discriminant analysis (LDA) techniques. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, J.-H. Wu, W.-X. Bao. Disambiguation enabled linear discriminant analysis for partial label dimensionality reduction. ACM Transactions on Knowledge Discovery from Data, in press.
Download: [code] (922Kb)
Partial label learning with adaptive graph guided disambiguation
Description: The package includes the source code of PL-AGGD, which learns from partial label data by conducting disambiguation guided by adaptive graph construction. A Readme file and some sample files are included in the package.
Reference: D.-B. Wang, M.-L. Zhang, L. Li. Adaptive graph guided disambiguation for partial label learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press.
Download: [code] (917Kb)
Multi-view partial multi-label learning with graph-based disambiguation
Description: The package includes the source code of GRADIS, which deals with the multi-view partial multi-label learning problem with graph-based disambiguation. A Readme file is included in the package.
Reference: Z.-S. Chen, X. Wu, Q.-G. Chen, Y. Hu, M.-L. Zhang. Multi-view partial multi-label learning with graph-based disambiguation. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI'20), New York, NY, 2020, 3553-3560.
Download: [code] (430Kb)
Multi-view partial multi-label learning with feature-induced manifold disambiguation
Description: The package includes the source code of FIMAN, which deals with the multi-view partial multi-label learning problem with feature-induced manifold disambiguation. A Readme file is included in the package.
Reference: J.-H. Wu, X. Wu, Q.-G. Chen, Y. Hu, M.-L. Zhang. Feature-induced manifold disambiguation for multi-view partial multi-label learning. In: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'20), Virtual Event, CA, 2020, 557-565.
Download: [code] (310Kb)
Semi-supervised partial label learning via confidence-rated margin maximization
Description: The package includes the source code of PARM, which deals with semi-supervised partial label learning via confidence-rated margin maximization. A Readme file is included in the package.
Reference: W. Wang, M.-L. Zhang. Semi-supervised partial label learning via confidence-rated margin maximization. In: Advances in Neural Information Processing Systems 33 (NeurIPS'20), Vancouver, Canada, 2020, 6982-6993.
Download: [code] (921Kb)
Learning from noisy labels with complementary loss functions
Description: The package includes the source code of CompLoss, which learns from noisy labels with complementary loss functions. A Readme file is included in the package.
Reference: D.-B. Wang, Y. Wen, L. Pan, M.-L. Zhang. Learning from noisy labels with complementary loss functions. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI'21), Virtual Event, 2021, 10111-10119.
Download: [code] (15Kb)
Exploiting unlabeled data via partial label assignment for multi-class semi-supervised learning
Description: The package includes the source code of EUPAL, which solves the multi-class semi-supervised learning problem via partial label assignment. A Readme file is included in the package.
Reference: Z.-R. Zhang, Q.-W. Zhang, Y. Cao, M.-L. Zhang. Exploiting unlabeled data via partial label assignment for multi-class semi-supervised learning. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI'21), Virtual Event, 2021, 10973-10980.
Download: [code] (852Kb)
Learning from complementary labels via partial-output consistency regularization
Description: The package includes the source code of POCR, which solves the complementary label learning problem via partial-output consistency regularization. A Readme file is included in the package.
Reference: D.-B. Wang, L. Feng, M.-L. Zhang. Learning from complementary labels via partial-output consistency regularization. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI'21), Virtual Conference, 2021, 3075-3081.
Download: [code] (23Kb)
Discriminative complementary-label learning with weighted loss
Description: The package includes the source code of L-W, which solves the complementary label learning problem via discriminative modeling with weighted loss. A Readme file is included in the package.
Reference: Y. Gao, M.-L. Zhang. Discriminative complementary-label learning with weighted loss. In: Proceedings of the 38th International Conference on Machine Learning (ICML'21), Virtual Conference, 2021, 3587-3597.
Download: [code] (5Kb)
Dependence maximization for partial label learning
Description: The package includes the source code of CENDA, which performs dimensionality reduction for partial label learning via confidence-based dependence maximization. A Readme file is included in the package.
Reference: W.-X. Bao, J.-Y. Hang, M.-L. Zhang. Partial label dimensionality reduction via confidence-based dependence maximization. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'21), Virtual Event, Singapore, 2021, 46-54.
Download: [code] (919Kb)
Submodular feature selection for partial label learning
Description: The package includes the source code of SAUTE, which performs feature selection for partial label learning via a submodular mutual information function. A Readme file is included in the package.
Reference: W.-X. Bao, J.-Y. Hang, M.-L. Zhang. Submodular feature selection for partial label learning. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'22), Washington D. C., 2022, in press.
Download: [code] (1.45Mb)
Partial label learning with discrimination augmentation
Description: The package includes the source code of PLDA, which solves the partial label learning problem by augmenting the feature space with confidence-rated class prototype features carrying good discriminative information. A Readme file is included in the package.
Reference: W. Wang, M.-L. Zhang. Partial label learning with discrimination augmentation. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'22), Washington D. C., 2022, in press.
Download: [code] (11Kb)
Relic: a multi-instance version of C4.5 decision tree developed by G. Ruffo [Refer: Ruffo G. Learning single and multiple decision trees for security applications. PhD dissertation, Department of Computer Science, University of Turin, Italy, 2000.]
RipperMI: a multi-instance version of rule learning algorithm Ripper developed by Y. Chevaleyre [Refer: Chevaleyre Y, Zucker J-D. Solving multiple-instance and multiple-part learning problems with decision trees and decision rules. Application to the mutagenesis problem. In: Lecture Notes in Artificial Intelligence 2056, Berlin: Springer, 2001, 204-214.]
BoosTexter: a general purpose machine-learning program based on boosting for building a classifier from text and/or attribute-value data. [Refer: Schapire R E, Singer Y. BoosTexter: A boosting-based system for text categorization. Machine Learning, 2000, 39(2-3): 135-168.]
ADTBoost.MH: multi-label alternating decision tree construction software developed by F. De Comité et al. [Refer: Comité F D, Gilleron R, Tommasi M. Learning multi-label alternating decision trees from texts and data. In: Lecture Notes in Computer Science 2734, Berlin: Springer, 2003, 35-49.]