Data sets for multi-instance learning:
Musk data: benchmark test data for multi-instance learning [Refer: Dietterich T G, Lathrop R H, Lozano-Pérez T. Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence, 1997, 89(1-2): 31-71.]
Artificial multi-instance data: benchmark test data for multi-instance regression [Refer: Amar R A, Dooly D R, Goldman S A, Zhang Q. Multiple-instance learning of real-valued data. In: Proceedings of the 18th International Conference on Machine Learning (ICML'01), Williamstown, MA, 2001, 3-10.]
Data for MIL based web index recommendation: data set used for MIL based web index recommendation [Refer: Z.-H. Zhou, K. Jiang, and M. Li. Multi-instance learning based web mining. Applied Intelligence, 22(2): 135-147, 2005.]
Data sets for multi-label learning:
Yeast data: test data for multi-label learning given by A. Elisseeff and J. Weston [Refer: Elisseeff A, Weston J. A kernel method for multi-labelled classification. In: Advances in Neural Information Processing Systems 14, Cambridge, MA: MIT Press, 2002, 681-687.]
Image data: test data for multi-label learning given by M.-L. Zhang and Z.-H. Zhou [Refer: M.-L. Zhang, Z.-H. Zhou. ML-kNN: a lazy learning approach to multi-label learning. Pattern Recognition, 2007, 40(7): 2038-2048.]
Web page data: test data for multi-label learning given by N. Ueda and K. Saito [Refer: Ueda N, Saito K. Parametric mixture models for multi-label text. In: Advances in Neural Information Processing Systems 15, Cambridge, MA: MIT Press, 2003, 721-728.]
Reuters corpus: benchmark test data for multi-label text categorization
More multi-label data sets are available at the Mulan Library and the Sourceforge Network
Data sets for partial label learning:
Notice: The following partial label learning data sets were collected and pre-processed by me, with credit and copyright belonging to the authors of the referenced literature. The pre-processed data sets may be used at your own risk and for academic purposes only.
After unzipping and loading each data set in the Matlab environment, you can find three variables named "data", "partial_target" and "target", organized in the following way:
"data": an Mxd matrix w.r.t. the feature representations, where M is the number of instances and d is the number of features. Here, data(i,:) stores the feature vector of the ith instance.
"partial_target": a QxM matrix w.r.t. the candidate labeling information, where Q is the number of possible class labels. Here, partial_target(j,i)=1 if the jth class label is among the candidate label set of the ith instance; otherwise, partial_target(j,i)=0.
"target": a QxM matrix w.r.t. the ground-truth labeling information. Here, target(j,i)=1 if the jth class label is the ground-truth label of the ith instance; otherwise, target(j,i)=0.
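As a quick sanity check, a minimal Matlab sketch for loading one of these data sets and inspecting the three variables might look as follows (the file name "lost.mat" is illustrative; substitute whichever data set you download):

```matlab
% Minimal sketch: load a partial label data set and inspect its format.
load('lost.mat');                          % provides data, partial_target, target

[M, d] = size(data);                       % M instances, d features
Q = size(partial_target, 1);               % Q possible class labels

x1 = data(1, :);                           % feature vector of the 1st instance
candidates1 = find(partial_target(:, 1));  % candidate label set of the 1st instance
truth1 = find(target(:, 1));               % ground-truth label of the 1st instance

% The ground-truth label should always lie within the candidate label set.
assert(all(target(:) <= partial_target(:)));
```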
FG-NET data: facial age estimation from crowd-sourced annotations [Refer: G. Panis, A. Lanitis. An overview of research activities in facial age estimation using the FG-NET aging database. Lecture Notes in Computer Science 8926, Berlin: Springer, 2015, 737-750.] (1.98Mb)
Lost data: automatic face naming from videos [Refer: T. Cour, B. Sapp, B. Taskar. Learning from partial labels. Journal of Machine Learning Research, 12(May): 1501–1536, 2011.] (914Kb)
MSRCv2 data: object classification [Refer: L. Liu, T. Dietterich. A conditional multinomial mixture model for superset label learning. In: Advances in Neural Information Processing Systems 25, Cambridge, MA: MIT Press, 2012, 557–565.] (373Kb)
BirdSong data: bird song classification [Refer: F. Briggs, X. Z. Fern, R. Raich. Rank-loss support instance machines for MIML instance annotation. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 2012, 534–542.] (1.00Mb)
Soccer Player data: automatic face naming from images [Refer: Z. Zeng, S. Xiao, K. Jia, T.-H. Chan, S. Gao, D. Xu, Y. Ma. Learning by associating ambiguously labeled images. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Portland, OR, 2013, 708–715.] (35.18Mb)
Yahoo! News: automatic face naming from images [Refer: M. Guillaumin, J. Verbeek, C. Schmid. Multiple instance metric learning from automatically labeled bags of faces. In: Lecture Notes in Computer Science 6311, Berlin: Springer, 2010, 634–647.] (28.04Mb)
Mirflickr data: web image classification [Refer: M. J. Huiskes, M. S. Lew. The MIR Flickr retrieval evaluation. In: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, Vancouver, Canada, 2008, 39–43.] (30.40Mb)
Data sets for partial multi-label learning:
Notice: The following partial multi-label learning data sets were collected and pre-processed by me, with credit and copyright belonging to the authors of the referenced literature. The pre-processed data sets may be used at your own risk and for academic purposes only.
In the zipped file for each dataset, there are two files named "DATANAME.mat" and "DATANAME.txt". "DATANAME.mat" corresponds to the data file and "DATANAME.txt" includes a brief description of the data set. After loading "DATANAME.mat" in the Matlab environment, you can find three variables named "data", "candidate_labels" and "target", organized in the following way:
"data":
an Mxd matrix
w.r.t. the feature representations, where M is
the number of instances and d is
the number of features. Here, data(i,:)
stores the feature vector of the ith
instance.
"candidate_labels":
a QxM matrix
w.r.t. the candidate labeling information, where Q is
the number of possible class labels. Here, candidate_labels(j,i)=1
if the jth
class label is among the candidate label set of the ith
instance; Otherwise, candidate_labels(j,i)=0.
"target":
a QxM matrix
w.r.t. the ground-truth labeling information. Here, target(j,i)=1
if the jth
class label is the ground-truth label of the ith
instance; Otherwise, target(j,i)=0.
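As with the partial label data sets above, a minimal Matlab sketch can verify this format (the file name "Music_emotion.mat" is illustrative; substitute the DATANAME.mat of the data set you download):

```matlab
% Minimal sketch: load a partial multi-label data set and inspect its format.
load('Music_emotion.mat');                   % provides data, candidate_labels, target

[M, d] = size(data);                         % M instances, d features
Q = size(candidate_labels, 1);               % Q possible class labels

i = 1;                                       % inspect the 1st instance
candidates_i = find(candidate_labels(:, i)); % its candidate label set
truths_i = find(target(:, i));               % its ground-truth label set

% Every ground-truth label should appear among the candidate labels.
assert(all(target(:) <= candidate_labels(:)));
```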
Music_emotion data: music classification from the emotion perspective [Refer: M.-L. Zhang and J.-P. Fang. Partial multi-label learning via credible label elicitation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3587–3599.] (4.79Mb)
Music_style data: music classification from the style perspective [Refer: M.-L. Zhang and J.-P. Fang. Partial multi-label learning via credible label elicitation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3587–3599.] (4.79Mb)
Mirflickr data: image retrieval task [Refer: M. J. Huiskes, M. S. Lew. The MIR Flickr retrieval evaluation. In: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, Vancouver, Canada, 2008, 39–43.] (7.67Mb)
YeastBP data, YeastCC data, YeastMF data: protein-protein interaction [Refer: G. Yu, X. Chen, C. Domeniconi, J. Wang, Z. Li, Z. Zhang, and X. Wu. Feature-induced partial multi-label learning. In: Proceedings of the 18th IEEE International Conference on Data Mining, Singapore, 2018, 1398–1403.] (1.09Mb, 1.02Mb, 1.02Mb)
Data sets for multi-instance partial-label (MIPL) learning:
Notice: The following multi-instance partial-label learning data sets were collected and pre-processed by me, with credit and copyright belonging to the authors of the referenced literature. The pre-processed data sets may be used at your own risk and for academic purposes only.
For the benchmark datasets (MNIST-MIPL, FMNIST-MIPL, Newsgroups-MIPL, Birdsong-MIPL, SIVAL-MIPL), each zipped file consists of three files, "DATANAME_r1.mat", "DATANAME_r2.mat", and "DATANAME_r3.mat", plus a folder called "index". For the CRC-MIPL-Row, CRC-MIPL-SBN, CRC-MIPL-KMeansSeg, and CRC-MIPL-SIFT datasets, each zipped file contains a "DATANAME.mat" file and a folder called "index".
After unzipping and loading each data set and its index in the Matlab environment, you can find three variables named "data", "trainIndex" and "testIndex", organized in the following way:
"data": an Mx4 cell array for the benchmark datasets or an Mx3 cell array for the CRC-MIPL datasets, where M is the number of multi-instance bags. Here, data(i,1) stores the feature representation of the ith bag and can be converted to an Nixd matrix, where Ni is the number of instances in the ith bag and d is the number of features per instance. data(i,2) and data(i,3) store the candidate label set and the ground-truth label of the ith bag, respectively. data(i,4), which exists only for the benchmark datasets, stores the ground-truth labels of the instances in the ith bag.
"trainIndex": a 1xTr matrix, where Tr is the number of training bags.
"testIndex": a 1xTe matrix, where Te is the number of test bags.
The folder "index" contains ten random 50%/50% or 70%/30% train/test partitions, which are provided for reference only. Please feel free to use your own partitions for experimental evaluation.
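For concreteness, a minimal Matlab sketch for loading a benchmark MIPL data set together with one reference partition might look as follows (the file names "MNIST_MIPL_r1.mat" and "index1.mat" are illustrative; adapt them to the actual files in each archive):

```matlab
% Minimal sketch: load a MIPL benchmark data set and one train/test partition.
load('MNIST_MIPL_r1.mat');              % provides data, an Mx4 cell array
load(fullfile('index', 'index1.mat'));  % assumed to provide trainIndex, testIndex

M = size(data, 1);                      % number of multi-instance bags

i = trainIndex(1);                      % inspect the first training bag
bag_i = data{i, 1};                     % convertible to an Ni x d instance matrix
candidates_i = data{i, 2};              % candidate label set of the ith bag
truth_i = data{i, 3};                   % ground-truth label of the ith bag
inst_labels_i = data{i, 4};             % instance-level labels (benchmark data only)
```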
Data Set | #bags | #instances | max. #instances | min. #instances | avg. #instances | # dimensions | # classes | avg. #CLs |
---|---|---|---|---|---|---|---|---|
MNIST-MIPL (33MB) | 500 | 20664 | 48 | 35 | 41.33 | 784 | 5 | 2, 3, 4 |
FMNIST-MIPL (85MB) | 500 | 20810 | 48 | 36 | 41.62 | 784 | 5 | 2, 3, 4 |
Newsgroups-MIPL (3.16MB) | 1000 | 43122 | 86 | 11 | 43.12 | 200 | 10 | 2, 3, 4 |
Birdsong-MIPL (38MB) | 1300 | 48425 | 76 | 25 | 37.25 | 38 | 13 | 2, 3, 4 |
SIVAL-MIPL (29.7MB) | 1500 | 47414 | 32 | 31 | 31.61 | 30 | 25 | 2, 3, 4 |
CRC-MIPL-Row (3.3MB) | 7000 | 56000 | 8 | 8 | 8.00 | 9 | 7 | 2.08 |
CRC-MIPL-SBN (2.0MB) | 7000 | 63000 | 9 | 9 | 9.00 | 15 | 7 | 2.08 |
CRC-MIPL-KMeansSeg (1.4MB) | 7000 | 30178 | 8 | 3 | 4.31 | 6 | 7 | 2.08 |
CRC-MIPL-SIFT (17.3MB) | 7000 | 175000 | 25 | 25 | 25 | 128 | 7 | 2.08 |
NOTE: #bags, #instances, max. #instances, min. #instances, avg. #instances, #dimensions, #classes, and avg. #CLs denote the number of bags, the number of instances, the maximum number of instances in a bag, the minimum number of instances in a bag, the average number of instances per bag, the dimension of each instance, the number of targeted class labels, and the average size of the candidate label set in each dataset, respectively.
More details can be found in the references [W. Tang, W. Zhang, M.-L. Zhang. Multi-instance partial-label learning: Towards exploiting dual inexact supervision. Science China Information Sciences, in press.] and [W. Tang, W. Zhang, M.-L. Zhang. Disambiguated attention embedding for multi-instance partial-label learning. In: Advances in Neural Information Processing Systems 36 (NeurIPS'23), New Orleans, LA, 2023, in press.]
Data sets for multi-dimensional classification:
Notice: The following multi-dimensional classification data sets were collected and pre-processed by me, with credit and copyright belonging to the authors of the referenced literature. The pre-processed data sets may be used at your own risk and for academic purposes only.
In the zipped file for each dataset, there are a total of four files: "DATANAME.mat", "DATANAME.txt", "dataset_statistics.m", and "demo.m". Here, "DATANAME.mat" corresponds to the data file, "DATANAME.txt" includes a detailed description of the data set, running "dataset_statistics.m" in Matlab outputs the characteristics of the data set, and "demo.m" is a demo of the MDC baselines binary relevance (BR) and class powerset (CP) with SVM and decision tree as base classifiers.
After unzipping and loading "DATANAME.mat" in the Matlab environment, you can find five variables named "data", "data_type", "target", "data_name" and "idx_folds", organized in the following way:
"data": A struct w.r.t. the input attribute representations, where
-data.orig is an mxd matrix and stores the original input attribute representations, where m is the number of instances and d is the number of features. Here, data.orig(i,:) stores the feature vector of the ith instance. If your learning algorithm is sensitive to the type of input attributes like decision tree, naive Bayes classifier, etc., then you should use data.orig;
-data.norm is an mxd' matrix and stores the preprocessed version of data.orig where discrete-valued attributes are transformed into their one-hot form and continuous-valued attributes are normalized into [0,1]. If your learning algorithm can only accept continuous-valued input attributes like support vector machine, logistic regression, etc., then you should use data.norm;
NOTE: If all input attributes are continuous-valued, then data.orig is empty and data.norm is a [0,1]-normalized matrix.
"data_type": A struct w.r.t. the input attributes where
-data_type.d_wo_o stores the indexes of all input attributes whose type is discrete-valued without ordinal relationship (a.k.a. categorical/nominal);
-data_type.d_w_o stores the indexes of all input attributes whose type is discrete-valued with ordinal relationship;
-data_type.b stores the indexes of all input attributes whose type is binary-valued;
-data_type.c stores the indexes of all input attributes whose type is continuous-valued (a.k.a. numeric).
NOTE: The corresponding field is empty when no such type of input attributes exist.
"target": An mxq matrix w.r.t. the labeling information, where q is the number of dimensions. Here, target(i,:) stores the class vector associated with the ith instance.
"data_name": A string which stores the name of this data set.
"idx_folds": A 10x1 cell w.r.t. the data partition in ten-fold cross validation, where
-idx_folds{i}.train stores the indexes of training examples in the ith cross validation,
-idx_folds{i}.test stores the indexes of testing examples in the ith cross validation.
NOTE: These ten-fold cross validation partitions are only given for reference purpose. Please feel free to use your own partitions for experimental evaluation.
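Putting this together, a minimal Matlab sketch for loading an MDC data set and splitting out one of the provided cross-validation folds might look as follows (the file name "Edm.mat" is illustrative):

```matlab
% Minimal sketch: load an MDC data set and take one cross-validation fold.
load('Edm.mat');   % provides data, data_type, target, data_name, idx_folds

[m, q] = size(target);                  % m examples, q class spaces (dimensions)

% Use data.norm for learners that require continuous inputs (e.g., SVM);
% use data.orig (when non-empty) for type-sensitive learners (e.g., decision tree).
X = data.norm;

% First of the ten reference cross-validation folds.
tr = idx_folds{1}.train;
te = idx_folds{1}.test;
Xtr = X(tr, :);  Ytr = target(tr, :);
Xte = X(te, :);  Yte = target(te, :);
```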
Dataset | # Examples | # Dimensions | # Labels/Dimension | # Features |
---|---|---|---|---|
Edm (8KB) | 154 | 2 | 3 | 16n |
Flare1 (5KB) | 323 | 3 | 3,4,2 | 10x |
Oes97 (255KB) | 334 | 16 | 3 | 263n |
Jura (22KB) | 359 | 2 | 4,5 | 9n |
Oes10 (325KB) | 403 | 16 | 3 | 298n |
Enb (14KB) | 768 | 2 | 2,4 | 6n |
Song (567KB) | 785 | 3 | 3 | 98n |
WQplants (49KB) | 1060 | 7 | 4 | 16n |
WQanimals (49KB) | 1060 | 7 | 4 | 16n |
WaterQuality (51KB) | 1060 | 14 | 4 | 16n |
BeLaE (94KB) | 1930 | 5 | 5 | 1n,44x |
Voice (409KB) | 3136 | 2 | 4,2 | 19n |
Scm20d (2.57MB) | 8966 | 16 | 4 | 61n |
Rf1 (1.07MB) | 8987 | 8 | 4,4,3,4,4,3,4,3 | 64n |
Thyroid (445KB) | 9172 | 7 | 5,5,3,2,4,4,3 | 7n,22x |
Pain (9.53MB) | 9734 | 10 | 2,5,4,2,2,5,2,5,2,2 | 136n |
Scm1d (15.36MB) | 9803 | 16 | 4 | 280n |
CoIL2000 (715KB) | 9822 | 5 | 6,10,10,4,2 | 81x |
TIC2000 (729KB) | 9822 | 3 | 6,4,2 | 83x |
Flickr (135MB) | 12198 | 5 | 3,4,3,4,4 | 1536n |
Disfa (12.86MB) | 13095 | 12 | 5,5,6,3,4,4,5,4,4,4,6,4 | 136n |
Fera (13.82MB) | 14052 | 5 | 6 | 136n |
Adult (852KB) | 18419 | 4 | 7,7,5,2 | 5n,5x |
Default (3.54MB) | 28779 | 4 | 2,7,4,2 | 14n,6x |
NOTE1: If the number of class labels in each class space is identical, then only this number is recorded; otherwise, the number of class labels in each class space is recorded in turn.
NOTE2: In the last column, n and x denote numeric and nominal features, respectively. Here, we refer to all three non-numeric types of features (i.e., discrete-valued without/with ordinal relationship and binary-valued) as nominal features.
ATTENTION: The following packages were developed by me; feel free to use them (for academic purposes only). The Matlab environment is required to run these programs. For help on using the main functions (e.g., main_func.m) of each package, type "help main_func" at the Matlab prompt. For any problem concerning the code, please feel free to contact me.
Codes for multi-instance learning:
MIL learners and their ensemble versions
Description: This toolbox contains programs for four different multi-instance learners, i.e., Diverse Density, Citation-kNN, Iterated-discrim APR, and EM-DD. Ensemble versions of these individual MIL learners are also included in the package. There is a ReadMe file roughly explaining the codes.
Reference: Z.-H. Zhou, M.-L. Zhang. Ensembles of multi-instance learners. In: Proceedings of the 14th European Conference on Machine Learning (ECML'03), Cavtat-Dubrovnik, Croatia, LNAI 2837, 2003, 492-502.
Download: [code] (3.86Mb)
RBF neural networks for MIL
Description: This toolbox contains programs for the multi-instance learner adapted from traditional RBF neural networks.
Reference: M.-L. Zhang, Z.-H. Zhou. Adapting RBF neural networks to multi-instance learning. Neural Processing Letters, 2006, 23(1): 1-26.
Download: [code] (4 Kb)
BP neural networks for MIL
Description: This toolbox contains programs for the multi-instance learner adapted from traditional BP neural networks.
Reference: Z.-H. Zhou, M.-L. Zhang. Neural networks for multi-instance learning. Technical Report, AI Lab, Computer Science & Technology Department, Nanjing University, Nanjing, China, Aug. 2002.
Download: [code] (3 Kb)
Constructive clustering ensemble for MIL
Description: This toolbox contains programs for the multi-instance learner based on constructive clustering ensemble.
Reference: Z.-H. Zhou, M.-L. Zhang. Solving multi-instance problems with classifier ensemble based on constructive clustering. Knowledge and Information Systems, 2007, 11(2): 155-170.
Download: [code] (1.47Mb)
Codes for multi-label learning:
Multi-label lazy learning approach
Description: This toolbox contains programs for the multi-label lazy learner adapted from the traditional k-nearest neighbor algorithm.
Reference: M.-L. Zhang, Z.-H. Zhou. ML-kNN: A lazy learning approach to multi-label learning. Pattern Recognition, 2007, 40(7): 2038-2048.
Download: [code] (1.28 Mb)
Multi-label support vector machines
Description: This toolbox contains programs for the multi-label kernel learner proposed by A. Elisseeff and J. Weston.
Reference: A. Elisseeff, J. Weston. A kernel method for multi-labelled classification. In: Advances in Neural Information Processing Systems 14, Cambridge, MA: MIT Press, 2002, 681-687.
Download: [code] (8 Kb)
Multi-label BP neural networks
Description: This toolbox contains programs for the multi-label neural networks adapted from traditional BP neural networks.
Reference: M.-L. Zhang, Z.-H. Zhou. Multi-label neural networks with applications to functional genomics and text categorization. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(10): 1338-1351.
Requirement: The "Neural Network Toolbox" of Matlab must be available.
Download: [code] (582 Kb)
Multi-label RBF neural networks
Description: This toolbox contains programs for the multi-label neural networks adapted from traditional RBF neural networks.
Reference: M.-L. Zhang. ML-RBF: RBF neural networks for multi-label learning. Neural Processing Letters, 2009, 29(2): 61-74.
Requirement: The "Statistics Toolbox" of Matlab must be available.
Download: [code] (1.28 Mb)
Multi-label naive Bayes classifier (with feature selection)
Description: This toolbox contains programs for the multi-label naive Bayes classifier (with feature selection).
Reference: M.-L. Zhang, J. M. Peña, V. Robles. Feature selection for multi-label naive Bayes classification. Information Sciences, 2009, 179(19): 3218-3229.
Requirement: The "Genetic Algorithm and Direct Search Toolbox" of Matlab must be available.
Download: [code] (1.28 Mb)
Multi-label classifier by incorporating Bayesian network structure
Description: This toolbox contains programs for the multi-label classifier which explicitly exploits label dependency with Bayesian network structure.
Reference: M.-L. Zhang, K. Zhang. Multi-label learning by exploiting label dependency. In: Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'10), Washington D. C., 2010, 999-1007.
Requirement: The Matlab package for Libsvm should be used in conjunction with this toolbox.
Download: [code] (1.35 Mb)
Multi-label classifier with label-specific features
Description: This toolbox contains programs for the multi-label classifier which utilizes label-specific features.
Reference: M.-L. Zhang. LIFT: Multi-label learning with label-specific features. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI'11), Barcelona, Spain, 2011, 1609-1614.
Requirement: The Matlab package for Libsvm should be used in conjunction with this toolbox.
Download: [code] (1.39 Mb)
Multi-label class-imbalance learning
Description: The package includes the Java code of COCOA, which is designed for learning from multi-label data by addressing the class-imbalance problem. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, Y.-K. Li, H. Yang, X.-Y. Liu. Towards class-imbalance aware multi-label learning. IEEE Transactions on Cybernetics, in press.
Download: [code] (6.25 Mb)
Multi-label classifier by exploiting implicit RLI information
Description: The package includes the Java code of RELIAB, which is designed for learning from multi-label data by exploiting the implicit relative labeling-importance (RLI) information. Source code as well as a running demo are included in the package.
Reference: M.-L. Zhang, Q.-W. Zhang, J.-P. Fang, Y.-K. Li, X. Geng. Leveraging implicit relative labeling-importance information for effective multi-label learning. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(5): 2057-2070.
Download: [code] (22.4 Mb)
Inductive semi-supervised multi-label learning with co-training
Description: The package includes the Matlab code of COINS, which is designed for learning from multi-label data under the inductive semi-supervised setting by adapting co-training techniques. Source code as well as a running demo are included in the package.
Reference: W. Zhan, M.-L. Zhang. Inductive semi-supervised multi-label learning with co-training. In: Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'17), Halifax, Canada, 2017, 1305-1314.
Download: [code] (2.20 Mb)
Multi-label learning with feature-induced labeling information enrichment
Description: The package includes the Matlab code of MLFE, which learns from multi-label data with labeling information enriched by feature-induced manipulation. Source code as well as a running demo are included in the package.
Reference: Q.-W. Zhang, Y. Zhong, M.-L. Zhang. Feature-induced labeling information enrichment for multi-label learning. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI'18), New Orleans, LA, 2018, 4446-4453.
Download: [code] (273 Kb)
Multi-view multi-label learning with view-specific information extraction
Description: The package includes the Python code of SIMM, which learns from multi-label data with multiple views based on view-specific information extraction. Source code as well as a running demo are included in the package.
Reference: X. Wu, Q.-G. Chen, Y. Hu, D. Wang, X. Chang, X. Wang, M.-L. Zhang. Multi-view multi-label learning with view-specific information extraction. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI'19), Macau, China, 2019, 3884-3890.
Download: [code] (1.84 Mb)
Multi-label classification with compositional metric learning
Description: The package includes the Matlab code of COMMU, which performs multi-label classification with compositional metric learning. Source code as well as a running demo are included in the package.
Reference: Y.-P. Sun, M.-L. Zhang. Multi-label classification with compositional metric learning. Frontiers of Computer Science, 2021, 15(5): Article 155320.
Download: [code] (1.73 Mb)
BiLabel-specific features for multi-label classification
Description: The package includes the Matlab code of BiLAS, which performs multi-label classification with bilabel-specific features. Source code as well as a running demo are included in the package.
Reference: M.-L. Zhang, J.-P. Fang, Y.-B. Wang. BiLabel-specific features for multi-label classification. ACM Transactions on Knowledge Discovery from Data, 2021, 16(1): Article 18.
Download: [code] (1.13 Mb)
Wrapped label-specific features for multi-label classification
Description: The package includes the Python code of WRAP, which performs multi-label classification with label-specific features in wrapped mode. Source code as well as a running demo are included in the package.
Reference: Z.-B. Yu, M.-L. Zhang. Multi-label classification with label-specific feature generation: A wrapped approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press.
Download: [code] (708 Kb)
Deep label-specific features for multi-label classification
Description: The package includes the Python code of CLIF, which performs multi-label classification with collaborative learning of label semantics and deep label-specific features. Source code as well as a running demo are included in the package.
Reference: J.-Y. Hang, M.-L. Zhang. Collaborative learning of label semantics and deep label-specific features for multi-label classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press.
Download: [code] (1.85 Mb)
Correlation-guided representation for multi-label text classification
Description: The package includes the source code of CORE, which solves the multi-label text classification problem via correlation-guided representation. A Readme file is included in the package.
Reference: Q.-W. Zhang, X. Zhang, Z. Yan, R. Liu, Y. Cao, M.-L. Zhang. Correlation-guided representation for multi-label text classification. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI'21), Virtual Conference, 2021, 3363-3369.
Download: [code] (144Kb)
End-to-end probabilistic label-specific feature learning for multi-label classification
Description: The package includes the source code of PACA, which solves the multi-label classification problem via end-to-end probabilistic label-specific feature learning. A Readme file is included in the package.
Reference: J.-Y. Hang, M.-L. Zhang, Y. Feng, X. Song. End-to-end probabilistic label-specific feature learning for multi-label classification. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI'22), Vancouver, Canada, 2022, in press.
Download: [code] (2011KB)
Codes for multi-instance multi-label learning:
Multi-instance multi-label boosting & multi-instance multi-label SVM
Description: The package includes the Matlab code of the algorithms MIMLBOOST and MIMLSVM, both of which are designed to deal with multi-instance multi-label learning. It is in particular useful when a real-world object is associated with multiple instances as well as multiple labels simultaneously. A Readme file and some sample files are included in the package.
Reference: Z.-H. Zhou, M.-L. Zhang. Multi-instance multi-label learning with applications to scene classification. In: Advances in Neural Information Processing Systems 19 (NIPS'06), Vancouver, Canada, 2007, 1609-1616.
Download: [code] (95 Kb)
Maximum margin method for multi-instance multi-label learning
Description: The package includes the Matlab code of M3MIML, which learns from multi-instance multi-label examples by the maximum margin strategy.
Reference: M.-L. Zhang, Z.-H. Zhou. M3MIML: A maximum margin method for multi-instance multi-label learning. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM'08), Pisa, Italy, 2008, 688-697.
Download: [code] (7.2 Kb)
Multi-instance multi-label RBF neural networks
Description: This toolbox contains programs for the multi-instance multi-label neural networks adapted from traditional RBF neural networks.
Reference: M.-L. Zhang, Z.-J. Wang. MIMLRBF: RBF neural networks for multi-instance multi-label learning. Neurocomputing, 2009, 72(16-18): 3951-3956.
Download: [code] (616Kb)
Multi-instance multi-label lazy learner
Description: This toolbox contains programs for the multi-instance multi-label learner based on k-nearest neighbor techniques.
Reference: M.-L. Zhang. A k-nearest neighbor based multi-instance multi-label learning algorithm. In: Proceedings of the 22nd International Conference on Tools with Artificial Intelligence (ICTAI'10), Arras, France, 2010, 207-212.
Download: [code] (615Kb)
Other codes:
Ensemble learning with unlabeled data
Description: The package includes the Matlab code of UDEED, which is designed for ensemble learning with unlabeled data. Specifically, UDEED works by maximizing accuracies of base learners on labeled data while maximizing diversity among them on unlabeled data. A Readme file and sample data are included in the package.
Reference: M.-L. Zhang, Z.-H. Zhou. Exploiting unlabeled data to enhance ensemble diversity. In: Proceedings of the 10th IEEE International Conference on Data Mining (ICDM'10), Sydney, Australia, 2010, 619-628.
Download: [code] (65Kb)
Co-training algorithm with data editing
Description: The package includes the Matlab code of CoTrade, which is designed for enhancing the traditional co-training algorithm with data editing techniques. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, Z.-H. Zhou. CoTrade: Confident co-training with data editing. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 2011, 41(6): 1612-1626.
Download: [code] (111Kb)
Partial label learning without disambiguation
Description: The package includes the Matlab code of PL-ECOC, which is designed for learning from partial label data by adapting the ECOC techniques. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, F. Yu, C.-Z. Tang. Disambiguation-free partial label learning. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(10): 2155-2167. [conference version]
Download: [code] (951Kb)
Instance-based partial label learning
Description: The package includes the Matlab code of IPAL, which is designed for learning from partial label data via instance-based techniques. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, F. Yu. Solving the partial label learning problem: An instance-based approach. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI'15), Buenos Aires, Argentina, 2015, 4048-4054.
Download: [code] (918Kb)
Maximum margin partial label learning
Description: The package includes the Matlab code of M3PL, which is designed for learning from partial label data via maximum margin techniques. A Readme file and some sample files are included in the package.
Reference: F. Yu, M.-L. Zhang. Maximum margin partial label learning. In: Proceedings of the 7th Asian Conference on Machine Learning (ACML'15), Hong Kong, China, 2015, 96-111.
Requirement: Before running the M3PL algorithm, please ensure that the LibLinear package is put under the Matlab path and that the cvx toolbox containing the Mosek solver is pre-installed.
Download: [code] (919Kb)
Feature-aware disambiguation for partial label learning
Description: The package includes the Matlab code of PL-LEAF, which learns from partial label data by conducting feature-aware disambiguation. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, B.-B. Zhou, X.-Y. Liu. Partial label learning via feature-aware disambiguation. In: Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'16), San Francisco, CA, 2016, 1335-1344.
Download: [code] (357Kb)
Confidence-rated discriminative partial label learning
Description: The package includes the source code of CORD, which learns from partial label data by rating the ground-truth labeling confidences of candidate labels. A Readme file and some sample files are included in the package.
Reference: C.-Z. Tang, M.-L. Zhang. Confidence-rated discriminative partial label learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI'17), San Francisco, CA, 2017, 2611-2617.
Download: [code] (4.91Mb)
Binary decomposition for partial label learning
Description: The package includes the source code of PALOC, which learns from partial label data by adapting the one-vs-one decomposition strategy. A Readme file and some sample files are included in the package.
Reference: X. Wu, M.-L. Zhang. Towards enabling binary decomposition for partial label learning. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI'18), Stockholm, Sweden, 2018, 2868-2874.
Download: [code] (898Kb)
Class-imbalance aware partial label learning
Description: The package includes the source code of CIMAP, which learns from partial label data by addressing the inherent class-imbalance problem. A Readme file and some sample files are included in the package.
Reference: J. Wang, M.-L. Zhang. Towards mitigating the class-imbalance problem for partial label learning. In: Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'18), London, UK, 2018, 2427-2436.
Download: [code] (360Kb)
Partial multi-label learning with credible label elicitation
Description: The package includes the source code of PARTICLE, which deals with the partial multi-label learning problem by eliciting credible labels from the candidate label set. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, J.-P. Fang. Partial multi-label learning via credible label elicitation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3587-3599. [conference version]
Download: [code] (314Kb)
Multi-dimensional classification with kNN augmented features
Description: The package includes the source code of KRAM, which facilitates multi-dimensional classification by enriching the input space with kNN augmented features. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Multi-dimensional classification via kNN feature augmentation. Pattern Recognition, 2020, 106: Article 107423. [conference version]
Download: [code] (1.16Mb)
Multi-dimensional classification via selective feature augmentation
Description: The package includes the source code of SFAM, which facilitates multi-dimensional classification by enriching the input space with selective feature augmentation. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Multi-dimensional classification via selective feature augmentation. Machine Intelligence Research, in press.
Download: [code] (342Kb)
Multi-dimensional classification via decomposed label encoding
Description: The package includes the source code of DLEM, which solves the multi-dimensional classification problem via decomposed label encoding. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Multi-dimensional classification via decomposed label encoding. IEEE Transactions on Knowledge and Data Engineering, in press.
Download: [code] (295Kb)
Maximum margin multi-dimensional classification
Description: The package includes the source code of M3MDC, which solves the multi-dimensional classification problem based on the maximum margin criterion. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Maximum margin multi-dimensional classification. IEEE Transactions on Neural Networks and Learning Systems, in press. [conference version]
Download: [code] (295Kb)
Multi-dimensional classification via decomposition-based classifier chains
Description: The package includes the source code of DCC, which solves the multi-dimensional classification problem via decomposition-based classifier chains. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Decomposition-based classifier chains for multi-dimensional classification. IEEE Transactions on Artificial Intelligence, in press.
Download: [code] (335Kb)
Multi-dimensional classification via stacked dependency exploitation
Description: The package includes the source code of SEEM, which solves the multi-dimensional classification problem based on a deterministic strategy of stacked dependency exploitation. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Multi-dimensional classification via stacked dependency exploitation. Science China Information Sciences, 2020, 63(12): Article 222102.
Download: [code] (343Kb)
Instance-based multi-dimensional classification
Description: The package includes the source code of MD-kNN, which solves the multi-dimensional classification problem based on instance-based techniques. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. MD-kNN: An instance-based approach for multi-dimensional classification. In: Proceedings of the 25th International Conference on Pattern Recognition (ICPR'20), Milan, Italy, 126-133.
Download: [code] (342Kb)
Sparse label encoding for multi-dimensional classification
Description: The package includes the source code of SLEM, which solves the multi-dimensional classification problem via sparse label encoding. A Readme file and some sample files are included in the package.
Reference: B.-B. Jia, M.-L. Zhang. Multi-dimensional classification via sparse label encoding. In: Proceedings of the 38th International Conference on Machine Learning (ICML'21), Virtual Conference, 2021, 4917-4926.
Download: [code] (296Kb)
Consistency regularization for deep partial label learning
Description: The package includes the source code of a regularized training framework for deep partial label learning, which utilizes an effective regularization term by involving a conformal label distribution for each instance adaptively inferred via bi-level optimization. A Readme file and some sample files are included in the package.
Reference: D.-D. Wu, D.-B. Wang, M.-L. Zhang. Revisiting consistency regularization for deep partial label learning. In: Proceedings of the 39th International Conference on Machine Learning (ICML'22), Baltimore, MD, 2022, in press.
Download: [code] (25Kb)
Dual perspective of label-specific feature learning
Description: The package includes the source code of DELA, which enables label-specific feature learning for multi-label classification by exploiting non-informative features. A Readme file and some sample files are included in the package.
Reference: J.-Y. Hang, M.-L. Zhang. Dual perspective of label-specific feature learning for multi-label classification. In: Proceedings of the 39th International Conference on Machine Learning (ICML'22), Baltimore, MD, 2022, in press.
Download: [code] (157Kb)
Linear discriminant analysis for partial label learning
Description: The package includes the source code of DELIN, which performs dimensionality reduction for partial label learning by adapting linear discriminant analysis (LDA) techniques. A Readme file and some sample files are included in the package.
Reference: M.-L. Zhang, J.-H. Wu, W.-X. Bao. Disambiguation enabled linear discriminant analysis for partial label dimensionality reduction. ACM Transactions on Knowledge Discovery from Data, in press.
Download: [code] (922Kb)
Partial label learning with adaptive graph guided disambiguation
Description: The package includes the source code of PL-AGGD, which learns from partial label data by conducting disambiguation guided by adaptive graph construction. A Readme file and some sample files are included in the package.
Reference: D.-B. Wang, M.-L. Zhang, L. Li. Adaptive graph guided disambiguation for partial label learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press.
Download: [code] (917Kb)
Multi-view partial multi-label learning with graph-based disambiguation
Description: The package includes the source code of GRADIS, which deals with the multi-view partial multi-label learning problem with graph-based disambiguation. A Readme file is included in the package.
Reference: Z.-S. Chen, X. Wu, Q.-G. Chen, Y. Hu, M.-L. Zhang. Multi-view partial multi-label learning with graph-based disambiguation. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI'20), New York, NY, 2020, 3553-3560.
Download: [code] (430Kb)
Multi-view partial multi-label learning with feature-induced manifold disambiguation
Description: The package includes the source code of FIMAN, which deals with the multi-view partial multi-label learning problem with feature-induced manifold disambiguation. A Readme file is included in the package.
Reference: J.-H. Wu, X. Wu, Q.-G. Chen, Y. Hu, M.-L. Zhang. Feature-induced manifold disambiguation for multi-view partial multi-label learning. In: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'20), Virtual Event, CA, 2020, 557-565.
Download: [code] (310Kb)
Semi-supervised partial label learning via confidence-rated margin maximization
Description: The package includes the source code of PARM, which deals with semi-supervised partial label learning via confidence-rated margin maximization. A Readme file is included in the package.
Reference: W. Wang, M.-L. Zhang. Semi-supervised partial label learning via confidence-rated margin maximization. In: Advances in Neural Information Processing Systems 33 (NeurIPS'20), Vancouver, Canada, 2020, 6982-6993.
Download: [code] (921Kb)
Learning from noisy labels with complementary loss functions
Description: The package includes the source code of CompLoss, which learns from noisy labels with complementary loss functions. A Readme file is included in the package.
Reference: D.-B. Wang, Y. Wen, L. Pan, M.-L. Zhang. Learning from noisy labels with complementary loss functions. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI'21), Virtual Event, 2021, 10111-10119.
Download: [code] (15Kb)
Exploiting unlabeled data via partial label assignment for multi-class semi-supervised learning
Description: The package includes the source code of EUPAL, which solves the multi-class semi-supervised learning problem via partial label assignment. A Readme file is included in the package.
Reference: Z.-R. Zhang, Q.-W. Zhang, Y. Cao, M.-L. Zhang. Exploiting unlabeled data via partial label assignment for multi-class semi-supervised learning. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI'21), Virtual Event, 2021, 10973-10980.
Download: [code] (852Kb)
Learning from complementary labels via partial-output consistency regularization
Description: The package includes the source code of POCR, which solves the complementary label learning problem via partial-output consistency regularization. A Readme file is included in the package.
Reference: D.-B. Wang, L. Feng, M.-L. Zhang. Learning from complementary labels via partial-output consistency regularization. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI'21), Virtual Conference, 2021, 3075-3081.
Download: [code] (23Kb)
Discriminative complementary-label learning with weighted loss
Description: The package includes the source code of L-W, which solves the complementary label learning problem via discriminative modeling with weighted loss. A Readme file is included in the package.
Reference: Y. Gao, M.-L. Zhang. Discriminative complementary-label learning with weighted loss. In: Proceedings of the 38th International Conference on Machine Learning (ICML'21), Virtual Conference, 2021, 3587-3597.
Download: [code] (5Kb)
Dependence maximization for partial label learning
Description: The package includes the source code of CENDA, which performs dimensionality reduction for partial label learning via confidence-based dependence maximization. A Readme file is included in the package.
Reference: W.-X. Bao, J.-Y. Hang, M.-L. Zhang. Partial label dimensionality reduction via confidence-based dependence maximization. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'21), Virtual Event, Singapore, 2021, 46-54.
Download: [code] (919Kb)
Submodular feature selection for partial label learning
Description: The package includes the source code of SAUTE, which performs feature selection for partial label learning via a submodular mutual information function. A Readme file is included in the package.
Reference: W.-X. Bao, J.-Y. Hang, M.-L. Zhang. Submodular feature selection for partial label learning. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'22), Washington D. C., 2022, in press.
Download: [code] (1.45Mb)
Partial label learning with discrimination augmentation
Description: The package includes the source code of PLDA, which solves the partial label learning problem by augmenting the feature space with confidence-rated class prototype features carrying good discriminative information. A Readme file is included in the package.
Reference: W. Wang, M.-L. Zhang. Partial label learning with discrimination augmentation. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'22), Washington D. C., 2022, in press.
Download: [code] (11Kb)
Relic: a multi-instance version of C4.5 decision tree developed by G. Ruffo [Refer: Ruffo G. Learning single and multiple decision trees for security applications. PhD dissertation, Department of Computer Science, University of Turin, Italy, 2000.]
RipperMI: a multi-instance version of rule learning algorithm Ripper developed by Y. Chevaleyre [Refer: Chevaleyre Y, Zucker J-D. Solving multiple-instance and multiple-part learning problems with decision trees and decision rules. Application to the mutagenesis problem. In: Lecture Notes in Artificial Intelligence 2056, Berlin: Springer, 2001, 204-214.]
BoosTexter: a general purpose machine-learning program based on boosting for building a classifier from text and/or attribute-value data. [Refer: Schapire R E, Singer Y. BoosTexter: A boosting-based system for text categorization. Machine Learning, 2000, 39(2-3): 135-168.]
ADTBoost.MH: multi-label alternating decision tree construction software developed by F. De Comité et al. [Refer: Comité F D, Gilleron R, Tommasi M. Learning multi-label alternating decision trees from texts and data. In: Lecture Notes in Computer Science 2734, Berlin: Springer, 2003, 35-49.]