Recent years have witnessed growing interest in incorporating external knowledge such as pre-trained word embeddings (PWEs) or pre-trained language models (PLMs) into neural topic modeling. However, we found that employing PWEs and PLMs for topic modeling achieves only limited performance improvements at a huge computational cost. In this paper, we propose a novel strategy to incorporate external knowledge into neural topic modeling, in which the neural topic model is pre-trained on a large corpus and then fine-tuned on the target dataset. Experiments have been conducted on three datasets, and the results show that the proposed approach significantly outperforms both current state-of-the-art neural topic models and some topic modeling approaches enhanced with PWEs or PLMs. Moreover, further study shows that the proposed approach greatly reduces the amount of training data required.
@inproceedings{zhang-etal-2022-pre,
title = "Pre-training and Fine-tuning Neural Topic Model: A Simple yet Effective Approach to Incorporating External Knowledge",
author = "Zhang, Linhai and
Hu, Xuemeng and
Wang, Boyu and
Zhou, Deyu and
Zhang, Qian-Wen and
Cao, Yunbo",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.413",
doi = "10.18653/v1/2022.acl-long.413",
pages = "5980--5989"}
Medical code prediction from clinical notes aims to automatically associate medical codes with clinical notes. The rare code problem, i.e., medical codes that occur only infrequently, is prominent in medical code prediction. Recent studies employ deep neural networks and external knowledge to tackle it. However, such approaches lack interpretability, which is vital in medical applications. Moreover, because clinical notes are lengthy and noisy, such approaches fail to achieve satisfactory results. Therefore, in this paper, we propose a novel framework based on medical concept driven attention that incorporates external knowledge for explainable medical code prediction. Specifically, both the clinical notes and Wikipedia documents are aligned into a topic space to extract medical concepts using topic modeling. Then, the medical concept-driven attention mechanism is applied to uncover the medical-code-related concepts, which provide explanations for medical code prediction. Experimental results on the benchmark dataset show the superiority of the proposed framework over several state-of-the-art baselines.
@inproceedings{wang-etal-2022-novel,
title = "A Novel Framework Based on Medical Concept Driven Attention for Explainable Medical Code Prediction via External Knowledge",
author = "Wang, Tao and
Zhang, Linhai and
Ye, Chenchen and
Liu, Junxi and
Zhou, Deyu",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.findings-acl.110",
doi = "10.18653/v1/2022.findings-acl.110",
pages = "1407--1416"}
Biomedical argument mining aims to automatically identify and extract the argumentative structure in biomedical text. It helps to determine not only what positions people adopt but also why they hold such opinions, which provides valuable insights into medical decision making. Generally, biomedical argument mining consists of three subtasks: argument component identification, argument component classification, and relation identification. Current approaches employ a conventional multi-task learning framework to jointly address the latter two subtasks and achieve some success. However, they ignore the explicit sequential dependency between these two subtasks, which is crucial for accurate biomedical argument mining. Moreover, relation identification is conducted solely on the argument component pair, without considering its potentially valuable context. Therefore, in this paper, a novel sequential multi-task learning approach is proposed for biomedical argument mining. Specifically, to model the explicit sequential dependency between argument component classification and relation identification, an information transfer strategy is employed to capture information about argument component types and transfer it to relation identification. Furthermore, a graph convolutional network is employed to model the dependency relations among related argument component pairs. The proposed method has been evaluated on a benchmark dataset, and the experimental results show that it outperforms the state-of-the-art methods.
@article{si2022biomedical,
title={Biomedical Argument Mining Based on Sequential Multi-Task Learning},
author={Si, Jiasheng and Sun, Liu and Zhou, Deyu and Ren, Jie and Li, Lin},
journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics},
year={2022},
publisher={IEEE}}
Sentiment analysis on user-generated content has achieved notable progress by introducing user information to account for each individual's preferences and language usage. However, most existing approaches ignore the data sparsity problem: some users contribute only limited content, so the model fails to capture discriminative features for them. To address this issue, we hypothesize that users could be grouped based on their rating biases as well as their degree of rating consistency, and that the knowledge learned from groups could be employed to analyze users with limited data. Therefore, in this paper, a neural group-wise sentiment analysis model with data sparsity awareness is proposed. User-centred document representations are generated by incorporating a group-based user encoder. Furthermore, a multi-task learning framework is employed to jointly model users' rating biases and their degree of rating consistency. One task is vanilla population-level sentiment analysis and the other is group-wise sentiment analysis. Experimental results on three real-world datasets show that the proposed approach outperforms some state-of-the-art methods. Moreover, model analysis and a case study demonstrate its effectiveness in modeling user rating biases and variances.
@inproceedings{zhou2021neural,
title={A Neural Group-wise Sentiment Analysis Model with Data Sparsity Awareness},
author={Zhou, Deyu and Zhang, Meng and Zhang, Linhai and He, Yulan},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={35},
number={16},
pages={14594--14601},
year={2021}
}
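The grouping hypothesis in the abstract above groups users by rating bias and degree of rating consistency. It can be illustrated with a hand-crafted heuristic. This is a minimal sketch under stated assumptions, not the paper's method. The paper learns groups inside a neural model. Here, bias is simply a user's mean rating minus the global mean, and consistency is the population standard deviation of the user's ratings. The cut-off values `bias_cut` and `spread_cut` are hypothetical.

```python
from statistics import mean, pstdev


def user_profile(ratings, global_mean):
    """Return (bias, spread) for one user's ratings.

    bias:   user's mean rating minus the global mean (assumed definition).
    spread: population std of the user's ratings; lower = more consistent.
    """
    return mean(ratings) - global_mean, pstdev(ratings)


def assign_group(bias, spread, bias_cut=0.5, spread_cut=1.0):
    # Hypothetical thresholds: lenient/neutral/harsh by bias,
    # consistent/inconsistent by spread.
    if bias > bias_cut:
        b = "lenient"
    elif bias < -bias_cut:
        b = "harsh"
    else:
        b = "neutral"
    s = "consistent" if spread <= spread_cut else "inconsistent"
    return b, s


# Toy ratings on a 1-5 scale (illustrative data only).
users = {"u1": [5, 5, 4, 5], "u2": [1, 2, 1], "u3": [1, 5, 3, 5]}
global_mean = mean(r for rs in users.values() for r in rs)
groups = {u: assign_group(*user_profile(rs, global_mean)) for u, rs in users.items()}
```

With these toy ratings, `u1` falls into ("lenient", "consistent"), `u2` into ("harsh", "consistent"), and `u3` into ("neutral", "inconsistent"). A sparse user could then borrow statistics from its group.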
Previous work has shown the effectiveness of using event representations for tasks such as script event prediction and stock market prediction. However, it is still challenging to learn the subtle semantic differences between events based solely on their textual descriptions, which are often represented as (subject, predicate, object) triples. As an alternative, images offer a more intuitive way of understanding event semantics. We observe that events described in text and in images show different levels of abstraction and should therefore be projected onto heterogeneous embedding spaces, as opposed to previous approaches, which project signals from different modalities onto a homogeneous space. In this paper, we propose a Multimodal Event Representation Learning framework (MERL) to learn event representations based on both text and image modalities simultaneously. Event textual triples are projected as Gaussian density embeddings by a dual-path Gaussian triple encoder, while event images are projected as point embeddings by a visual event component-aware image encoder. Moreover, a novel score function motivated by statistical hypothesis testing is introduced to coordinate the two embedding spaces. Experiments are conducted on various multimodal event-related tasks, and the results show that MERL outperforms a number of unimodal and multimodal baselines, demonstrating the effectiveness of the proposed framework.
@inproceedings{zhang2021merl,
title={MERL: Multimodal event representation learning in heterogeneous embedding spaces},
author={Zhang, Linhai and Zhou, Deyu and He, Yulan and Yang, Zeng},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={35},
number={16},
pages={14420--14427},
year={2021}
}
Storyline extraction aims to generate concise summaries of related events unfolding over time from a collection of temporally ordered news articles. Existing approaches to storyline extraction are typically built on probabilistic graphical models that jointly model the extraction of events and storylines from news published in different periods. However, their parameter inference procedures are often complex and require a long time to converge, which hinders their use in practical applications. More recently, a neural network-based approach has been proposed to tackle these limitations. However, it does not learn event representations of documents, which are important for the quality of the generated storylines. In this paper, we propose a novel unsupervised neural network-based approach to jointly extract latent events and the link patterns of storylines from documents over time. Specifically, event representations are learned by a stacked autoencoder and clustered for event extraction; then a fusion component is incorporated to link related events across consecutive periods for storyline extraction. The proposed model has been evaluated on three news corpora, and the experimental results show that it significantly outperforms state-of-the-art approaches.
@article{si2021unsupervised,
title={Unsupervised latent event representation learning and storyline extraction from news articles based on neural networks},
author={Si, Jiasheng and Guo, Linsen and Zhou, Deyu},
journal={Intelligent Data Analysis},
volume={25},
number={3},
pages={589--603},
year={2021},
publisher={IOS Press}
}
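The linking step described above connects related events across consecutive periods, and a simple similarity rule can approximate it. In the paper this is done by a learned fusion component. The sketch below swaps in plain cosine similarity with a hypothetical `threshold`. It also assumes each event is already represented by a vector, for example the centroid of its document cluster.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def link_events(prev_events, curr_events, threshold=0.8):
    """Link each event in period t to its most similar event in period t-1.

    prev_events / curr_events: dict event_id -> representation vector.
    Returns {curr_id: prev_id} for links whose similarity reaches the
    (hypothetical) threshold; unlinked events would start new storylines.
    """
    links = {}
    for cid, cvec in curr_events.items():
        best_id, best_sim = None, threshold
        for pid, pvec in prev_events.items():
            sim = cosine(cvec, pvec)
            if sim >= best_sim:
                best_id, best_sim = pid, sim
        if best_id is not None:
            links[cid] = best_id
    return links


prev = {"e1": [1.0, 0.0], "e2": [0.0, 1.0]}
curr = {"f1": [0.9, 0.1], "f2": [0.5, 0.5]}
links = link_events(prev, curr)
```

Here `f1` links to `e1`. `f2` is equally similar to both earlier events (about 0.71), but that falls below the threshold, so it would start a new storyline.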
Existing methods for question answering over knowledge bases (KBQA) ignore model prediction uncertainties. We argue that estimating such uncertainties is crucial for the reliability and interpretability of KBQA systems. Therefore, we propose a novel end-to-end KBQA model based on a Bayesian Neural Network (BNN) to estimate uncertainties arising from both the model and the data. To the best of our knowledge, we are the first to consider the uncertainty estimation problem for the KBQA task using BNNs. The proposed end-to-end model integrates entity detection and relation prediction into a unified framework and employs a BNN to model entities and relations under the given question semantics, transforming network weights into distributions. Predictive distributions can therefore be estimated by sampling weights and forwarding the inputs through the network multiple times. Uncertainties can be further quantified by calculating the variances of the predictive distributions. The experimental results demonstrate the effectiveness of the estimated uncertainties in both the misclassification detection task and the cause-of-error detection task. Furthermore, the proposed model achieves performance comparable to existing state-of-the-art approaches on the SimpleQuestions dataset.
@article{zhang2021bayesian,
title={A bayesian end-to-end model with estimated uncertainties for simple question answering over knowledge bases},
author={Zhang, Linhai and Lin, Chao and Zhou, Deyu and He, Yulan and Zhang, Meng},
journal={Computer Speech \& Language},
volume={66},
pages={101167},
year={2021},
publisher={Elsevier}
}
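The abstract above estimates predictive distributions by sampling network weights and forwarding the input several times, then quantifies uncertainty as the variance of those distributions. A minimal sketch of that Monte Carlo procedure follows. It assumes a single linear layer whose weights follow a factorised Gaussian posterior with per-weight means `mu` and standard deviations `sigma`. These are hypothetical values that variational training would produce in practice. This is not the authors' model.

```python
import math
import random
from statistics import mean, pvariance


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]


def sample_weights(mu, sigma, rng):
    # One draw W ~ N(mu, sigma^2), independently per weight.
    return [[rng.gauss(m, s) for m, s in zip(mrow, srow)]
            for mrow, srow in zip(mu, sigma)]


def mc_predict(x, mu, sigma, n_samples=100, seed=0):
    """Monte Carlo estimate of the predictive distribution for input x.

    Returns (mean_probs, var_probs): per-class mean and variance of the
    softmax outputs over n_samples weight draws.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(n_samples):
        w = sample_weights(mu, sigma, rng)
        logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
        runs.append(softmax(logits))
    per_class = list(zip(*runs))  # transpose: one tuple of samples per class
    return [mean(c) for c in per_class], [pvariance(c) for c in per_class]


mu = [[2.0, -1.0], [-1.0, 2.0]]  # 2 classes x 2 input features (hypothetical)
probs, variances = mc_predict([1.0, 0.0], mu, [[1.0, 1.0], [1.0, 1.0]], n_samples=200)
```

A larger variance for the predicted class signals a less trustworthy prediction. With `sigma` set to zero, every draw is identical and the variance collapses to zero, recovering a deterministic network.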
Biomedical factoid question answering is an important task in biomedical question answering applications. It has attracted much attention because of the reliability of its answers. In a question answering system, good word representations are very important, and a proper word embedding can usually improve system performance significantly. With the success of pre-trained models in general natural language processing tasks, pre-trained models have been widely used in the biomedical domain as well, and many pre-trained-model-based approaches have proven effective in the biomedical question answering task. Besides proper word embeddings, named entities are also important information for biomedical question answering. Inspired by the concept of transfer learning, in this research we developed a mechanism that fine-tunes BioBERT on a named entity dataset to improve question answering performance.
@article{peng2021named,
title={Named entity aware transfer learning for biomedical factoid question answering},
author={Peng, Keqin and Yin, Chuantao and Rong, Wenge and Lin, Chenghua and Zhou, Deyu and Xiong, Zhang},
journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics},
year={2021},
publisher={IEEE}
}
Neural topic models have triggered a surge of interest in extracting topics from text automatically, since they avoid the sophisticated derivations required by conventional topic models. However, few neural topic models incorporate the word relatedness information captured in word embeddings into the modeling process. To address this issue, we propose a novel topic modeling approach called the Variational Gaussian Topic Model (VaGTM). Based on the variational auto-encoder, VaGTM models each topic with a multivariate Gaussian in the decoder to incorporate word relatedness. Furthermore, to address the limitation that pre-trained word embeddings of topic-associated words do not follow a multivariate Gaussian, VaGTM is extended to the Variational Gaussian Topic Model with Invertible Neural Projections (VaGTM-IP). Three benchmark text corpora are used in experiments to verify the effectiveness of VaGTM and VaGTM-IP. The experimental results show that VaGTM and VaGTM-IP outperform several competitive baselines and obtain more coherent topics.
@article{wang2021variational,
title={Variational Gaussian Topic Model with Invertible Neural Projections},
author={Wang, Rui and Zhou, Deyu and Xiong, Yuxuan and Huang, Haiping},
journal={arXiv preprint arXiv:2105.10095},
year={2021}
}
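The abstract above models each topic with a multivariate Gaussian in the decoder so that word relatedness, as encoded in word embeddings, shapes the topics. One way to read this is that a word's probability under topic k is proportional to the Gaussian density of its embedding, normalised over the vocabulary. The sketch below makes that assumption and further assumes a diagonal covariance for brevity. The invertible projection of VaGTM-IP is not shown.

```python
import math


def log_gauss_diag(x, mu, var):
    """Log density of vector x under N(mu, diag(var))."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mu, var)
    )


def topic_word_dist(word_embs, topic_mu, topic_var):
    """Assumed decoder: beta_k(w) proportional to N(e_w; mu_k, diag(var_k))."""
    logps = [log_gauss_diag(e, topic_mu, topic_var) for e in word_embs]
    m = max(logps)  # log-sum-exp trick for a stable normalisation
    ws = [math.exp(lp - m) for lp in logps]
    z = sum(ws)
    return [w / z for w in ws]


# Toy 2-d embeddings for a 3-word vocabulary (illustrative only).
vocab_embs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
beta = topic_word_dist(vocab_embs, topic_mu=[0.5, 0.5], topic_var=[1.0, 1.0])
```

The first two words lie at the same distance from the topic mean, so they receive equal probability, while the distant third word gets almost none. Because related words have nearby embeddings, a topic that favours one word tends to favour its neighbours too.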
Lixing Zhu, Gabriele Pergola, Lin Gui, Deyu Zhou, Yulan He.
Topic-Driven and Knowledge-Aware Transformer for Dialogue Emotion Detection.
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL/IJCNLP 2021).
Emotion detection in dialogues is challenging, as it often requires identifying the thematic topics underlying a conversation, the relevant commonsense knowledge, and the intricate transition patterns between affective states. In this paper, we propose a Topic-Driven Knowledge-Aware Transformer to handle these challenges. We first design a topic-augmented language model (LM) with an additional layer specialized for topic detection. The topic-augmented LM is then combined with commonsense statements derived from a knowledge base based on the dialogue contextual information. Finally, a transformer-based encoder-decoder architecture fuses the topical and commonsense information and performs emotion label sequence prediction. The model has been evaluated on four dialogue emotion detection datasets, demonstrating its empirical superiority over existing state-of-the-art approaches. Quantitative and qualitative results show that the model can discover topics that help distinguish emotion categories.
@inproceedings{DBLP:conf/acl/ZhuP0ZH20,
author = {Lixing Zhu and
Gabriele Pergola and
Lin Gui and
Deyu Zhou and
Yulan He},
editor = {Chengqing Zong and
Fei Xia and
Wenjie Li and
Roberto Navigli},
title = {Topic-Driven and Knowledge-Aware Transformer for Dialogue Emotion
Detection},
booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational
Linguistics and the 11th International Joint Conference on Natural
Language Processing, {ACL/IJCNLP} 2021, (Volume 1: Long Papers), Virtual
Event, August 1-6, 2021},
pages = {1571--1582},
publisher = {Association for Computational Linguistics},
year = {2021},
url = {https://doi.org/10.18653/v1/2021.acl-long.125},
doi = {10.18653/v1/2021.acl-long.125},
timestamp = {Sat, 09 Apr 2022 12:33:46 +0200},
biburl = {https://dblp.org/rec/conf/acl/ZhuP0ZH20.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Text generation with auxiliary attributes, such as topics or sentiments, has made remarkable progress. However, high-quality labeled data is difficult to obtain for large-scale corpora. Therefore, this paper focuses on social emotion ranking, which aims to identify the social emotions, with different intensities, evoked by online documents and could potentially benefit further controlled text generation. Existing studies often treat each document as a whole and thus fail to capture the inner relationships between the sentences in a document. In this paper, we propose a novel hierarchical state recurrent neural network for social emotion ranking. A hierarchy mechanism is employed to capture the key hierarchical semantic structure of a document. Moreover, instead of incrementally reading a sequence of words or sentences as traditional recurrent neural networks do, the proposed approach encodes the hidden states of all words or sentences simultaneously at each recurrent step to capture long-range dependencies precisely. Experimental results show that the proposed approach performs remarkably better than state-of-the-art social emotion ranking approaches and is useful for controlled text generation.
@article{zhou2021hierarchical,
title={Hierarchical state recurrent neural network for social emotion ranking},
author={Zhou, Deyu and Zhang, Meng and Yang, Yang and He, Yulan},
journal={Computer Speech \& Language},
volume={68},
pages={101177},
year={2021},
publisher={Elsevier}
}
Diagnosis of major depressive disorder (MDD) using resting-state functional connectivity (rs-FC) data faces many challenges, such as high dimensionality, small samples, and individual differences. The study aimed to assess the clinical value of rs-FC in MDD and to identify a potential rs-FC machine learning (ML) model for the individualized diagnosis of MDD. To this end, a progressive three-step ML analysis based on rs-FC data was performed, covering six different ML algorithms and two dimension reduction methods, to investigate the classification performance of ML models in a multicentral, large-sample dataset [1021 MDD patients and 1100 normal controls (NCs)]. Furthermore, a linear least-squares fitted regression model was used to assess the relationships between rs-FC features and the severity of clinical symptoms in MDD patients. Among the ML methods used, the rs-FC model constructed with the eXtreme Gradient Boosting (XGBoost) method showed the best classification performance for distinguishing MDD patients from NCs at the individual level (accuracy = 0.728, sensitivity = 0.720, specificity = 0.739, area under the curve = 0.831). Meanwhile, the rs-FCs identified by the XGBoost model were primarily distributed within and between the default mode network, limbic network, and visual network. More importantly, the individual 17-item Hamilton Depression Scale scores of MDD patients could be accurately predicted using the rs-FC features identified by the XGBoost model (adjusted R² = 0.180, root mean squared error = 0.946). The XGBoost model using rs-FCs showed the best classification performance between MDD patients and NCs, with good generalization and neuroscientific interpretability.
@article{2021Multivariate,
title={Multivariate Machine Learning Analyses in Identification of Major Depressive Disorder Using Resting-State Functional Connectivity: A Multicentral Study},
author={Shi, Y. and Zhang, L. and Wang, Z. and Lu, X. and Zhang, Z.},
journal={ACS Chemical Neuroscience},
volume={12},
number={11},
year={2021},
}
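The abstract above reports accuracy, sensitivity and specificity. Sensitivity is the proportion of MDD patients classified correctly, specificity is the proportion of normal controls classified correctly, and accuracy is the proportion of all subjects classified correctly. The sketch below computes all three from confusion-matrix counts. The counts in the example are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity).

    Positive class = MDD patient, negative class = normal control.
    tp/fn: patients classified as MDD / as control.
    tn/fp: controls classified as control / as MDD.
    """
    sensitivity = tp / (tp + fn)  # recall on patients
    specificity = tn / (tn + fp)  # recall on controls
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts: 100 patients and 100 controls.
acc, sens, spec = binary_metrics(tp=72, fn=28, tn=74, fp=26)
```

With balanced classes, accuracy is simply the mean of sensitivity and specificity (0.73 here). With unequal group sizes, as in the study, it is a size-weighted mean instead.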
Objective
Health issue identification in social media aims to predict, from their posts, whether the writers have a disease. Users share numerous posts and comments on social media, and certain posts may reflect the writers' health conditions, which can be employed for health issue identification. Usually, the health issue identification problem is formulated as a classification task.
Methods and material
In this paper, we propose novel multi-task hierarchical neural networks with topic attention for identifying health issues based on posts collected from social media platforms. Specifically, the model captures the hierarchical relationship among the document, its sentences, and their words via bidirectional gated recurrent units (BiGRUs). The global topic information shared across posts is combined with the hidden states of the BiGRUs to obtain topic-enhanced attention weights for words. In addition, the tasks of predicting whether the writers suffer from a disease (health issue identification) and predicting the specific domain of the posts (domain category classification) are learned jointly in a multi-task mechanism.
Results
The proposed method is evaluated on two datasets: a dementia issue dataset and a depression issue dataset. The proposed approach achieves F1 scores of 98.03% and 88.28% on the two datasets, outperforming the state-of-the-art approach by 0.73% and 0.4%, respectively. Further experimental analysis shows the effectiveness of incorporating both the multi-task learning framework and the topic attention mechanism.
@article{zhou2021health,
title={Health issue identification in social media based on multi-task hierarchical neural networks with topic attention},
author={Zhou, Deyu and Yuan, Jiale and Si, Jiasheng},
journal={Artificial Intelligence in Medicine},
volume={118},
pages={102119},
year={2021},
publisher={Elsevier}
}
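The topic-enhanced attention in the abstract above combines global topic information with the BiGRU hidden states to weight words. The abstract does not give the scoring function. The sketch therefore assumes a standard additive (Bahdanau-style) form, a_i = softmax_i(v · tanh(W h_i + U t)). The matrices `W` and `U` and the vector `v` are hypothetical stand-ins for learned parameters, and plain Python lists replace tensors.

```python
import math


def matvec(m, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def topic_attention(hidden, topic, W, U, v):
    """Additive attention whose scores depend on word states and a topic vector.

    hidden: list of word hidden states h_i (e.g. BiGRU outputs)
    topic:  global topic vector t
    W, U, v: hypothetical learned parameters
    Returns (weights, context) with context = sum_i a_i * h_i.
    """
    ut = matvec(U, topic)
    scores = []
    for h in hidden:
        proj = [math.tanh(a + b) for a, b in zip(matvec(W, h), ut)]
        scores.append(sum(vi * pi for vi, pi in zip(v, proj)))
    m = max(scores)  # stable softmax over words
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, context


identity = [[1.0, 0.0], [0.0, 1.0]]
states = [[1.0, 0.0], [0.0, 1.0]]  # two toy word states
w_plain, ctx = topic_attention(states, [0.0, 0.0], identity, identity, [1.0, 1.0])
w_topic, _ = topic_attention(states, [1.0, 0.0], identity, identity, [1.0, 1.0])
```

With a zero topic vector, the two symmetric words receive equal weight. A non-zero topic vector shifts the weights, which is how topic information can steer which words the model attends to.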
Image generation from text is the task of generating new images from a textual unit such as a word, phrase, clause, or sentence. It has attracted great attention in both the natural language processing and computer vision communities. Current approaches usually employ an end-to-end framework to tackle the problem. However, we find that entity information, including the categories and attributes of the images, is ignored by most approaches. Such information is crucial for guaranteeing semantic alignment and generating images accurately. For two pictures of the same category, the emphases of the corresponding text descriptions may differ, but the images generated from these two sentences should share some similarities, and the two generation processes can learn from each other. Therefore, we propose two novel end-to-end frameworks that incorporate entity information into the process of image generation. In the first framework, an image representation is generated from entity labels using the variational inference mechanism and then fused with the representation generated from the corresponding sentence. In the second framework, instead of fusing the images in the high-dimensional space, images are inferred and fused in the latent (low-dimensional) space, where the computationally intensive upsampling modules are shared. Moreover, a novel metric, the Entity Matching Score, is proposed to measure the degree of consistency between a generated image and its corresponding text description, and the effectiveness of the metric has been demonstrated on the generated samples in our experiments. Experimental results show that both proposed frameworks significantly outperform some state-of-the-art approaches on two benchmark datasets.
@article{zhou2021image,
title={Image generation from text with entity information fusion},
author={Zhou, Deyu and Sun, Kai and Hu, Mingqi and He, Yulan},
journal={Knowledge-Based Systems},
volume={227},
pages={107200},
year={2021},
publisher={Elsevier}
}
Relation detection in knowledge base question answering aims to identify the path(s) of relations starting from the topic entity node that is linked to the answer node in the knowledge graph. Such a path might consist of multiple relations, which we call multi-hop. Moreover, for a single question, there may exist multiple relation paths to the correct answer, which we call multi-label. However, most existing approaches detect only a single path to obtain the answer without considering other correct paths, which might affect the final performance. Therefore, in this paper, we propose a novel divide-and-conquer approach for multi-label multi-hop relation detection (DC-MLMH) that decomposes it into head relation detection and conditional relation path generation. Specifically, a novel path sampling mechanism is proposed to generate diverse relation paths for the inference stage, and a majority-vote policy is employed to determine the final KB answer. Comprehensive experiments were conducted on the FreebaseQA benchmark dataset. Experimental results show that the proposed approach not only outperforms other competitive multi-label baselines but also has superiority over some state-of-the-art KBQA methods.
@inproceedings{zhou-etal-2021-divide-conquer,
title = "A Divide-And-Conquer Approach for Multi-label Multi-hop Relation Detection in Knowledge Base Question Answering",
author = "Zhou, Deyu and
Xiang, Yanzheng and
Zhang, Linhai and
Ye, Chenchen and
Zhang, Qian-Wen and
Cao, Yunbo",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-emnlp.412",
doi = "10.18653/v1/2021.findings-emnlp.412",
pages = "4798--4808"
}
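The majority-vote policy mentioned in the abstract above can be sketched once each sampled relation path has been executed against the knowledge base. This sketch assumes that execution step is already done and shows only the voting. The Freebase-style relation names in the example are hypothetical.

```python
from collections import Counter


def majority_vote(path_answers):
    """Pick the answer supported by the most sampled relation paths.

    path_answers: list of (relation_path, answer) pairs, one per sampled path.
    Returns (answer, number_of_supporting_paths).
    """
    if not path_answers:
        raise ValueError("need at least one sampled path")
    counts = Counter(answer for _path, answer in path_answers)
    # most_common orders equal counts by first occurrence, so ties go to
    # the answer produced by the earliest sampled path.
    answer, votes = counts.most_common(1)[0]
    return answer, votes


samples = [
    (("film.film.directed_by",), "Christopher Nolan"),
    (("film.film.produced_by",), "Emma Thomas"),
    (("film.film.written_by",), "Christopher Nolan"),
]
best = majority_vote(samples)
```

Voting over answers rather than over paths lets different correct paths (the multi-label case) reinforce the same answer.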
Multi-hop relation detection in Knowledge Base Question Answering (KBQA) aims to retrieve the relation path from the topic entity to the answer node based on a given question, where the relation path may comprise multiple relations. Most existing methods treat it as a single-label learning problem, ignoring the fact that for some complex questions there exist multiple correct relation paths in knowledge bases. Therefore, in this paper, multi-hop relation detection is considered as a multi-label learning problem. However, performing multi-label multi-hop relation detection is challenging, since the numbers of both the labels and the hops are unknown. To tackle this challenge, multi-label multi-hop relation detection is formulated as a sequence generation task. A relation-aware sequence generation model is proposed to solve the problem in an end-to-end manner. Experimental results show the effectiveness of the proposed method for relation detection and KBQA.
@inproceedings{zhang-etal-2021-multi-label-multi,
title = "A Multi-label Multi-hop Relation Detection Model based on Relation-aware Sequence Generation",
author = "Zhang, Linhai and
Zhou, Deyu and
Lin, Chao and
He, Yulan",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-emnlp.404",
doi = "10.18653/v1/2021.findings-emnlp.404",
pages = "4713--4719"}
Multi-label document classification, which associates one document instance with a set of relevant labels, is attracting increasing research attention. Existing methods explore the incorporation of information beyond text, such as document metadata or label structure. However, these approaches either simply utilize the semantic information of metadata or employ a predefined parent-child label hierarchy, ignoring the heterogeneous graphical structures of metadata and labels, which we believe are crucial for accurate multi-label document classification. Therefore, in this paper, we propose a novel neural network based approach for multi-label document classification, in which two heterogeneous graphs are constructed and learned using heterogeneous graph transformers. One is a metadata heterogeneous graph, which models various types of metadata and their topological relations. The other is a label heterogeneous graph, which is constructed based on both the labels' hierarchy and their statistical dependencies. Experimental results on two benchmark datasets show that the proposed approach outperforms several state-of-the-art baselines.
@inproceedings{ye-etal-2021-beyond,
title = "Beyond Text: Incorporating Metadata and Label Structure for Multi-Label Document Classification using Heterogeneous Graphs",
author = "Ye, Chenchen and
Zhang, Linhai and
He, Yulan and
Zhou, Deyu and
Wu, Jie",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.253",
doi = "10.18653/v1/2021.emnlp-main.253",
pages = "3162--3171"
}
Implicit sentiment analysis, which aims to detect the sentiment of a sentence that contains no sentiment words, has become an attractive research topic in recent years. In this paper, we focus on event-centric implicit sentiment analysis, which utilizes the sentiment-aware event contained in a sentence to infer its sentiment polarity. Most existing methods in implicit sentiment analysis simply view noun phrases or entities in text as events, or model events indirectly with sophisticated models. Since events often trigger sentiments in sentences, we argue that this task would benefit from explicit modeling of events and event representation learning. To this end, we represent an event as the combination of its event type and the event triplet <subject, predicate, object>. Based on this event representation, we further propose a novel model with a hierarchical tensor-based composition mechanism to detect sentiment in text. In addition, we present a dataset for event-centric implicit sentiment analysis in which each sentence is labeled with the event representation described above. Experimental results on our constructed dataset and an existing benchmark dataset show the effectiveness of the proposed approach.
@inproceedings{zhou-etal-2021-implicit,
title = "Implicit Sentiment Analysis with Event-centered Text Representation",
author = "Zhou, Deyu and
Wang, Jianan and
Zhang, Linhai and
He, Yulan",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.551",
doi = "10.18653/v1/2021.emnlp-main.551",
pages = "6884--6893"}
Background
Objective biomarkers are crucial for overcoming the clinical dilemma in major depressive disorder (MDD), and individualized diagnosis is essential to facilitate precision medicine for MDD.
Methods
Sleep disturbance-related magnetic resonance imaging (MRI) features were identified in the internal dataset (92 MDD patients) using the relevance vector regression algorithm and were further verified in 460 MDD patients from an independent, multicenter dataset. Subsequently, using these MRI features, an eXtreme Gradient Boosting classification model was constructed on the multicenter dataset (460 MDD patients and 470 normal controls). Meanwhile, the association between the classification outputs and the severity of depressive symptoms was also investigated.
Results
In MDD patients, the combination of gray matter density and the fractional amplitude of low-frequency fluctuation could accurately predict the individual sleep disturbance score, calculated as the sum of the scores of items 4, 5, and 6 of the 17-Item Hamilton Rating Scale for Depression (HAMD-17) (R² = 0.158 in the internal dataset; R² = 0.110 in the multicenter dataset). Furthermore, the classification model based on these MRI features distinguished MDD patients from normal controls with 86.3% accuracy (area under the curve = 0.937). Importantly, the classification outputs significantly correlated with HAMD-17 scores in MDD patients.
Limitation
The study lacked specialized tools for assessing personal sleep quality, e.g., the Pittsburgh Sleep Quality Index.
Conclusion
Neuroimaging features can accurately reflect individual sleep disturbance manifestations and serve as potential diagnostic biomarkers of MDD.
@article{shi2021sleep,
title={Sleep disturbance-related neuroimaging features as potential biomarkers for the diagnosis of major depressive disorder: A multicenter study based on machine learning},
author={Shi, Yachen and Zhang, Linhai and He, Cancan and Yin, Yingying and Song, Ruize and Chen, Suzhen and Fan, Dandan and Zhou, Deyu and Yuan, Yonggui and Xie, Chunming and others},
journal={Journal of Affective Disorders},
volume={295},
pages={148--155},
year={2021},
publisher={Elsevier}
}
Topic models have been widely used to mine hidden topics from documents.
However, one limitation of such topic models is that they are prone to generate incoherent topics.
To address this limitation, many approaches have been proposed to incorporate the prior knowledge of word semantic relatedness into the topic inference process.
One example is the Generalized Polya Urn (GPU) scheme.
However, GPU-based topic models often require sophisticated algorithms to acquire domain-specific knowledge from data.
Moreover, prior knowledge is incorporated into the topic inference process without considering its impact on the intermediate topic sampling results.
In this paper, we propose a novel Weighted Polya Urn scheme and incorporate it into the Latent Dirichlet Allocation framework to build a self-enhancement topic model that generates coherent topics.
Specifically, semantic prior knowledge based on word embeddings is employed to measure the semantic coherence of a word with different topics, and this coherence is incorporated into the Weighted Polya Urn scheme.
Moreover, semantic coherence is updated dynamically based on the semantic similarity between a word and the representative words in different topics.
Experiments have been conducted on seven public corpora from different domains to evaluate the effectiveness of the proposed approach.
Experimental results show that compared to the state-of-the-art baselines, the proposed approach can generate more coherent topics.
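The urn-style count update that GPU-based topic models rely on can be sketched as follows. This is a minimal illustration, not the paper's exact Weighted Polya Urn update. It assumes that topic-word counts are kept in nested dictionaries and that `related[w][w']` holds the amount by which word w' is promoted when w is drawn. It also uses a hypothetical `coherence[(w, k)]` weight as a stand-in for the embedding-based semantic coherence described above.

```python
from collections import defaultdict


def weighted_urn_update(counts, topic, word, related, coherence, sign=1.0):
    """Apply one weighted Polya-urn style count update for a single token.

    counts:    dict topic -> dict word -> (possibly fractional) count
    related:   dict word -> dict related_word -> promotion amount
    coherence: hypothetical dict (word, topic) -> weight in [0, 1]; missing
               entries default to 1.0, i.e. a plain GPU update
    sign:      +1.0 when the token is assigned to `topic`,
               -1.0 when it is removed (e.g. before Gibbs resampling)
    """
    topic_counts = counts[topic]
    # The drawn word is returned to the urn together with one extra copy.
    topic_counts[word] += sign
    # Related words are promoted as well, scaled by the word-topic coherence.
    weight = coherence.get((word, topic), 1.0)
    for other, amount in related.get(word, {}).items():
        topic_counts[other] += sign * weight * amount
    return topic_counts


# Example: "car" is assigned to topic 0 and promotes two related words.
counts = defaultdict(lambda: defaultdict(float))
related = {"car": {"vehicle": 0.5, "truck": 0.25}}
coherence = {("car", 0): 0.5}
weighted_urn_update(counts, 0, "car", related, coherence)
# counts[0]["car"] == 1.0, counts[0]["vehicle"] == 0.25, counts[0]["truck"] == 0.125
```

Because the weight is looked up per (word, topic) pair, a word that is weakly coherent with a topic promotes its related words less in that topic. Recomputing `coherence` between sampling sweeps is one possible way to approximate the dynamic update mentioned above.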
@article{wang2019optimising,
title={Optimising Topic Coherence with Weighted P{\'o}lya Urn scheme},
author={Wang, Rui and Zhou, Deyu and He, Yulan},
journal={Neurocomputing},
year={2019},
publisher={Elsevier}
}
Background: Diabetes has significant effects on bone metabolism. Both type 1 and type 2 diabetes
can cause osteoporotic fracture. However, it remains challenging to diagnose osteoporosis in type
2 diabetes using bone mineral density, which does not change in a regular way. Seen another way, osteoporosis
can be ascribed to an imbalance of bone metabolism, which is closely related to diabetes as well.
Method: Here, to assist clinicians in diagnosing osteoporosis in type 2 diabetes, an efficient and
simple SVM model was established based on different combinations of biochemical indices,
including bone turnover markers, calcium and phosphorus, etc. The classification performance was
measured using several evaluation metrics. Results: The prediction accuracy of the final model is above
88%, with the feature combination of Sex, Age, BMI, TP1NP and OSTEOC. Conclusion:
Experimental results show that the model achieves the anticipated results for early detection and
daily monitoring of type 2 diabetic osteoporosis.
@article{sun2020bone,
title={Bone Metabolic Biomarkers-Based Diagnosis of Osteoporosis Caused by Diabetes Mellitus using Support Vector Machine},
author={Sun, J and Wang, C and Zhang, T and Liu, X and Miao, L and Zhou, D and Wang, P and Zhang, Y and Jiang, Q and Hu, Y and others},
year={2020}
}
Social media platforms allow users to express their opinions towards various topics online.
Oftentimes, users' opinions are not static but may change over time due to the influence of
their neighbors in social networks, or be updated based on encountered arguments that undermine their beliefs. In this paper, we propose to use a Recurrent Neural Network (RNN) to
model each user's posting behaviors on Twitter and incorporate their neighbors' topic-associated
context as attention signals using an attention mechanism for user-level stance prediction.
Moreover, our proposed model operates in an online setting, in that its parameters are continuously updated with the Twitter stream data, and it can be used to predict users' topic-dependent
stances. Detailed evaluation on two Twitter datasets, related to Brexit and the US General Election,
justifies the superior performance of our neural opinion dynamics model over both static and
dynamic alternatives for user-level stance prediction.
@article{zhu2019neural,
title={Neural opinion dynamics model for the prediction of user-level stance dynamics},
author={Zhu, Lixing and He, Yulan and Zhou, Deyu},
journal={Information Processing \& Management},
year={2019},
publisher={Elsevier}
}
Rui Wang, Xuemeng Hu, Deyu Zhou, Yulan He, Yuxuan Xiong, Chenchen Ye, Haiyang Xu. Neural Topic Modeling with Bidirectional Adversarial Training, In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL2020), 2020.
Recent years have witnessed a surge of interest in using neural topic models for automatic
topic extraction from text, since they avoid the complicated mathematical derivations for model inference required by traditional topic models such as Latent Dirichlet Allocation (LDA).
However, these models typically either assume an improper prior (e.g. Gaussian or Logistic Normal) over the latent topic space or cannot infer the
topic distribution for a given document.
To address these limitations, we propose a neural topic modeling approach, called Bidirectional Adversarial Topic (BAT) model, which
represents the first attempt of applying bidirectional adversarial training for neural topic
modeling. The proposed BAT builds a two-way projection between the document-topic
distribution and the document-word distribution. It uses a generator to capture the semantic patterns from texts and an encoder
for topic inference. Furthermore, to incorporate word relatedness information, BAT is extended to the Bidirectional Adversarial Topic model with Gaussian
(Gaussian-BAT). To verify the effectiveness of BAT and Gaussian-BAT, three benchmark corpora are used in our
experiments. The experimental results show
that BAT and Gaussian-BAT obtain more coherent topics, outperforming several competitive baselines. Moreover, when performing
text clustering based on the extracted topics,
our models outperform all the baselines, with
more significant improvements achieved by
Gaussian-BAT, where an increase of nearly 6%
is observed in accuracy.
@article{wang2020neural,
title={Neural Topic Modeling with Bidirectional Adversarial Training},
author={Wang, Rui and Hu, Xuemeng and Zhou, Deyu and He, Yulan and Xiong, Yuxuan and Ye, Chenchen and Xu, Haiyang},
journal={arXiv preprint arXiv:2004.12331},
year={2020}
}
Opinion prediction on Twitter is challenging
due to the transient nature of tweet content
and neighbourhood context. In this paper, we
model users' tweet posting behaviour as a temporal point process to jointly predict the posting time and the stance label of the next tweet
given a user's historical tweet sequence and
tweets posted by their neighbours. We design
a topic-driven attention mechanism to capture
the dynamic topic shifts in the neighbourhood
context. Experimental results show that the
proposed model predicts both the posting time
and the stance labels of future tweets more accurately compared to a number of competitive
baselines.
@article{zhu2020neural,
title={Neural Temporal Opinion Modelling for Opinion Prediction on Twitter},
author={Zhu, Lixing and He, Yulan and Zhou, Deyu},
journal={arXiv preprint arXiv:2005.13486},
year={2020}
}
Storyline generation aims to produce a concise summary of related events unfolding over time from a collection of news articles. It can be cast as an evolutionary clustering problem by separating news articles into different epochs. Existing unsupervised approaches to storyline generation are typically based on probabilistic graphical models. They assume that the storyline distribution at the current epoch depends on a weighted combination of the storyline distributions in the previous M epochs. The evolutionary parameters of such long-term dependency are typically set by a fixed exponential decay function, capturing the intuition that events in more recent epochs have a stronger influence on storyline generation in the current epoch. However, we argue that the amount of relevant historical contextual information should vary for different storylines. Therefore, in this paper, we propose a new Dynamic Dependency Storyline Extraction Model (D2SEM) in which the dependencies among events in different epochs but belonging to the same storyline are dynamically updated to track the time-varying distributions of storylines over time. The proposed model has been evaluated on three news corpora and the experimental results show that it outperforms the state-of-the-art approaches and is able to capture the dependency on historical contextual information dynamically.
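For reference, the fixed exponential-decay combination that D2SEM argues against can be written in a few lines. This sketch assumes that each epoch's storyline distribution is a plain list of probabilities, and it uses a hypothetical decay rate `decay`. It shows only the baseline prior, not D2SEM's dynamic dependencies.

```python
import math


def decayed_storyline_prior(history, decay=0.5):
    """Combine the storyline distributions of the previous M epochs.

    history: list of M distributions over K storylines, oldest first;
             each distribution is a list of K probabilities summing to 1.
    decay:   hypothetical decay rate; an epoch m steps back (m = 1 for the
             most recent one) receives weight exp(-decay * (m - 1)).
    Returns the normalized weighted combination, a distribution over K storylines.
    """
    M = len(history)
    # Index i = M - 1 is the most recent epoch and gets weight exp(0) = 1.
    weights = [math.exp(-decay * (M - 1 - i)) for i in range(M)]
    total = sum(weights)
    K = len(history[0])
    return [
        sum(w * dist[k] for w, dist in zip(weights, history)) / total
        for k in range(K)
    ]


# Two previous epochs: the older one favours storyline 0, the newer one storyline 1.
prior = decayed_storyline_prior([[1.0, 0.0], [0.0, 1.0]], decay=math.log(2))
# prior is approximately [1/3, 2/3]: the newer epoch counts twice as much.
```

In this baseline, `decay` is the same for every storyline. D2SEM's argument is that the amount of history used should instead vary per storyline and be updated over time.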
@article{guo2020storyline,
title={Storyline extraction from news articles with dynamic dependency},
author={Guo, Linsen and Zhou, Deyu and He, Yulan and Xu, Haiyang},
journal={Intelligent Data Analysis},
volume={24},
number={1},
pages={183--197},
year={2020},
publisher={IOS Press}
}
Advances in deep generative models have attracted significant research interest in neural topic modeling. The recently proposed Adversarial-neural Topic Model models topics with an adversarially trained generator network and employs a Dirichlet prior to capture the semantic patterns in latent topics. It is effective in discovering coherent topics but unable to infer topic distributions for given documents or utilize available document labels. To overcome such limitations, we propose Topic Modeling with Cycle-consistent Adversarial Training (ToMCAT) and its supervised version sToMCAT. ToMCAT employs a generator network to interpret topics and an encoder network to infer document topics. Adversarial training and cycle-consistent constraints are used to encourage the generator and the encoder to produce realistic samples that coordinate with each other. sToMCAT extends ToMCAT by incorporating document labels into the topic modeling process to help discover more coherent topics. The effectiveness of the proposed models is evaluated on unsupervised/supervised topic modeling and text classification. The experimental results show that our models can produce both coherent and informative topics, outperforming a number of competitive baselines.
@article{hu2020neural,
title={Neural topic modeling with cycle-consistent adversarial training},
author={Hu, Xuemeng and Wang, Rui and Zhou, Deyu and Xiong, Yuxuan},
journal={arXiv preprint arXiv:2009.13971},
year={2020}
}
Graph Neural Networks (GNNs), which capture the relationships between graph nodes via message passing, have been a hot research direction in the natural language processing community. In this paper, we propose Graph Topic Model (GTM), a GNN-based neural topic model that represents a corpus as a document relationship graph. Documents and words in the corpus become nodes in the graph and are connected based on document-word co-occurrences. By introducing the graph structure, the relationships between documents are established through their shared words, and thus the topical representation of a document is enriched by aggregating information from its neighboring nodes using graph convolution. Extensive experiments were conducted on three datasets, and the results demonstrate the effectiveness of the proposed approach.
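The document-word graph and the neighbourhood aggregation described above can be sketched as follows. This is a simplified illustration with hypothetical function names. It uses toy feature vectors and plain mean aggregation in place of a learned graph convolution, so there are no weight matrices and no nonlinearity.

```python
def build_doc_word_graph(docs):
    """Build an undirected document-word graph.

    docs: list of token lists. Nodes are ("doc", i) and ("word", w);
    each document is linked to every distinct word it contains.
    Returns a dict mapping each node to the set of its neighbours.
    """
    adj = {}
    for i, tokens in enumerate(docs):
        doc_node = ("doc", i)
        adj.setdefault(doc_node, set())
        for w in tokens:
            word_node = ("word", w)
            adj.setdefault(word_node, set())
            adj[doc_node].add(word_node)
            adj[word_node].add(doc_node)
    return adj


def mean_aggregate(adj, features):
    """One simplified propagation step: average a node's feature vector
    with the feature vectors of its neighbours."""
    updated = {}
    for node, neighbours in adj.items():
        group = [node, *neighbours]
        dim = len(features[node])
        updated[node] = [
            sum(features[n][k] for n in group) / len(group) for k in range(dim)
        ]
    return updated


docs = [["apple", "fruit"], ["fruit", "juice"]]
graph = build_doc_word_graph(docs)
features = {node: [0.0, 0.0] for node in graph}
features[("doc", 0)] = [1.0, 0.0]
features[("doc", 1)] = [0.0, 1.0]
one_step = mean_aggregate(graph, features)
two_steps = mean_aggregate(graph, one_step)
# After two steps, document 0 has received information from document 1
# through their shared word "fruit".
```

Documents are never linked directly. A second propagation step is what lets two documents that share a word influence each other, which mirrors how shared words establish document relationships in the abstract.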
@article{zhou2020neural,
title={Neural topic modeling by incorporating document relationship graph},
author={Zhou, Deyu and Hu, Xuemeng and Wang, Rui},
journal={arXiv preprint arXiv:2009.13972},
year={2020}
}
Topic models are widely used for thematic structure discovery in text. However,
traditional topic models often require dedicated inference procedures for the specific task at hand. Also, they are not designed to generate word-level semantic representations. To address these limitations, we propose a topic modeling approach
based on Generative Adversarial Nets (GANs), called Adversarial-neural Topic
Model (ATM). The proposed ATM models topics with Dirichlet prior and employs
a generator network to capture the semantic patterns among latent topics. Meanwhile, the generator could also produce word-level semantic representations. To
illustrate the feasibility of porting ATM to tasks other than topic modeling, we apply ATM to open domain event extraction. Our experimental results on two
public corpora show that ATM generates more coherent topics, outperforming
a number of competitive baselines. Moreover, ATM is able to extract meaningful
events from news articles.
Biomedical event extraction plays an important role in the extraction of biological information from large-scale scientific
publications. However, most state-of-the-art systems separate this task into several steps, which leads to cascading errors. In addition,
it is complicated to generate features from syntactic and dependency analysis separately. Therefore, in this paper, we propose an
end-to-end model based on long short-term memory (LSTM) to optimize biomedical event extraction. Experimental results
demonstrate that our approach improves the performance of biomedical event extraction. We achieve average F1-scores of 59.68%,
58.23%, and 57.39% on the BioNLP09, BioNLP11, and BioNLP13 Genia event datasets, respectively. The experimental study has
shown our proposed model's potential in biomedical event extraction.
@article{yu2019lstm,
title={LSTM-Based End-to-End Framework for Biomedical Event Extraction},
author={Yu, Xinyi and Rong, Wenge and Liu, Jingshuang and Zhou, Deyu and Ouyang, Yuanxin and Xiong, Zhang},
journal={IEEE/ACM transactions on computational biology and bioinformatics},
year={2019},
publisher={IEEE}
}
The objective of the study was to explore the potential value of plasma indicators for identifying amnesic mild cognitive impairment (aMCI) and to determine whether levels of plasma indicators are related to cognitive performance and brain tissue volumes. In total, 155 participants (68 aMCI patients and 87 healthy controls) were recruited in the present cross-sectional study. The levels of plasma amyloid-β (Aβ) 40, Aβ42, total tau (t-tau), and neurofilament light (NFL) were measured using an ultrasensitive quantitative method. Machine learning algorithms were used to establish an optimal model for identifying aMCI. Compared with healthy controls, aMCI patients had lower plasma Aβ40 and Aβ42 levels and higher plasma NFL levels, while t-tau levels showed no such difference. In aMCI patients, higher plasma Aβ40 levels were correlated with impaired episodic memory, and plasma t-tau levels were negatively correlated with global cognitive function and gray matter (GM) volume. In addition, higher plasma NFL levels were correlated with reduced hippocampal volume and reduced total GM volume of the left inferior and middle temporal gyrus. An integrated model including clinical features, hippocampal volume, and plasma Aβ42 and NFL had the highest accuracy for detecting aMCI patients (accuracy, 74.2%). We demonstrated that plasma Aβ40, Aβ42, t-tau, and NFL may be useful to identify aMCI and correlate with cognitive decline and brain atrophy. Among these plasma indicators, Aβ42 and NFL are more valuable as key members of a peripheral biomarker panel to detect aMCI.
@article{shi2019potential,
title={Potential value of plasma amyloid-$\beta$, total tau, and neurofilament light for identification of early Alzheimer’s disease},
author={Shi, Yachen and Lu, Xiang and Zhang, Linhai and Shu, Hao and Gu, Lihua and Wang, Zan and Gao, Lijuan and Zhu, Jianli and Zhang, Haisan and Zhou, Deyu and others},
journal={ACS Chemical Neuroscience},
volume={10},
number={8},
pages={3479--3485},
year={2019},
publisher={ACS Publications}
}
With the prevalence of social media and online forums, opinion mining, which aims to analyze and
discover the latent opinions in user-generated reviews on the Internet, has become a hot research topic.
This survey focuses on two important subtasks in this field, stance detection and product aspect mining,
both of which can be formalized as the problem of extracting the triple ⟨target, aspect, opinion⟩. In this
paper, we first introduce the general framework of opinion mining and describe the evaluation metrics.
Then, the methodologies for stance detection on different sources, such as online forums and social media, are
discussed. After that, approaches for product aspect mining are categorized, based on the processing units and tasks, into three main groups:
corpus-level aspect extraction, corpus-level aspect and opinion mining, and document-level aspect and
opinion mining. We then discuss the challenges and possible
solutions. Finally, we summarize the evolving trend of the reviewed methodologies and conclude the survey.
@article{wang2019survey,
title={A Survey on Opinion Mining: From Stance to Product Aspect},
author={Wang, Rui and Zhou, Deyu and Jiang, Mingmin and Si, Jiasheng and Yang, Yang},
journal={IEEE Access},
volume={7},
pages={41101--41124},
year={2019},
publisher={IEEE}
}
When users express their stances towards a topic in social media, they might elaborate their viewpoints
or reasoning. Oftentimes, viewpoints expressed by different users exhibit a hierarchical structure. Therefore, detecting this kind of hierarchical viewpoints offers a better insight to understand the public opinion. In this paper, we propose a novel Bayesian model for hierarchical viewpoint discovery from tweets.
Driven by the motivation that a viewpoint expressed in a tweet can be regarded as a path from the
root to a leaf of a hierarchical viewpoint tree, the assignment of the relevant viewpoint topics is assumed to follow two nested Chinese restaurant processes. Moreover, opinions in text are often expressed
in semantically non-decomposable multi-word terms or phrases, such as 'economic recession'. Hence, a hierarchical Pitman-Yor process is employed as a prior for modelling the generation of phrases of arbitrary length. Experimental results on two Twitter corpora demonstrate the effectiveness of the proposed
Bayesian model for hierarchical viewpoint discovery.
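The nested Chinese restaurant process prior on viewpoint paths can be illustrated by computing the probability of one root-to-leaf path. This sketch assumes a single nCRP with concentration parameter `gamma` and a hypothetical tree of customer counts. The model above combines two such processes with a Pitman-Yor phrase prior, and neither of those parts is shown.

```python
def crp_probabilities(table_counts, gamma):
    """Seating probabilities of a Chinese restaurant process.

    table_counts: number of customers already at each existing table
    gamma:        concentration parameter (> 0)
    Returns one probability per existing table, followed by the
    probability of opening a new table.
    """
    n = sum(table_counts)
    denom = n + gamma
    return [c / denom for c in table_counts] + [gamma / denom]


def ncrp_path_probability(root, path, gamma):
    """Probability of choosing `path` from the root under a nested CRP.

    root: node dict {"count": int, "children": [node, ...]}
    path: list of child indices, one per level; an index equal to the
          number of existing children means "open a new child".
    """
    prob = 1.0
    node = root
    for choice in path:
        children = node["children"] if node is not None else []
        probs = crp_probabilities([child["count"] for child in children], gamma)
        prob *= probs[choice]
        # Below a newly opened node there are no customers yet.
        node = children[choice] if choice < len(children) else None
    return prob


tree = {
    "count": 4,
    "children": [
        {"count": 3, "children": [{"count": 3, "children": []}]},
        {"count": 1, "children": []},
    ],
}
# Existing path: (3 / 5) * (3 / 4), approximately 0.45
p_existing = ncrp_path_probability(tree, [0, 0], gamma=1.0)
# New branch at the root: 1 / 5, then a forced new child: 1.0
p_new = ncrp_path_probability(tree, [2, 0], gamma=1.0)
```

Popular branches attract more tweets ("rich get richer"), while `gamma` controls how readily a new viewpoint branch is opened at each level of the tree.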
@article{zhu2019hierarchical,
title={Hierarchical viewpoint discovery from tweets using Bayesian modelling},
author={Zhu, Lixing and He, Yulan and Zhou, Deyu},
journal={Expert Systems with Applications},
volume={116},
pages={430--438},
year={2019},
publisher={Elsevier}
}
With the fast development of online social platforms, social emotion detection,
focusing on predicting readers' emotions evoked by news articles, has been intensively investigated. Considering emotions as latent variables, various probabilistic graphical models have been proposed for emotion detection. However,
the bag-of-words assumption prohibits those models from capturing the interrelations between sentences in a document. Moreover, existing models can only
detect emotions at either the document-level or the sentence-level. In this paper,
we propose an effective Bayesian model, called hidden Topic-Emotion Transition
model, by assuming that words in the same sentence share the same emotion
and topic and modelling the emotions and topics in successive sentences as a
Markov chain. By doing so, not only the document-level emotion but also the
sentence-level emotion can be detected simultaneously. Experimental results on
two public corpora show that the proposed model outperforms state-of-the-art approaches on both document-level and sentence-level emotion detection.
@article{tang2019hidden,
title={Hidden topic--emotion transition model for multi-level social emotion detection},
author={Tang, Donglei and Zhang, Zhikai and He, Yulan and Lin, Chao and Zhou, Deyu},
journal={Knowledge-Based Systems},
volume={164},
pages={426--435},
year={2019},
publisher={Elsevier}
}
To extract the structured representations of
open-domain events, Bayesian graphical models have made some progress. However, these
approaches typically assume that all words in
a document are generated from a single event.
While this may be true for short text such as
tweets, such an assumption does not generally
hold for long text such as news articles. Moreover, Bayesian graphical models often rely on
Gibbs sampling for parameter inference which
may take a long time to converge. To address
these limitations, we propose an event extraction model based on Generative Adversarial
Nets, called Adversarial-neural Event Model
(AEM). AEM models an event with a Dirichlet prior and uses a generator network to capture the patterns underlying latent events. A
discriminator is used to distinguish documents
reconstructed from the latent events and the
original documents. A byproduct of the discriminator is that the features generated by the
learned discriminator network allow the visualization of the extracted events. Our model
has been evaluated on two Twitter datasets and
a news article dataset. Experimental results
show that our model outperforms the baseline
approaches on all the datasets, with more significant improvements observed on the news
article dataset where an increase of 15% is observed in F-measure.
@article{wang2019open,
title={Open Event Extraction from Online Text using a Generative Adversarial Network},
author={Wang, Rui and Zhou, Deyu and He, Yulan},
journal={arXiv preprint arXiv:1908.09246},
year={2019}
}
Yang Yang, Deyu Zhou, Yulan He, Meng Zhang. Interpretable Relevant Emotion Ranking with Event-Driven Attention, In: Conference on Empirical Methods in Natural Language Processing & International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, November 3-7, 2019
Multiple emotions with different intensities
are often evoked by events described in documents.
Oftentimes, such event information is
hidden and needs to be discovered from texts.
Unveiling the hidden event information can
help to understand how the emotions are evoked
and provide explainable results. However,
existing studies often ignore the latent
event information. In this paper, we propose
a novel interpretable relevant emotion ranking
model with the event information incorporated
into a deep learning architecture using
event-driven attention. Moreover, corpus-level
event embeddings and document-level
event distributions are introduced respectively
to consider the global events in corpus and the
document-specific events simultaneously. Experimental
results on three real-world corpora
show that the proposed approach performs remarkably
better than the state-of-the-art emotion
detection approaches and multi-label approaches.
Moreover, interpretable results can
be obtained to shed light on the events which
trigger certain emotions.
In this paper, we propose a novel variational generator framework for conditional GANs
to capture semantic details, improving generation quality and diversity. Traditional
generators in conditional GANs simply concatenate the conditional vector with the noise as
the input representation, which is directly employed for upsampling operations. However,
the hidden condition information is not fully exploited, especially when the input is a
class label. Therefore, we introduce variational inference into the generator to infer the
posterior of latent variable only from the conditional input, which helps achieve a variable
augmented representation for image generation. Qualitative and quantitative experimental
results show that the proposed method outperforms the state-of-the-art approaches and
generates realistic, controllable images.
@article{hu2019variational,
title={Variational Conditional GAN for Fine-grained Controllable Image Generation},
author={Hu, Mingqi and Zhou, Deyu and He, Yulan},
journal={arXiv preprint arXiv:1909.09979},
year={2019}
}
Weakly supervised part-of-speech (POS) tagging aims to learn to predict the POS tag of a given word in context
by making use of partially annotated data instead of fully tagged corpora. Weakly supervised POS tagging
would benefit various natural language processing applications in languages where tagged corpora are
mostly unavailable.
In this article, we propose a novel framework for weakly supervised POS tagging based on a dictionary of
words with their possible POS tags. In the constrained error-correcting output codes (ECOC)-based approach,
a unique L-bit vector is assigned to each POS tag. The set of bit vectors is referred to as a coding matrix with
values in {1, -1}. Each column of the coding matrix specifies a dichotomy over the tag space to learn a binary
classifier. For each binary classifier, its training data is generated as follows: each pair of a word
and its possible POS tags is considered a positive training example only if the whole set of its possible
tags falls into the positive dichotomy specified by the column coding, and similarly for negative training
examples. Given a word in context, its POS tag is predicted by concatenating the predictive outputs of the L
binary classifiers and choosing the tag with the closest distance according to some measure. By incorporating
the ECOC strategy, the set of all possible tags for each word is treated as an entirety without the need of
performing disambiguation. Moreover, instead of manual feature engineering employed in most previous POS
tagging approaches, features for training and testing in the proposed framework are automatically generated
using neural language modeling. The proposed framework has been evaluated on three corpora for English,
Italian, and Malagasy POS tagging, achieving accuracies of 93.21%, 90.9%, and 84.5%, respectively, which shows
a significant improvement compared to the state-of-the-art approaches.
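Two of the ECOC steps described above can be sketched as follows: labelling training pairs per column, and decoding a tag from the binary outputs. This is a minimal illustration. It assumes that the coding matrix is a dict from each tag to its L-bit codeword, and it uses Hamming distance as the decoding measure, since the abstract leaves the measure open. The binary classifiers themselves are not shown.

```python
def column(coding_matrix, j):
    """Return column j of the coding matrix as a dict tag -> +1/-1."""
    return {tag: code[j] for tag, code in coding_matrix.items()}


def dichotomy_label(possible_tags, col):
    """Label a (word, possible tags) pair for one binary classifier.

    Returns +1 if every possible tag lies in the positive dichotomy,
    -1 if every possible tag lies in the negative dichotomy, and None
    otherwise (the pair is not used to train this classifier).
    """
    signs = {col[tag] for tag in possible_tags}
    if signs == {1}:
        return 1
    if signs == {-1}:
        return -1
    return None


def decode(outputs, coding_matrix):
    """Return the tag whose codeword is closest, in Hamming distance,
    to the concatenated +1/-1 outputs of the L binary classifiers."""
    def distance(codeword):
        return sum(1 for o, c in zip(outputs, codeword) if o != c)
    return min(coding_matrix, key=lambda tag: distance(coding_matrix[tag]))


# Hypothetical 3-tag, 3-bit coding matrix.
coding = {"NOUN": [1, 1, -1], "VERB": [1, -1, 1], "ADJ": [-1, 1, 1]}
col0 = column(coding, 0)  # NOUN and VERB are positive, ADJ is negative
```

A word whose possible tags are {NOUN, ADJ} straddles the dichotomy of column 0, so it contributes no training example to the first classifier. The ambiguous tag set is thus never split into a single guessed tag.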
@article{zhou2018weakly,
title={Weakly Supervised POS Tagging without Disambiguation},
author={Zhou, Deyu and Zhang, Zhikai and Zhang, Min-Ling and He, Yulan},
journal={ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)},
volume={17},
number={4},
pages={35},
year={2018},
publisher={ACM}
}
Objective: A drug-drug interaction (DDI) is a situation in which a drug affects the activity of another drug
synergistically or antagonistically when being administered together. The information of DDIs is crucial
for healthcare professionals to prevent adverse drug events. Although some known DDIs can be found
in purposely-built databases such as DrugBank, most information is still buried in scientific publications.
Therefore, automatically extracting DDIs from biomedical texts is sorely needed.
Methods and material: In this paper, we propose a novel position-aware deep multi-task learning approach
for extracting DDIs from biomedical texts. In particular, sentences are represented as a sequence of
word embeddings and position embeddings. An attention-based bidirectional long short-term memory
(BiLSTM) network is used to encode each sentence. The relative position information of words with the
target drugs in the text is combined with the hidden states of BiLSTM to generate the position-aware attention
weights. Moreover, the tasks of predicting whether or not two drugs interact with each other and further
distinguishing the types of interactions are learned jointly in a multi-task learning framework.
Results: The proposed approach has been evaluated on the DDIExtraction challenge 2013 corpus and
the results show that with the position-aware attention only, our proposed approach outperforms the
state-of-the-art method by 0.99% for binary DDI classification, and with both position-aware attention
and multi-task learning, our approach achieves a micro F-score of 72.99% on interaction type identification, outperforming the state-of-the-art approach by 1.51%, which demonstrates the effectiveness of the
proposed approach.
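The position-aware attention step can be sketched in plain Python. This sketch illustrates the idea rather than the paper's exact formulation. It assumes that hidden states and position embeddings are combined additively, that a hypothetical position-embedding table is indexed by clipped relative offsets, and that tokens are scored with a hypothetical vector `v`.

```python
import math


def relative_positions(n_tokens, drug1_idx, drug2_idx):
    """Offsets of every token relative to the two target drugs."""
    return [(i - drug1_idx, i - drug2_idx) for i in range(n_tokens)]


def position_vectors(offsets, table, max_offset):
    """Sum the embeddings of both offsets, clipping them to [-max_offset, max_offset]."""
    def clip(o):
        return max(-max_offset, min(max_offset, o))
    return [[a + b for a, b in zip(table[clip(o1)], table[clip(o2)])]
            for o1, o2 in offsets]


def position_aware_attention(hidden, positions, v):
    """Score each token as v . tanh(h_t + p_t), softmax the scores, and
    return (weights, context), where context is the weighted sum of hidden states."""
    scores = [sum(vi * math.tanh(h + p) for vi, h, p in zip(v, h_t, p_t))
              for h_t, p_t in zip(hidden, positions)]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(w * h_t[d] for w, h_t in zip(weights, hidden))
               for d in range(len(hidden[0]))]
    return weights, context


# Hypothetical 2-dimensional position table for offsets -2..2.
table = {k: [0.1 * k, -0.1 * k] for k in range(-2, 3)}
offsets = relative_positions(4, drug1_idx=0, drug2_idx=3)
positions = position_vectors(offsets, table, max_offset=2)
hidden = [[0.5, 0.1], [0.2, 0.4], [0.3, 0.3], [0.6, 0.0]]  # stand-in BiLSTM states
weights, context = position_aware_attention(hidden, positions, v=[1.0, 1.0])
```

Clipping keeps the position table small. Because the offsets enter the attention score together with the hidden state, the same word can receive different weights depending on how far it is from each target drug.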
@article{zhou2018position,
title={Position-aware deep multi-task learning for drug--drug interaction extraction},
author={Zhou, Deyu and Miao, Lei and He, Yulan},
journal={Artificial intelligence in medicine},
volume={87},
pages={1--8},
year={2018},
publisher={Elsevier}
}
Text might express or evoke multiple emotions with varying intensities. As such, it is crucial to predict and rank multiple relevant emotions by their intensities. Moreover, as emotions might be evoked by hidden topics, it is
important to unveil and incorporate such topical information to understand how the emotions are evoked. We propose a novel interpretable neural network approach for relevant emotion ranking. Specifically, motivated by
transfer learning, the neural network is initialized to make the hidden layer approximate the
behavior of topic models. Moreover, a novel error function is defined to optimize the whole neural network for relevant emotion ranking. Experimental results on three real-world
corpora show that the proposed approach performs remarkably better than the state-of-the-art emotion detection approaches and multi-label learning methods. Moreover, the extracted emotion-associated topic words indeed represent emotion-evoking events and are in line
with our common-sense knowledge.
@inproceedings{yang2018interpretable,
title={An interpretable neural network with topical information for relevant emotion ranking},
author={Yang, Yang and Zhou, Deyu and He, Yulan},
booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
pages={3423--3432},
year={2018}
}
Text might contain or invoke multiple emotions with varying intensities. As such, emotion detection, to predict multiple emotions associated with a given text, can be cast into a
multi-label classification problem. We would
like to go one step further so that a ranked list
of relevant emotions is generated, where top
ranked emotions are more intensely associated with text compared to lower ranked emotions, whereas the rankings of irrelevant emotions are not important. A novel framework of
relevant emotion ranking is proposed to tackle
the problem. In the framework, the objective
loss function is carefully designed so that
both emotion prediction and rankings of only relevant emotions can be achieved. Moreover, we observe that some emotions co-occur
more often while other emotions rarely coexist. Such information is incorporated into
the framework as constraints to improve the
accuracy of emotion detection. Experimental
results on two real-world corpora show that
the proposed framework can effectively deal
with emotion detection and performs remarkably better than the state-of-the-art emotion
detection approaches and multi-label learning
methods.
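The idea that only relevant emotions need to be ranked can be made concrete with a pairwise hinge loss. This is a simplified stand-in, not the paper's objective. It assumes gold intensities in which 0 marks an irrelevant emotion and uses a hypothetical margin of 1.0. It also omits the emotion co-occurrence constraints.

```python
def relevant_emotion_ranking_loss(scores, intensities, margin=1.0):
    """Average pairwise hinge loss over the pairs that matter.

    scores:      predicted score for each emotion
    intensities: gold intensity for each emotion; 0 means irrelevant
    A pair (i, j) is used only when emotion i is relevant and emotion j has
    a lower intensity (a weaker relevant emotion or an irrelevant one), so
    the order among irrelevant emotions never affects the loss.
    """
    total, pairs = 0.0, 0
    for s_i, y_i in zip(scores, intensities):
        if y_i <= 0:
            continue  # irrelevant emotions are never required to outrank anything
        for s_j, y_j in zip(scores, intensities):
            if y_j < y_i:
                # Emotion i should outscore emotion j by at least `margin`.
                total += max(0.0, margin - (s_i - s_j))
                pairs += 1
    return total / pairs if pairs else 0.0


# Emotion 0 is the most intense, emotion 1 is weaker, emotion 2 is irrelevant.
loss = relevant_emotion_ranking_loss([0.0, 1.0, 0.0], [2, 1, 0])
```

Swapping the scores of two irrelevant emotions leaves the loss unchanged. This matches the requirement above that the rankings of irrelevant emotions do not matter.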
@inproceedings{zhou2018relevant,
title={Relevant emotion ranking from text constrained with emotion relationships},
author={Zhou, Deyu and Yang, Yang and He, Yulan},
booktitle={Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
pages={561--571},
year={2018}
}
Storyline generation aims to extract events described in news articles under a certain topic and reveal how those events evolve over
time. Most existing approaches first train supervised models to extract events from news
articles published in different time periods and
then link relevant events into coherent stories.
They are domain dependent and cannot deal
with unseen event types. To tackle this problem, approaches based on probabilistic graphical
models jointly model the generations of events
and storylines without annotated data. However, the parameter inference procedure is too
complex and models often require long time
to converge. In this paper, we propose a novel neural network based approach to extract structured representations and evolution patterns of storylines without using annotated data. In this model, title and main body of a
news article are assumed to share the similar
storyline distribution. Moreover, similar documents described in neighboring time periods
are assumed to share similar storyline distributions. Based on these assumptions, structured
representations and evolution patterns of storylines can be extracted. The proposed model has been evaluated on three news corpora
and the experimental results show that it outperforms state-of-the-art approaches accuracy
and efficiency.
@inproceedings{zhou2018neural,
title={Neural Storyline Extraction Model for Storyline Generation from News Articles},
author={Zhou, Deyu and Guo, Linshen and He, Yulan},
year={2018},
organization={Association for Computational Linguistics}
}
In recent years, there has been increasing interest in using unsupervised models to extract structured representations of newsworthy events from Twitter. These models typically assume that tweets involving the same named entities and expressed using similar words are likely to belong to the same event. Hence, they group tweets into clusters based on the co-occurrence patterns of named entities and topical keywords. However, they have two main limitations. First, they require the number of events to be known beforehand, which is not realistic in practical applications. Second, they do not recognise that the same named entity might be referred to by multiple mentions; for example, "Putin" and "The President of Russia" refer to the same person. As a result, tweets using different mentions would be wrongly assigned to different events. To overcome these limitations, we propose a non-parametric Bayesian mixture model with word embeddings for event extraction, in which the number of events can be inferred automatically and lexical variations of the same named entity can be handled properly. Our model has been evaluated on three datasets ranging in size from 2,499 to over 60 million tweets. Experimental results show that our model outperforms the baseline approach on all datasets by 5-8% in F-measure.
@inproceedings{zhou2017event,
title={Event extraction from Twitter using non-parametric Bayesian mixture model with word embeddings},
author={Zhou, Deyu and Zhang, Xuan and He, Yulan},
booktitle={Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers},
pages={808--817},
year={2017}
}
2016
Deyu Zhou, Xuan Zhang, Yin Zhou, Quan Zhao, Xin Geng. Emotion Distribution Learning from Texts, In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), Austin, Texas, USA, November 1-5, 2016.
The advent and prosperity of social media enable users to share their opinions and views. Understanding users' emotional states might create new business opportunities. Automatically identifying users' emotional states from their texts and classifying emotions into finite categories such as joy, anger and disgust can be considered a text classification problem. However, it introduces a challenging learning scenario in which multiple emotions with different intensities are often found in a single sentence. Moreover, some emotions co-occur more often while other emotions rarely coexist. In this paper, we propose a novel approach based on emotion distribution learning to address these issues. The key idea is to learn a mapping function from sentences to their emotion distributions, which describe multiple emotions and their respective intensities. Moreover, the relations between emotions are captured based on Plutchik's wheel of emotions and are subsequently incorporated into the learning algorithm to improve the accuracy of emotion detection. Experimental results show that the proposed approach can effectively deal with the emotion distribution detection problem and performs remarkably better than both the state-of-the-art emotion detection method and multi-label learning methods.
@inproceedings{deyu2016emotion,
title={Emotion distribution learning from texts},
author={Zhou, Deyu and Zhang, Xuan and Zhou, Yin and Zhao, Quan and Geng, Xin},
booktitle={Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
pages={638--647},
year={2016}
}
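Emotion distribution learning maps a sentence to a probability distribution over emotions rather than to a single label. A minimal sketch, assuming a linear scorer followed by softmax and a Kullback-Leibler (KL) divergence loss against the gold distribution (a common choice in label distribution learning; the paper's exact model and its Plutchik-based constraints are not reproduced), is shown below. The names `weights`, `features` and `predict_distribution` are hypothetical.

```python
import math


def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_distribution(weights, features):
    # weights: one row per emotion; features: sentence feature vector.
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


def kl_divergence(target, predicted, eps=1e-12):
    # KL(target || predicted); emotions with zero gold mass contribute nothing.
    return sum(t * math.log(t / max(p, eps)) for t, p in zip(target, predicted) if t > 0)


# Hypothetical 3-emotion example (e.g. joy, anger, disgust) with 2 features.
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
features = [2.0, 1.0]
pred = predict_distribution(weights, features)
gold = [0.6, 0.3, 0.1]
loss = kl_divergence(gold, pred)
```

Training would then adjust `weights` to minimise `loss` summed over the corpus; the KL form rewards matching the relative intensities, not just the top emotion.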
Event extraction from texts aims to detect structured information such as what has happened, to whom, where and when. Event extraction and visualization are typically considered two different tasks. In this paper, we propose a novel approach based on probabilistic modelling to jointly extract and visualize events from tweets, so that both tasks benefit from each other. We model each event as a joint distribution over named entities, a date, a location and event-related keywords. Moreover, both tweets and event instances are associated with coordinates in the visualization space. The manifold assumption, that the intrinsic geometry of tweets is a low-rank, non-linear manifold within the high-dimensional space, is incorporated into the learning framework through a regularization term. Experimental results show that the proposed approach can effectively deal with both event extraction and visualization and performs remarkably better than both the state-of-the-art event extraction method and a pipeline approach to event extraction and visualization.
@inproceedings{zhou2016jointly,
title={Jointly event extraction and visualization on twitter via probabilistic modelling},
author={Zhou, Deyu and Gao, Tianmeng and He, Yulan},
booktitle={Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
pages={269--278},
year={2016}
}
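A manifold assumption of this kind is commonly enforced with a graph-Laplacian penalty that pulls the visualization coordinates of similar tweets together. A minimal sketch, assuming a symmetric non-negative similarity matrix `W` between tweets and the penalty 0.5 * sum_ij W[i][j] * ||y_i - y_j||^2 (the paper's exact regularizer and neighbourhood graph may differ), is shown below. `manifold_penalty` is a hypothetical name.

```python
def manifold_penalty(coords, W):
    """Graph-Laplacian style regularizer: 0.5 * sum_ij W[i][j] * ||y_i - y_j||^2.

    coords: list of visualization coordinates, one per tweet.
    W: symmetric similarity matrix (W[i][j] >= 0), e.g. from a k-NN graph.
    """
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if W[i][j]:
                sq = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
                total += W[i][j] * sq
    return 0.5 * total


# Hypothetical example: tweets 0 and 1 are neighbours, tweet 2 is unrelated.
coords = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
W = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(manifold_penalty(coords, W))  # 1.0
```

Adding a weighted copy of this penalty to the model's objective keeps tweets that are similar in the original space close in the visualization, while unrelated tweets (zero weight) are left free.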
Deyu Zhou, Haiyang Xu, Xinyu Dai, Yulan He. Unsupervised Storyline Extraction from News Articles, In: Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), New York, USA, July 9-15, 2016.
Storyline extraction from news streams aims to extract events under a certain news topic and reveal how those events evolve over time. It requires algorithms capable of accurately extracting events from news articles published in different time periods and linking these extracted events into coherent stories. The two tasks are often solved separately, which might suffer from error propagation. Existing unified approaches often consider events as topics, ignoring their structured representations. In this paper, we propose a non-parametric generative model to extract structured representations and evolution patterns of storylines simultaneously. In the model, each storyline is modelled as a joint distribution over some locations, organizations, persons, keywords and a set of topics. We further combine this model with the Chinese restaurant process so that the number of storylines can be determined automatically without human intervention. Moreover, a per-token Metropolis-Hastings sampler based on light latent Dirichlet allocation (LightLDA) is employed to reduce sampling complexity. The proposed model has been evaluated on three news corpora, and the experimental results show that it outperforms several baseline approaches.
@inproceedings{zhou2016unsupervised,
title={Unsupervised Storyline Extraction from News Articles},
author={Zhou, Deyu and Xu, Haiyang and Dai, Xin-Yu and He, Yulan},
booktitle={IJCAI},
pages={3014--3021},
year={2016}
}
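The Chinese restaurant process (CRP) mentioned above lets the number of storylines grow with the data: a new item joins an existing storyline k with probability proportional to its current size n_k, or opens a new storyline with probability proportional to a concentration parameter alpha. The sketch below shows only this prior (the full model also conditions on entities, keywords and topics). `alpha`, `crp_probabilities` and `crp_assign` are hypothetical names.

```python
import random


def crp_probabilities(counts, alpha):
    """CRP prior: existing clusters in proportion to size, a new one in proportion to alpha."""
    total = sum(counts) + alpha
    return [n / total for n in counts] + [alpha / total]


def crp_assign(n_items, alpha, rng):
    """Sequentially seat n_items under the CRP prior; returns labels and cluster sizes."""
    counts, labels = [], []
    for _ in range(n_items):
        probs = crp_probabilities(counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)  # open a new storyline
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = crp_assign(100, alpha=1.0, rng=random.Random(42))
print(len(counts), "storylines")  # the number of clusters is inferred, not fixed
```

A larger `alpha` makes new storylines more likely; in a sampler, these prior weights are multiplied by each storyline's likelihood for the item before drawing.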
Objective: Scientists have devoted decades of effort to understanding the interactions between proteins or RNA production. This information might enrich current knowledge of drug reactions or the development of certain diseases. Nevertheless, the life science literature, one of the most important sources of this information, lacks explicit structure, which prevents computer-based systems from accessing it. Therefore, biomedical event extraction, which automatically acquires knowledge of molecular events from research articles, has recently attracted community-wide efforts. Most approaches are based on statistical models, requiring large-scale annotated corpora to precisely estimate model parameters. However, such corpora are usually difficult to obtain in practice. Therefore, employing un-annotated data through semi-supervised learning for biomedical event extraction is a feasible solution and is attracting increasing interest.
Methods and Material: In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are automatically assigned event annotations based on their distances to sentences in the annotated corpus. More specifically, both the structures of the sentences and the hidden topics embedded in them are used to measure the distance. These sentences and their newly assigned event annotations, together with the annotated corpus, are employed for training.
Results: Experiments were conducted on the multi-level event extraction corpus, a gold-standard corpus. Experimental results show that the proposed framework achieves an F-score improvement of more than 2.2% on biomedical event extraction over the state-of-the-art approach.
Conclusion: The results suggest that, by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system, and that the similarity between sentences might be precisely described by the hidden topics and structures of the sentences.
@article{zhou2015semi,
title={A semi-supervised learning framework for biomedical event extraction based on hidden topics},
author={Zhou, Deyu and Zhong, Dayou},
journal={Artificial Intelligence in Medicine},
volume={64},
number={1},
pages={51--58},
year={2015},
publisher={Elsevier}
}
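The annotation-assignment step above can be illustrated as a nearest-neighbour transfer. A minimal sketch, assuming each sentence carries a set of structural features and a topic distribution (e.g. from LDA), and assuming a weighted mix of Jaccard distance over structure and Hellinger distance over topics (both metric choices, the `weight` and the `threshold` are illustrative assumptions, not the paper's definitions), is shown below. The dictionary keys `structure`, `topics` and `events` are hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two topic distributions (in [0, 1])."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def jaccard_distance(a, b):
    """1 - (size of intersection) / (size of union) over structural feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def sentence_distance(s, t, weight=0.5):
    # Weighted mix of structural distance and hidden-topic distance.
    return (weight * jaccard_distance(s["structure"], t["structure"])
            + (1 - weight) * hellinger(s["topics"], t["topics"]))


def assign_annotations(unlabeled, labeled, threshold=0.3, weight=0.5):
    """Give each un-annotated sentence the events of its nearest annotated
    sentence, but only when that neighbour is close enough."""
    pseudo = []
    for s in unlabeled:
        nearest = min(labeled, key=lambda t: sentence_distance(s, t, weight))
        if sentence_distance(s, nearest, weight) <= threshold:
            pseudo.append((s, nearest["events"]))
    return pseudo


labeled = [
    {"structure": ["nsubj", "dobj"], "topics": [0.9, 0.1], "events": ["Binding"]},
    {"structure": ["amod"], "topics": [0.1, 0.9], "events": ["Gene_expression"]},
]
unlabeled = [
    {"structure": ["nsubj", "dobj"], "topics": [0.8, 0.2]},
    {"structure": ["conj"], "topics": [0.5, 0.5]},
]
pseudo = assign_annotations(unlabeled, labeled)
print([events for _, events in pseudo])  # [['Binding']]
```

The threshold keeps only confident transfers; the pseudo-annotated sentences would then be pooled with the annotated corpus for training.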
Storyline detection from news articles aims at summarizing events described under a certain news topic and revealing how those events evolve over time. It is a difficult task because it requires first detecting events from news articles published in different time periods and then constructing storylines by linking events into coherent news stories. Moreover, each storyline has different hierarchical structures, which are dependent across epochs. Existing approaches often ignore the dependency of hierarchical structures in storyline generation. In this paper, we propose an unsupervised Bayesian model, called the dynamic storyline detection model, to extract structured representations and evolution patterns of storylines. The proposed model is evaluated on a large-scale news corpus. Experimental results show that our proposed model outperforms several baseline approaches.
@inproceedings{zhou2015unsupervised,
title={An unsupervised Bayesian modelling approach for storyline detection on news articles},
author={Zhou, Deyu and Xu, Haiyang and He, Yulan},
booktitle={Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
pages={1943--1948},
year={2015}
}
Twitter, as a popular microblogging service, has become a new information channel for users to receive and exchange the most up-to-date information on current events. However, since there is no control over how users publish messages on Twitter, finding newsworthy events on Twitter is a difficult task, like "finding a needle in a haystack". In this paper we propose a general unsupervised framework to explore events from tweets, which consists of a pipeline process of filtering, extraction and categorization. To filter out noisy tweets, the filtering step exploits a lexicon-based approach to separate event-related tweets from those that are not. Then, from these event-related tweets, the structured representations of events are extracted and categorized automatically using an unsupervised Bayesian model without the use of any labelled data. Moreover, the categorized events are assigned event type labels without human intervention. The proposed framework has been evaluated on over 60 million tweets collected over one month in December 2010. A precision of 70.49% is achieved in event extraction, outperforming a competitive baseline by nearly 6%. Events are also clustered into coherent groups with the automatically assigned event type labels.
@inproceedings{zhou2015framework,
title={An unsupervised framework of exploring events on twitter: Filtering, extraction and categorization},
author={Zhou, Deyu and Chen, Liangyu and He, Yulan},
booktitle={Twenty-Ninth AAAI Conference on Artificial Intelligence},
year={2015}
}
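The lexicon-based filtering step can be pictured as a keyword test applied to each tweet before extraction. A minimal sketch, assuming a hypothetical event lexicon and a simple "at least `min_hits` lexicon words" rule (the abstract does not list the actual lexicon or decision rule), is shown below. `EVENT_LEXICON`, `is_event_related` and `filter_tweets` are hypothetical names.

```python
import re

# Hypothetical lexicon; the framework's actual lexicon is not given in the abstract.
EVENT_LEXICON = {"earthquake", "election", "protest", "announced", "killed", "launch"}


def is_event_related(tweet, lexicon=EVENT_LEXICON, min_hits=1):
    """Return True if the tweet contains at least `min_hits` lexicon words."""
    tokens = re.findall(r"[a-z0-9']+", tweet.lower())
    return sum(token in lexicon for token in tokens) >= min_hits


def filter_tweets(tweets, lexicon=EVENT_LEXICON, min_hits=1):
    """Keep only tweets judged event-related by the lexicon test."""
    return [t for t in tweets if is_event_related(t, lexicon, min_hits)]


tweets = ["Earthquake hits the coast near Lima", "good morning everyone :)"]
print(filter_tweets(tweets))  # ['Earthquake hits the coast near Lima']
```

Raising `min_hits` trades recall for precision; only the surviving tweets would be passed to the Bayesian extraction model.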
Motivation: In molecular biology, molecular events describe observable alterations of biomolecules, such as binding of proteins or RNA production. These events might be responsible for drug reactions or the development of certain diseases. As such, biomedical event extraction, the process of automatically detecting descriptions of molecular interactions in research articles, has recently attracted substantial research interest. Event trigger identification, which detects the words describing the event types, is a crucial prerequisite step in the pipeline of biomedical event extraction. Taking the event types as classes, event trigger identification can be viewed as a classification task. For each word in a sentence, a trained classifier predicts, based on the context features, whether the word corresponds to an event type and, if so, which one. Therefore, a well-designed feature set with a good level of discrimination and generalization is crucial for the performance of event trigger identification.
Results: In this article, we propose a novel framework for event trigger identification. In particular, we learn biomedical domain knowledge from a large text corpus built from Medline and embed it into word features using neural language modeling. The embedded features are then combined with the syntactic and semantic context features using the multiple kernel learning method. The combined feature set is used to train the event trigger classifier. Experimental results on the gold-standard corpus show that the proposed framework achieves an F-score improvement of more than 2.5% over the state-of-the-art approach, demonstrating its effectiveness.
Availability and implementation: The source code for the proposed framework is freely available and can be downloaded at http://palm.seu.edu.cn/zhoudeyu/ETI_Sourcecode.zip .
@article{zhou2014event,
title={Event trigger identification for biomedical events extraction using domain knowledge},
author={Zhou, Deyu and Zhong, Dayou and He, Yulan},
journal={Bioinformatics},
volume={30},
number={11},
pages={1587--1594},
year={2014},
publisher={Oxford University Press}
}
Biomedical relation extraction aims to uncover high-quality relations from life science literature with high accuracy and efficiency. Early biomedical relation extraction tasks focused on capturing binary relations, such as protein-protein interactions, which are crucial for virtually every process in a living cell. Information about these interactions provides the foundations for new therapeutic approaches. In recent years, interest has shifted to the extraction of complex relations such as biomolecular events. Complex relations go beyond binary relations and involve more than two arguments; they might also take another relation as an argument. In this paper, we conduct a thorough survey of the research in biomedical relation extraction. We first present a general framework for biomedical relation extraction and then discuss the approaches proposed for binary and complex relation extraction, focusing on the latter since it is a much more difficult task than binary relation extraction. Finally, we discuss the challenges facing complex relation extraction and outline possible solutions and future directions.
@article{zhou2014biomedical,
title={Biomedical relation extraction: from binary to complex},
author={Zhou, Deyu and Zhong, Dayou and He, Yulan},
journal={Computational and mathematical methods in medicine},
volume={2014},
year={2014},
publisher={Hindawi}
}
With the proliferation of social media sites, social streams have proven to contain the most up-to-date information on current events. Therefore, it is crucial to extract events from social streams such as tweets. However, it is not straightforward to adapt existing event extraction systems, since texts in social media are fragmented and noisy. In this paper we propose a simple yet effective Bayesian model, called the Latent Event Model (LEM), to extract structured representations of events from social media. LEM is fully unsupervised and does not require annotated data for training. We evaluate LEM on a Twitter corpus. Experimental results show that the proposed model achieves 83% in F-measure and outperforms the state-of-the-art baseline by over 7%.
@inproceedings{zhou2014simple,
title={A simple bayesian modelling approach to event extraction from twitter},
author={Zhou, Deyu and Chen, Liangyu and He, Yulan},
booktitle={Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
pages={700--705},
year={2014}
}
Objective: Biomedical event extraction is concerned with extracting events describing changes in the state of bio-molecules from the literature. Compared to the protein-protein interaction (PPI) extraction task, which often only involves extracting binary relations between two proteins, biomedical event extraction is much harder since it needs to deal with complex events consisting of embedded or hierarchical relations among proteins, events, and their textual triggers. In this paper, we propose an information extraction system based on the hidden vector state (HVS) model, called HVS-BioEvent, for biomedical event extraction, and investigate its capability in extracting complex events.
Methods and Material: HVS has previously been employed for PPI extraction. In HVS-BioEvent, we propose an automated way to generate abstract annotations for HVS training, and further propose novel machine learning approaches for event trigger word identification and for biomedical event extraction from the HVS parse results.
Results: Our proposed system achieves an F-score of 49.57% on the corpus used in the BioNLP'09 shared task, which is only 2.38 percentage points lower than the best performing system in the BioNLP'09 shared task, by UTurku. Nevertheless, HVS-BioEvent outperforms UTurku's system on complex event extraction, achieving 36.57% vs. 30.52% for regulation events and 40.61% vs. 38.99% for negative regulation events.
Conclusions: The results suggest that the HVS model with the hierarchical hidden state structure is indeed more suitable for complex event extraction, since it can naturally model embedded structural context in sentences.
Keywords: Hidden vector state model, biomedical events extraction, abstract annotations, semantic parsing.
@article{zhou2011biomedical,
title={Biomedical events extraction using the hidden vector state model},
author={Zhou, Deyu and He, Yulan},
journal={Artificial Intelligence in Medicine},
volume={53},
number={3},
pages={205--213},
year={2011},
publisher={Elsevier}
}
Sentiment analysis is concerned with automatically identifying the sentiment or opinion expressed in a given piece of text. Most prior work either uses prior lexical knowledge, defined as the sentiment polarity of words, or views the task as a text classification problem and relies on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort. In this paper, we propose a novel framework in which an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon, with preferences on the expected sentiment labels of those lexicon words expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie-review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.
@article{he2011self,
title={Self-training from labeled features for sentiment analysis},
author={He, Yulan and Zhou, Deyu},
journal={Information Processing \& Management},
volume={47},
number={4},
pages={606--616},
year={2011},
publisher={Elsevier}
}
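The self-training loop above selects high-confidence documents as pseudo-labeled examples and then estimates word-class distributions from them. A minimal sketch of those two steps, assuming the initial classifier's class probabilities are already available and using additive smoothing to estimate P(class | word) (the threshold, smoothing and function names are hypothetical, and the generalized-expectation training itself is not shown), is given below.

```python
from collections import defaultdict


def select_confident(docs, probs, threshold=0.9):
    """Keep documents whose top class probability reaches the threshold."""
    selected = []
    for doc, p in zip(docs, probs):
        label = max(range(len(p)), key=lambda c: p[c])
        if p[label] >= threshold:
            selected.append((doc, label))
    return selected


def word_class_distributions(pseudo_labeled, n_classes, smoothing=1.0):
    """Estimate P(class | word) from pseudo-labeled documents with additive smoothing."""
    counts = defaultdict(lambda: [0.0] * n_classes)
    for doc, label in pseudo_labeled:
        for word in doc.lower().split():
            counts[word][label] += 1.0
    return {
        word: [(c + smoothing) / (sum(row) + smoothing * n_classes) for c in row]
        for word, row in counts.items()
    }


# Hypothetical classifier outputs; columns are (negative, positive).
docs = ["great movie", "awful plot", "okay film"]
probs = [[0.05, 0.95], [0.92, 0.08], [0.55, 0.45]]
pseudo = select_confident(docs, probs)
print(pseudo)  # [('great movie', 1), ('awful plot', 0)]
dists = word_class_distributions(pseudo, n_classes=2)
```

The low-confidence "okay film" is excluded, so its words contribute no self-learned features; the resulting distributions would act as constraints when training the second classifier.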
Natural language understanding (NLU) aims to map sentences to their semantic meaning representations. Statistical approaches to NLU normally require fully-annotated training data in which each sentence is paired with its word-level semantic annotations. In this paper, we propose a novel learning framework that trains Hidden Markov Support Vector Machines without the use of expensive fully-annotated data. In particular, our learning approach takes as input a training set of sentences labeled with abstract semantic annotations encoding underlying embedded structural relations, and automatically induces derivation rules that map sentences to their semantic meaning representations. The proposed approach has been tested on the DARPA Communicator Data and achieved 93.18% in F-measure, outperforming the previously proposed approaches of training the hidden vector state model or conditional random fields from unaligned data, with relative error reduction rates of 43.3% and 10.6%, respectively.
@inproceedings{zhou2011novel,
title={A novel framework of training hidden markov support vector machines from lightly-annotated data},
author={Zhou, Deyu and He, Yulan},
booktitle={Proceedings of the 20th ACM international conference on Information and knowledge management},
pages={2025--2028},
year={2011},
organization={ACM}
}
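Several abstracts in this list, including the one above (43.3% and 10.6%), report relative error reduction rates. Assuming the usual definition, where the error is 100 minus the F-measure (in percent), the reduction is (E_baseline - E_new) / E_baseline. The helper below is a hypothetical illustration of that arithmetic with made-up scores, not figures from the papers.

```python
def relative_error_reduction(f_baseline, f_new):
    """Relative error reduction between two F-measures given in percent.

    Error is taken as 100 - F; the result is a fraction (0.5 means 50%).
    """
    err_base = 100.0 - f_baseline
    err_new = 100.0 - f_new
    if err_base == 0:
        raise ValueError("baseline already has zero error")
    return (err_base - err_new) / err_base


# Hypothetical scores, not taken from the papers above.
print(relative_error_reduction(90.0, 95.0))  # 0.5
```

This is why a modest absolute gain can appear as a large relative reduction when the baseline error is already small.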
In this paper, we propose a learning approach to train conditional random fields from unaligned data for natural language understanding, where the input to model learning consists of sentences paired with predicate formulae (or abstract semantic annotations) without word-level annotations. The learning approach resembles the expectation maximization algorithm. It has two advantages: first, only abstract annotations are needed instead of full word-level annotations; second, the proposed learning framework can be easily extended to train other discriminative models, such as support vector machines, from abstract annotations. The proposed approach has been tested on the DARPA Communicator Data. Experimental results show that it outperforms the hidden vector state (HVS) model, a modified hidden Markov model also trained on abstract annotations. Furthermore, the proposed method has been compared with two other approaches: the hybrid framework (HF) combining the HVS model and the support vector hidden Markov model, and discriminative training of the HVS model (DT). The proposed approach gives relative error reduction rates of 18.7% and 8.3% in F-measure when compared with HF and DT, respectively.
@inproceedings{zhou2011learning,
title={Learning conditional random fields from unaligned data for natural language understanding},
author={Zhou, Deyu and He, Yulan},
booktitle={European Conference on Information Retrieval},
pages={283--288},
year={2011},
organization={Springer}
}
We propose a biomedical event extraction system, HVS-BioEvent, which employs the hidden vector state (HVS) model for semantic parsing. Biomedical event extraction needs to deal with complex events consisting of embedded or hierarchical relations among proteins, events, and their textual triggers. In HVS-BioEvent, we further propose novel machine learning approaches for event trigger word identification and for biomedical event extraction from the HVS parse results. Our proposed system achieves an F-score of 49.57% on the corpus used in the BioNLP'09 shared task, which is only two points lower than the best performing system by UTurku. Nevertheless, HVS-BioEvent outperforms UTurku on the extraction of complex event types. The results suggest that the HVS model with the hierarchical hidden state structure is indeed more suitable for complex event extraction since it can naturally model embedded structural context in sentences.
@inproceedings{zhou2011semantic,
title={Semantic parsing for biomedical event extraction},
author={Zhou, Deyu and He, Yulan},
booktitle={Proceedings of the Ninth International Conference on Computational Semantics},
pages={395--399},
year={2011},
organization={Association for Computational Linguistics}
}
This paper presents a weakly-supervised method for Chinese sentiment analysis that incorporates lexical prior knowledge obtained from English sentiment lexicons through machine translation. A mechanism is introduced to incorporate prior information about polarity-bearing words, obtained from existing sentiment lexicons, into latent Dirichlet allocation (LDA), where sentiment labels are considered as topics. Experiments on Chinese product reviews of mobile phones, digital cameras, MP3 players and monitors demonstrate the feasibility and effectiveness of the proposed approach, and show that the weakly supervised LDA model performs as well as supervised classifiers such as Naive Bayes and Support Vector Machines, achieving an average accuracy of 83% over a total of 5,484 review documents. Moreover, the LDA model is able to extract highly domain-salient polarity words from text.
@inproceedings{he2010exploring,
title={Exploring english lexicon knowledge for chinese sentiment analysis},
author={He, Yulan and Alani, Harith and Zhou, Deyu},
booktitle={CIPS-SIGHAN joint conference on Chinese language processing},
year={2010}
}
In this paper, we discuss how discriminative training can be applied to the hidden vector state (HVS) model in different task domains. The HVS model is a discrete hidden Markov model (HMM) in which each HMM state represents the state of a push-down automaton with a finite stack size. In previous applications, maximum-likelihood estimation (MLE) was used to derive the parameters of the HVS model. However, MLE makes a number of assumptions, and unfortunately some of these assumptions do not hold. Discriminative training, without making such assumptions, can improve the performance of the HVS model by discriminating the correct hypothesis from the competing hypotheses. Experiments have been conducted in two domains: the travel domain, for the semantic parsing task using the DARPA Communicator data and the Air Travel Information Services (ATIS) data, and the bioinformatics domain, for the information extraction task using the GENIA corpus. The results demonstrate modest improvements in the performance of the HVS model using discriminative training. In the travel domain, discriminative training of the HVS model gives a relative error reduction rate of 31 percent in F-measure when compared with MLE on the DARPA Communicator data and 9 percent on the ATIS data. In the bioinformatics domain, a relative error reduction rate of 4 percent in F-measure is achieved on the GENIA corpus.
@article{zhou2008discriminative,
title={Discriminative training of the hidden vector state model for semantic parsing},
author={Zhou, Deyu and He, Yulan},
journal={IEEE Transactions on Knowledge and Data Engineering},
volume={21},
number={1},
pages={66--77},
year={2008},
publisher={IEEE}
}
During the last decade, biomedicine has witnessed tremendous development. Large amounts of experimental and computational biomedical data have been generated along with new discoveries, accompanied by an exponential increase in the number of biomedical publications describing these discoveries. Meanwhile, there has been great interest within scientific communities in text mining tools that find knowledge, such as protein-protein interactions, that is most relevant and useful for specific analysis tasks. This paper provides an outline of the various information extraction methods in the biomedical domain, especially for the discovery of protein-protein interactions. It surveys the methodologies involved in plain-text analysis and processing, categorizes current work in biomedical information extraction, and provides examples of these methods. Challenges in the field are also presented and possible solutions are discussed.
@article{zhou2008extracting,
title={Extracting interactions between proteins from the literature},
author={Zhou, Deyu and He, Yulan},
journal={Journal of biomedical informatics},
volume={41},
number={2},
pages={393--407},
year={2008},
publisher={Elsevier}
}
We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. HM-SVMs combine the advantages of hidden Markov models and support vector machines. By employing a modified K-means clustering method, a small set of the most representative sentences can be automatically selected from an un-annotated corpus. These sentences, together with their abstract annotations, are used to train an HVS model, which can subsequently be applied to the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully-annotated corpus, which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator Data. Experimental results show that the hybrid framework improves over the baseline HVS parser. When compared with HM-SVMs trained from the fully-annotated corpus, the hybrid framework gives comparable performance with only a small set of lightly annotated sentences.
@inproceedings{zhou2008hybrid,
title={A hybrid generative/discriminative framework to train a semantic parser from an un-annotated corpus},
author={Zhou, Deyu and He, Yulan},
booktitle={Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1},
pages={1113--1120},
year={2008},
organization={Association for Computational Linguistics}
}
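The selection of representative sentences via K-means can be sketched as clustering followed by picking the sentence nearest each centroid. The paper's specific modification of K-means is not described in the abstract; this minimal sketch assumes sentences are already encoded as numeric feature vectors, uses plain Lloyd's K-means, and initialises with the first k points for determinism. All function names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means, initialised with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def representative_indices(points, k, iters=20):
    """Indices of the sentences closest to each centroid (duplicates collapsed)."""
    centroids = kmeans(points, k, iters)
    return sorted({
        min(range(len(points)), key=lambda i: squared_distance(points[i], c))
        for c in centroids
    })


# Hypothetical 2-D sentence vectors forming two clear groups.
sentence_vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(representative_indices(sentence_vectors, k=2))  # [0, 3]
```

Only the selected sentences would then need abstract annotations before training the HVS model.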
This paper proposes a novel framework for incorporating protein-protein interaction (PPI) ontology knowledge into PPI extraction from biomedical literature in order to address the emerging challenges of deep natural language understanding. It is built upon existing work on relation extraction using the Hidden Vector State (HVS) model. The HVS model belongs to the category of statistical learning methods. It can be trained directly from un-annotated data in a constrained way while at the same time capturing the underlying named entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model into which non-local features derived from the PPI ontology through inference can be easily incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm for information extraction.
@inproceedings{he2008ontology,
title={Ontology-based protein-protein interactions extraction from literature using the hidden vector state model},
author={He, Yulan and Nakata, Keiichi and Zhou, Deyu},
booktitle={2008 IEEE International Conference on Data Mining Workshops},
pages={736--743},
year={2008},
organization={IEEE}
}
Knowledge about gene clusters and protein interactions is important for biological researchers to unveil the mechanisms of life. However, a large quantity of this knowledge is often hidden in the literature, such as journal articles, reports and books. Many approaches to extracting information from unstructured text, such as pattern matching and shallow and deep parsing, have been proposed, especially for extracting protein-protein interactions (Zhou and He, 2008). A semantic parser based on the Hidden Vector State (HVS) model for extracting protein-protein interactions is presented in (Zhou et al., 2008). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. Maximum likelihood estimation (MLE) is used to derive the parameters of the HVS model. In this paper, we propose a discriminative approach based on a parse error measure to train the HVS model. To adjust the HVS model to achieve the minimum parse error rate, the generalized probabilistic descent (GPD) algorithm (Kuo et al., 2002) is used. Experiments have been conducted on the GENIA corpus. The results demonstrate modest improvements: the discriminatively trained HVS model outperforms its MLE trained counterpart by 2.5% in F-measure on the GENIA corpus.
@inproceedings{zhou2008ppi,
title={Extracting protein-protein interaction based on discriminative training of the hidden vector state model},
author={Zhou, Deyu and He, Yulan},
booktitle={Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing},
pages={98--99},
year={2008}
}