Publications

Here you will find all articles published by BIPOLAR in peer-reviewed journals and conference proceedings.

Kamińska O., Klonecki T., Kaczmarek-Majer K. ‘Feature selection in bipolar disorder episode classification using cost-constrained methods’, XAI workshops, AIME 2023, Portoroz, Slovenia.


ABSTRACT

An important step in classifying bipolar disorder episodes is feature selection, which indicates the most relevant factors in patients’ behavior. The features in this task are associated with costs. Besides basic (low-cost) information about patients’ phone calls and text messages, we study the impact of acoustic (high-cost) features on classifying patients’ states. Unlike in previous papers, we now take the costs into account and therefore apply cost-constrained methods. The purpose of this paper is to examine whether the cost-constrained feature selection procedure can improve the performance of the classification model while reducing the cost of making predictions. Moreover, we try to determine whether a reduced number of expensive features maintains relatively high performance. We use a filter feature selection method based on information theory. In the cost-constrained modification, we add a cost factor parameter that controls the trade-off between a feature’s importance and its cost. The experiments were performed on a large medical database collected from patients with bipolar disorder during their daily mobile calls. The results indicate that the cost-constrained method achieves better results than traditional feature selection when the budget is limited.
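As a rough illustration of the trade-off described in this abstract, the sketch below greedily picks features by a penalized score (relevance minus cost factor times cost) while staying within a budget. The function name `select_features`, the precomputed relevance scores, and the exact form of the penalty are assumptions made for illustration; they are not taken from the paper, which uses an information-theoretic filter criterion.

```python
def select_features(relevance, cost, budget, cost_factor):
    """Greedy cost-constrained selection (illustrative sketch, not the paper's code).

    relevance:   dict feature -> relevance score (e.g. mutual information with the label)
    cost:        dict feature -> acquisition cost
    budget:      maximum total cost allowed
    cost_factor: trade-off parameter; 0 ignores costs when ranking features
    """
    selected, spent = [], 0.0
    remaining = sorted(relevance)  # sorted for deterministic tie-breaking
    while remaining:
        # Only features that still fit in the remaining budget are candidates.
        affordable = [f for f in remaining if spent + cost[f] <= budget]
        if not affordable:
            break
        # Penalized score: relevance minus cost_factor times cost.
        best = max(affordable, key=lambda f: relevance[f] - cost_factor * cost[f])
        selected.append(best)
        spent += cost[best]
        remaining.remove(best)
    return selected, spent


# Hypothetical toy values: two cheap call/text features, two costly acoustic ones.
relevance = {"calls": 0.30, "sms": 0.20, "pitch": 0.50, "energy": 0.45}
cost = {"calls": 1, "sms": 1, "pitch": 5, "energy": 5}

print(select_features(relevance, cost, budget=5, cost_factor=0.0))
print(select_features(relevance, cost, budget=5, cost_factor=0.1))
```

With these toy values and a budget of 5, a cost factor of 0 spends the whole budget on the single acoustic feature `pitch`, whereas a cost factor of 0.1 selects the two cheap features `calls` and `sms` at a total cost of 2.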


K. Kaczmarek-Majer, G. Casalino, G. Castellano, D. Leite and O. Hryniewicz, "Fuzzy Linguistic Summaries for Explaining Online Semi-Supervised Learning," 2022 IEEE 11th International Conference on Intelligent Systems (IS), Warsaw, Poland, 2022, pp. 1-8, doi: 10.1109/IS57118.2022.10019636.


ABSTRACT

Intelligent systems for the medical domain often require processing data streams that evolve over time and are only partially labeled. At the same time, the need for explanations is of utmost importance not only due to various regulations, but also to increase trust among systems’ users. In this work, an online data-driven learning method with a focus on the explainability of evolving models equipped with incremental semi-supervised learning algorithms is considered. The proposed method combines: (i) the Dynamic Incremental Semi-Supervised Fuzzy C-Means (DISSFCM) algorithm to incrementally classify subsets of data; with (ii) Linguistic Summarization, which provides explanations of the classification results in terms of short sentences in a natural language. The approach has been illustrated for streaming data collected from voice calls of patients affected by Bipolar Disorder. The results show the effectiveness of the proposed method in classifying instances belonging to healthy and affective states, and explaining the approximate reasoning behind the classification of new acoustic data related to patients.


Katarzyna Kaczmarek-Majer, Gabriella Casalino, Giovanna Castellano, Monika Dominiak, Olgierd Hryniewicz, Olga Kamińska, Gennaro Vessio, Natalia Díaz-Rodríguez (2022) PLENARY: Explaining black-box models in natural language through fuzzy linguistic summaries, Information Sciences, Volume 614, 2022, Pages 374-399, ISSN 0020-0255, https://doi.org/10.1016/j.ins.2022.10.010.


ABSTRACT

We introduce an approach called PLENARY (exPlaining bLack-box modEls in Natural lAnguage thRough fuzzY linguistic summaries), which is an explainable classifier based on a data-driven predictive model. Neural learning is exploited to derive a predictive model based on two levels of labels associated with the data. Then, model explanations are derived through the popular SHapley Additive exPlanations (SHAP) tool and conveyed in a linguistic form via fuzzy linguistic summaries. The linguistic summarization allows translating the explanations of the model outputs provided by SHAP into statements expressed in natural language. PLENARY accounts for the imprecision related to model outputs by summarizing them into simple linguistic statements and for the imprecision related to the data labeling process by including additional domain knowledge in the form of middle-layer labels. PLENARY is validated on preprocessed speech signals collected from smartphones from patients with bipolar disorder and on publicly available mental health survey data. The experiments confirm that fuzzy linguistic summarization is an effective technique to support meta-analyses of the outputs of AI models. Also, PLENARY improves explainability by aggregating low-level attributes into high-level information granules, and by incorporating vague domain knowledge into a multi-task sequential and compositional multilayer perceptron. SHAP explanations translated into fuzzy linguistic summaries significantly improve understanding of the predictive modelling process and its outputs.


Kmita K., Kaczmarek-Majer K., Hryniewicz O. (2022) Learning control limits for monitoring of multiple processes with neural network, SMPS 2022


ABSTRACT

In this work, inspired by the interpretability and usefulness of statistical process control, we propose a novel procedure for simultaneous monitoring of multiple processes that is based on a neural network with learnable activation functions. The proposed procedure for learning control limits with a neural network (CONNF) is aimed at scenarios where labeled data are available and makes use of these labels. CONNF can be particularly useful in monitoring processes when the amount of run-in data is insufficient, or the cost of obtaining such data is high. We illustrate the performance of the CONNF method with a simulation study and preliminary results for real-life data collected from smartphones of patients with diagnosed bipolar disorder. These results show the potential of CONNF and indicate further research directions.


Kamińska O., Kaczmarek-Majer K., Hryniewicz O. (2022) Impact of clustering of unlabeled data on classification: case study in bipolar disorder, FedCSIS 2022


ABSTRACT

Currently, it is possible to collect a large amount of data from sensors. At the same time, data are often only partially labeled. For example, in the context of smartphone-based monitoring of mental state, far more data are collected from smartphones than from psychiatrists about the mental state. The approach presented in this paper is designed to examine whether unlabeled data can improve the accuracy of classification tasks in the considered case study of classifying a patient’s state. First, unlabeled data are represented by cluster memberships obtained with the Fuzzy C-means algorithm, which corresponds to the uncertainty of the patient’s condition in this disease. Secondly, the classification is performed using two well-known algorithms, Random Forest and SVM. The obtained results indicate a minimal improvement in the quality of classification thanks to the use of membership in clusters. These results are promising due to both the accuracy and the interpretability.
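To make the notion of cluster membership in this abstract concrete, the following minimal sketch computes the membership degrees of a single observation with the standard Fuzzy C-means formula. It assumes that the cluster centers have already been fitted; the function name `fuzzy_memberships` and the idea of appending the resulting degrees as extra classifier inputs are illustrative assumptions, not the authors' implementation.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy C-means membership degrees of one point to fixed cluster centers.

    Uses the standard FCM formula u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)),
    where d_i is the Euclidean distance to center i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center
    # (shared equally if several centers coincide with the point).
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        n = sum(zeros)
        return [1.0 / n if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]


# Hypothetical 1-D example: the point is three times closer to the first center.
u = fuzzy_memberships((1.0,), [(0.0,), (4.0,)])
print(u)  # membership degrees that sum to 1
```

The degrees always sum to 1, so an observation near a cluster boundary receives graded rather than crisp membership, and these values could be passed to a classifier such as Random Forest or SVM alongside the original features.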


Kmita, K., Casalino, G., Castellano, G., Hryniewicz, O., Kaczmarek-Majer, K. (2022) Confidence path regularization for handling label uncertainty in semi-supervised learning: use case in bipolar disorder monitoring, IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)


ABSTRACT

Semi-supervised learning has gained great interest because of its ability to combine unlabeled data with (potentially few) labeled observations in a training process. However, in some application contexts, one can question whether all available labels are equally valid. For example, in the context of bipolar disorder (BD) remote monitoring, a common practice is to extrapolate the psychiatrist’s assessment onto some fixed time window surrounding the visit, the so-called ground truth period. In consequence, all data from this period are labeled with the same category. Such an approach may potentially result in misguided supervision affecting the model’s performance. In this paper, we consider the problem of label uncertainty, assuming that the labels are crisp, but they may be assigned to particular observations with varying confidence. We propose a novel method called Confidence Path Regularization (CPR) that incorporates this uncertainty into fuzzy c-means semi-supervised learning. CPR handles label uncertainty automatically and in a data-driven way by estimating a confidence factor for each labeled observation. In addition, CPR allows for the exploration of potential class-specific patterns in the adjusted confidence. The proposed method is illustrated with experiments on partially labeled data about speech characteristics collected from a smartphone application for BD monitoring. In this particular applied scenario, we also use additional contextual data to improve the construction of confidence paths. It is shown that the proposed CPR approach makes it possible to reflect the varying confidence in labels, in contrast to the nominal approach, which assigns the majority of observations to the same class associated with the relevant ground truth period.


Kaczmarek-Majer, K., Kiersztyn, A. (2022) Experimental evaluation of the accuracy of an ensemble of fuzzy methods for classification of episodes in bipolar disorder, IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)


ABSTRACT

Clinical practice confirms that speech can support the diagnosis of several mental disorders. For example, reduced speech activity, changes in specific voice features, and pause-related measures were found to be sensitive markers of depressive symptoms. Considering the possibility of continuous speech data collection via a smartphone app, voice analysis has great potential for monitoring mental states. Nevertheless, there is still a need to select the most effective validation approaches for solving the task of predicting the mental state. Those validation approaches should take into account that the data collected from sensors and the response variables considered in this BD application problem are subject to various sources of uncertainty. The aim of the study is to perform an experimental evaluation of the accuracy of top-performing crisp and fuzzy methods, such as Naive Bayes Network, SOTA algorithm, Fuzzy Rule, Probabilistic Neural Network, Decision Tree, Gradient Boosted Tree, Random Forest, Tree Ensemble, and an ensemble approach that combines them. Various training and testing scenarios are considered for each of these methods, consisting of a given percentage of all observations. Additionally, the results from multiple methods are aggregated using the dominant function, that is, the most frequent rating is taken; a metric based on fuzzy numbers is also considered for comparative purposes. The preliminary results of numerical experiments are promising. The sensitive point is the vicinity of the threshold of transition to a disease state. It should be noted that due to minor differences inherent in such cases, it seems intuitive to use fuzzy numbers to determine the patient’s assessment. The experiments also confirmed that the ranking of methods depends on the choice of the training set and evaluation metric.
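The aggregation step mentioned in this abstract can be sketched as a simple vote, reading the dominant function as the most frequent rating among the individual methods. The function name `dominant_rating` and the tie-breaking rule (the smallest rating wins) are assumptions made for illustration; the abstract does not specify how ties are resolved.

```python
from collections import Counter


def dominant_rating(ratings):
    """Return the most frequent rating among the methods' predictions.

    Ties are broken in favor of the smallest rating (an assumption,
    not something stated in the abstract).
    """
    counts = Counter(ratings)
    top = max(counts.values())
    return min(r for r, c in counts.items() if c == top)


# Hypothetical ratings produced by five classifiers for one observation.
print(dominant_rating([1, 0, 1, 2, 1]))
```

A vote of this kind discards how close the individual ratings were, which is one reason a comparison against a metric based on fuzzy numbers can be informative near the threshold of transition to a disease state.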