Graduation Date


Degree Name

Doctor of Philosophy (PhD)

School Name

The University of Texas School of Biomedical Informatics at Houston

Advisory Committee

Trevor Cohen, MBChB, PhD


Objectives – Previous research has characterized urban healthcare providers' information needs, using various qualitative methods. However, little is known about the needs of rural primary care practitioners in Brazil. Communication exchanged during tele-consultations presents a unique data source for the study of these information needs. In this study, I characterize rural healthcare providers' information needs expressed electronically, using automated methods.

Methods – I applied automated methods to categorize messages obtained from the telehealth system from two regions in Brazil. A subset of these messages, annotated with top-level categories in the DeCS terminology (the regional equivalent of MeSH), was used to train text categorization models, which were then applied to a larger, unannotated data set. On account of their more granular nature, I focused on answers provided to the queries sent by rural healthcare providers. I studied these answers, as surrogates for the information needs they met. Message representations were generated using methods of distributional semantics, permitting the application of k-Nearest Neighbor classification for category assignment. The resulting category assignments were analyzed to determine differences across regions, and healthcare providers.

Results – Analysis of the assigned categories revealed differences in information needs across regions, corresponding to known differences in the distributions of diseases and tele-consultant expertise across these regions. Furthermore, information needs of rural nurses were observed to be different from those documented in qualitative studies of their urban counterparts, and the distribution of expressed information needs categories differed across types of providers (e.g. nurses vs. physicians).

Discussion – The automated analysis of large amounts of digitally-captured tele-consultation data suggests that rural healthcare providers' information needs in Brazil are different than those of their urban counterparts in developed countries. The observed disparities in information needs correspond to known differences in the distribution of illness and expertise in these regions, supporting the applicability of my methods in this context. In addition, these methods have the potential to mediate near real-time monitoring of information needs, without imposing a direct burden upon healthcare providers. Potential applications include automated delivery of needed information at the point of care, needs-based deployment of tele-consultation resources and syndromic surveillance.

Conclusion – I used automated text categorization methods to assess the information needs expressed at the point of care in rural Brazil. My findings reveal differences in information needs across regions, and across practitioner types, demonstrating the utility of these methods and data as a means to characterize information needs.


Telehealth, information needs, natural language processing, machine learning, healthcare provider