La Recherche en France - CF202545399 COFUND PhD position

PhD

Computer Science, Mathematics, Nouvelle-Aquitaine

Description

Title of the thesis project: Towards Fair and Explainable Lightweight Multimodal Learning Models for Effective Document Understanding

Scientific description of the research project

The proposed research project aims to develop lightweight, generalizable, and multimodal learning models for document analysis. The integration of deep learning (DL) has greatly advanced the field, allowing for the analysis of complex documents by incorporating vision, text, and layout information.

Multimodal learning has become an essential strategy for understanding various document types, such as legal, medical, and administrative documents and historical archives. Despite these advances, current models remain limited in size, computational efficiency, generalizability, and adaptability to different domains. Addressing social biases, ensuring fairness, and providing explainability in these models also remain significant challenges.

The main objective of this research is to create multimodal, multitask learning models that are lightweight and can effectively process multimodal data while preserving fairness and transparency. The focus will be on developing innovative compression and quantization techniques to reduce model size, ensuring that the models can be deployed in environments with limited resources. The project will explore knowledge distillation methods to transfer knowledge from large, complex teacher models to smaller, efficient student models. This research introduces an innovative paradigm by enabling resource-efficient AI systems to handle complex medical, administrative, and legal data, where multimodal processing, fairness, and interpretability are critical for accuracy, ethical compliance, and real-world applicability.
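As an illustration of the teacher-student transfer described above, the following is a minimal sketch of a commonly used distillation objective: a weighted sum of the usual cross-entropy on the true label and the KL divergence between temperature-softened teacher and student outputs. The function names, the temperature of 2.0, and the weight `alpha` are illustrative assumptions, not choices specified by the project.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student).

    `temperature` and `alpha` are hypothetical defaults chosen for illustration.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Standard cross-entropy of the student against the hard label.
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-target term on a comparable gradient scale.
    return alpha * ce + (1 - alpha) * (temperature ** 2) * kl

# Toy logits for a 3-class problem (made-up values).
teacher = [1.0, 2.0, 3.0]
aligned_student = [1.0, 2.0, 3.0]
misaligned_student = [3.0, 2.0, 1.0]
print(distillation_loss(aligned_student, teacher, label=2))
print(distillation_loss(misaligned_student, teacher, label=2))
```

A student whose outputs match the teacher incurs no KL penalty, so its loss reduces to the weighted cross-entropy term; a student that disagrees with the teacher is penalized by both terms.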

Scientific challenges in this project include enhancing the generalizability of models so that they perform well across various document types without overfitting or requiring excessive retraining. For instance, medical documents often include dense, domain-specific terminology, structured tables, and diagnostic imagery, while legal documents are characterized by long, text-heavy clauses with complex semantic structures. Adapting a single model to excel across these diverse formats requires approaches that optimize for one domain without degrading performance in another. The project will tackle modality biases and improve cross-modal interactions between vision, text, and layout information, which are critical for accurate document analysis. Another key challenge is adapting models to new document domains, such as legal or medical documents, with minimal fine-tuning. Compression and quantization will be explored to develop lightweight models suitable for fast and adaptive inference, ensuring computational efficiency without sacrificing accuracy. This is particularly critical in real-time medical triage systems or portable legal aid tools, where rapid responses are essential and computational resources may be constrained. Finally, ensuring fairness by addressing social biases and enhancing explainability will be pivotal, allowing users to trust the model's decisions and insights. For example, when analyzing loan applications, it is essential that the model does not unfairly disadvantage applicants based on gender or ethnicity. Similarly, providing clear rationales for insights extracted from medical records or legal agreements can foster trust and compliance in highly sensitive and regulated environments.
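To make the loan-application example concrete, one simple way to check whether a model treats groups differently is the demographic parity gap: the difference in positive-prediction rates between groups. The sketch below is only one possible fairness measure, and the function name, group labels, and toy predictions are hypothetical, introduced purely for illustration.

```python
def demographic_parity_gap(predictions, groups):
    """Return (gap, rates): the largest difference in positive-prediction
    rate between any two groups, and the per-group rates themselves."""
    counts = {}
    for pred, group in zip(predictions, groups):
        total, positives = counts.get(group, (0, 0))
        counts[group] = (total + 1, positives + int(pred == 1))
    rates = {g: pos / tot for g, (tot, pos) in counts.items()}
    return max(rates.values()) - min(rates.values()), rates

# Made-up model decisions (1 = loan approved) for two hypothetical groups.
predictions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_gap(predictions, groups)
print(rates)  # per-group approval rates
print(gap)    # a gap of 0 would mean equal approval rates
```

A gap near zero indicates similar approval rates across groups. Demographic parity ignores whether the underlying decisions are correct, so in practice it would likely be complemented by other criteria.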

The state-of-the-art methods chosen for this research include a comprehensive combination of model compression and quantization strategies such as pruning, weight-sharing, and mixed-precision representations. These techniques aim to maintain accuracy while significantly reducing model size. Knowledge distillation will be utilized to train smaller student models that replicate the capabilities of larger teacher models. For the multimodal learning component, the project will design unified models that process visual, textual, and layout data cohesively, using shared parameter spaces and cross-modal attention mechanisms to facilitate seamless integration of information. Techniques like adversarial training and transfer learning will be employed for domain adaptation, ensuring the model's ability to adapt to new document types. Meta-learning approaches will be incorporated to enhance few-shot and zero-shot learning, boosting generalizability with minimal data. To ensure fairness and interpretability, the project will integrate metrics and loss functions that detect and mitigate social biases during training. Explainable AI tools, such as attention visualization and layer-wise relevance propagation (LRP), will be used to make the model’s decision-making process transparent. Counterfactual fairness algorithms will also be explored to guarantee that the model provides unbiased results across different demographics.
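Among the compression strategies listed above, quantization is perhaps the simplest to illustrate. The following is a minimal sketch of symmetric per-tensor 8-bit quantization, in which floating-point weights are mapped to integers in [-127, 127] with a single scale factor. It is a generic textbook scheme, not the method the project will develop, and the function names and sample weights are assumptions made for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of a list of floats.

    Returns (q, scale) where each q[i] is an integer in [-127, 127]
    and q[i] * scale approximates weights[i].
    """
    max_abs = max(abs(w) for w in weights)
    # Fall back to a scale of 1.0 for an all-zero tensor to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]

# Made-up weights standing in for one layer of a model.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is then stored in one byte instead of four, at the cost of a rounding error of at most half the scale per weight. Mixed-precision schemes, as mentioned above, would apply different bit widths to different parts of the model.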

The expected outcomes of this project include scalable and efficient models that achieve state-of-the-art results with significantly reduced size, enabling deployment in real-world, resource-constrained environments. The research aims to produce a multimodal learning model capable of generalizing across diverse document types with minimal retraining while ensuring fairness and transparency. The project will also contribute to the development of multitask learning frameworks that can handle multiple related tasks, such as machine translation, content summarization, and document visual question answering (DocVQA), within a single unified system. By leveraging knowledge distillation, smaller models will effectively inherit the capabilities of their larger counterparts, providing practical solutions without sacrificing performance.

This research has the potential to transform the field of document analysis by creating models that are lightweight, fair, and adaptable to various domains. Such advancements will benefit industries dealing with vast quantities of documents, including legal, healthcare, and administrative sectors, by offering AI solutions that are both cost-effective and trustworthy. Furthermore, by emphasizing fairness and transparency, the project will set new benchmarks for ethical AI practices in document understanding, promoting broader adoption and trust in AI technologies.

Funded position