Overview
Institutions need to secure philanthropic support while managing reputational risk, which calls for comprehensive donor due diligence. A due diligence process normally involves manually gathering, analyzing, and aggregating complex data from sources scattered across public, private, and paid platforms, which takes considerable staff effort. To automate this process, people sometimes use Gen-AI tools, such as ChatGPT and DeepSeek, to auto-generate PDF dossiers in batch. Such generative models take in names and some keywords and populate profiles for the given entities. However, this approach faces two main challenges:
- LLM Hallucination: Models provide fabricated or conflicting information about a person or organization.
- Ambiguity between Named Entities: Models sometimes cannot differentiate entities with the same name, even when keywords are provided.
Key Features
Chain of Trust
Ensure that profile data is tightly linked to the webpage it is extracted from. Make each fact traceable and verifiable.
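As an illustration of what "traceable and verifiable" could mean in practice, here is a minimal sketch of a fact record that carries its own evidence. The class name `SourcedFact`, the helpers `make_fact` and `verify_fact`, and all field names are hypothetical and not taken from this project; the example person and URL are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib


@dataclass(frozen=True)
class SourcedFact:
    """One profile field plus the evidence needed to re-check it (hypothetical schema)."""
    field: str         # e.g. "role"
    value: str         # the extracted value
    source_url: str    # page the value was extracted from
    snippet: str       # verbatim text on that page supporting the value
    page_sha256: str   # hash of the page text at extraction time
    retrieved_at: str  # ISO 8601 timestamp (UTC)


def _sha256(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_fact(field, value, source_url, page_text, snippet):
    """Create a fact only if its supporting snippet really occurs in the page."""
    if snippet not in page_text:
        raise ValueError("snippet not found in page text; fact is not traceable")
    return SourcedFact(
        field=field,
        value=value,
        source_url=source_url,
        snippet=snippet,
        page_sha256=_sha256(page_text),
        retrieved_at=datetime.now(timezone.utc).isoformat(),
    )


def verify_fact(fact, current_page_text):
    """Re-check a stored fact against a freshly fetched copy of its source page."""
    if _sha256(current_page_text) == fact.page_sha256:
        return "unchanged"
    if fact.snippet in current_page_text:
        return "snippet_still_present"
    return "stale"


# Placeholder data for illustration only.
page = "Jane Doe is the chair of the Example Foundation."
fact = make_fact("role", "chair", "https://example.org/about", page, "Jane Doe is the chair")
print(fact.source_url, verify_fact(fact, page))
```

Refusing to create a fact without a matching snippet is one way to keep hallucinated values out of a profile: anything the model asserts without supporting page text is simply never stored.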
Clustering of Profiles
Use agglomerative clustering, combined with human review, to check whether the collected profiles mix several entities that share a name.
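The idea can be sketched in pure Python as follows. This is a toy single-linkage variant that assumes each scraped profile has been reduced to a set of keywords and that Jaccard distance is a reasonable dissimilarity. The feature representation, the linkage, the `threshold` value, and the function names are all assumptions for illustration, not the project's actual configuration.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two keyword sets (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative_clusters(profiles, threshold=0.7):
    """Single-linkage agglomerative clustering over keyword sets.

    Starts with one cluster per profile and repeatedly merges the two closest
    clusters until the smallest inter-cluster distance exceeds `threshold`.
    Returns a list of clusters, each a sorted list of profile indices.
    """
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(jaccard_distance(profiles[i], profiles[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


def needs_human_review(clusters):
    """More than one cluster suggests the name may cover several entities."""
    return len(clusters) > 1


# Placeholder keyword sets scraped for one shared name (illustrative only).
profiles = [
    {"boston", "biotech", "ceo"},
    {"boston", "biotech", "founder"},
    {"sydney", "rugby", "coach"},
]
clusters = agglomerative_clusters(profiles)
print(clusters, needs_human_review(clusters))
```

When more than one cluster survives, the batch is flagged for a person to decide which cluster, if any, is the intended donor, rather than letting the pipeline merge them silently.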
Looking Forward
- Named Entity Recognition
Use NER to improve web scraping quality: discard paragraphs where the named entity is not mentioned.
- Confidence / Reliability Score
Maintain a list of highly trustworthy sites (.gov, .edu, Forbes) and use source reliability to label each of our data points.
- Fuzzy Match Similar Inputs across Sources
Fuzzy-match values across sources while retaining their verifiability.
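The reliability-label and fuzzy-matching ideas could be prototyped with the standard library alone, as in the sketch below. The trusted-domain list, the `"high"` / `"unrated"` labels, the `cutoff` of 0.85, and all function names are assumptions made for illustration; the company names and URLs are placeholders.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Assumed allowlist, based on the examples named above (.gov, .edu, Forbes).
TRUSTED_SUFFIXES = (".gov", ".edu")
TRUSTED_DOMAINS = {"forbes.com"}


def reliability_label(url):
    """Label a source URL as "high" if its host is on the allowlist, else "unrated"."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host.endswith(TRUSTED_SUFFIXES):
        return "high"
    if host in TRUSTED_DOMAINS or any(host.endswith("." + d) for d in TRUSTED_DOMAINS):
        return "high"
    return "unrated"


def normalize(value):
    """Lowercase, drop commas and periods, collapse whitespace."""
    return " ".join(value.lower().replace(",", " ").replace(".", " ").split())


def similar(a, b, cutoff=0.85):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= cutoff


def group_values(observations, cutoff=0.85):
    """Greedily group (value, source_url) pairs whose values fuzzy-match.

    Each group keeps every original (value, url) pair, so a merged value can
    still be traced back to each page it came from.
    """
    groups = []
    for value, url in observations:
        for group in groups:
            if similar(value, group[0][0], cutoff):
                group.append((value, url))
                break
        else:
            groups.append([(value, url)])
    return groups


# Placeholder observations for one profile field (illustrative only).
observations = [
    ("Acme Holdings, Inc.", "https://www.sec.gov/filing"),
    ("ACME Holdings Inc", "https://www.forbes.com/profile"),
    ("Beta Capital", "https://example.com/blog"),
]
groups = group_values(observations)
for group in groups:
    print([(value, reliability_label(url)) for value, url in group])
```

Grouping the pairs rather than overwriting them with a single canonical value is a deliberate choice here: the merged value keeps every source URL, so verifiability is not lost in the merge.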

