Information Extraction & Retrieval
Relation extraction, question answering, and search.
Document Similarity
Measuring how alike two documents are – from lexical overlap measures like Jaccard similarity and cosine similarity over term vectors to semantic approaches like Word Mover’s Distance and embedding-based comparison.
Event Extraction
Identifying events, their triggers, and their participant arguments in text – detecting not just that something happened, but who was involved, where, when, and how.
Information Extraction
Automatically extracting structured knowledge – entities, relations, and events – from unstructured text at scale, turning the flood of natural language into queryable data.
Information Retrieval
Finding relevant documents from large collections in response to a user’s information need – from classical term-matching models like BM25 to modern neural dense retrieval.
Keyword Extraction
Identifying the most important terms and phrases that characterize a document’s content – from statistical frequency methods to graph-based and embedding-based approaches.
Knowledge Graphs for NLP
Structured knowledge representations connecting entities and relations in graph form – enabling reasoning, retrieval, and grounding that complement the statistical patterns learned by language models.
Open Information Extraction
Extracting relation triples from text without predefined schemas – domain-independent knowledge harvesting that scales across the open web.
Topic Modeling
Discovering latent themes in document collections by learning probabilistic or algebraic decompositions that map documents to topic mixtures and topics to word distributions.
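
As a rough illustration of the lexical overlap measures named under Document Similarity, here is a minimal sketch of Jaccard similarity and cosine similarity over raw term counts. The function names and the whitespace tokenizer are assumptions made for this example; a real system would usually use a proper tokenizer and TF-IDF weighting.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for a real tokenizer.
    return text.lower().split()


def jaccard(a, b):
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # convention chosen here: two empty documents are identical
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    # Cosine of the angle between the two raw term-count vectors.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    d1 = "the cat sat"
    d2 = "the cat ran"
    print(jaccard(d1, d2))  # 2 shared terms out of 4 distinct terms
    print(cosine(d1, d2))
```

Both measures see only surface tokens, so "car" and "automobile" count as unrelated; that gap is what the semantic approaches listed above aim to close.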