Information Extraction & Retrieval

Relation extraction, question answering, and search.

Document Similarity

Measuring how alike two documents are – from lexical overlap measures like Jaccard and cosine similarity to semantic approaches like Word Mover’s Distance and embedding-based comparison.
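
As a rough illustration of the lexical end of that spectrum, here is a minimal sketch of Jaccard and cosine similarity over whitespace tokens. The `tokenize` helper and the lowercase, whitespace-split tokenization are assumptions made for this example, not a recommended preprocessing pipeline.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    return text.lower().split()


def jaccard(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # convention chosen here: two empty documents are identical
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine of the angle between raw term-count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(jaccard("the cat sat", "the cat ran"))  # 0.5 (2 shared terms out of 4 distinct)
print(cosine("the cat sat", "the cat ran"))   # roughly 0.667
```

Both measures see only surface tokens, so "car" and "automobile" score as unrelated; that gap is what the semantic approaches aim to close.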

Event Extraction

Identifying events, their triggers, and participant arguments from text – detecting not just that something happened, but who was involved, where, when, and how.

Information Extraction

Automatically extracting structured knowledge – entities, relations, and events – from unstructured text at scale, turning the flood of natural language into queryable data.

Information Retrieval

Finding relevant documents from large collections in response to a user’s information need – from classical term-matching models like BM25 to modern neural dense retrieval.
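
To make the term-matching side concrete, the following is a small sketch of BM25 scoring in plain Python. The function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, the whitespace tokenization, and the smoothed non-negative IDF variant are all choices made for this illustration; production systems differ in these details.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query string."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenized for t in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "neural dense retrieval",
    "classical term matching",
    "dense vectors for retrieval",
]
print(bm25_scores("term matching", docs))  # only the second document scores above zero
```

Documents that share no query term score exactly zero here, which is the vocabulary-mismatch limitation that dense retrieval is meant to address.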

Keyword Extraction

Identifying the most important terms and phrases that characterize a document’s content – from statistical frequency methods to graph-based and embedding-based approaches.
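
One simple statistical baseline is to rank a document's terms by TF-IDF against a reference collection. The sketch below assumes whitespace tokenization and an unsmoothed `log(N / df)` weighting; the helper name `top_keywords` is hypothetical, and ties are broken alphabetically purely to keep the output deterministic.

```python
import math
from collections import Counter


def top_keywords(doc_index, docs, k=3):
    """Rank the terms of docs[doc_index] by TF-IDF and return the top k."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenized for t in set(d))

    target = tokenized[doc_index]
    tf = Counter(target)
    scores = {
        term: (count / len(target)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Highest score first; alphabetical order breaks ties.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the cat cat slept",
]
print(top_keywords(0, docs, k=1))  # ['mouse']
print(top_keywords(2, docs, k=2))  # ['slept', 'cat']
```

Terms that appear in every document, such as "the", receive a weight of zero, so frequency alone does not make a term a keyword.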

Knowledge Graphs for NLP

Structured knowledge representations connecting entities and relations in graph form – enabling reasoning, retrieval, and grounding that complement the statistical patterns learned by language models.

Open Information Extraction

Extracting relation triples of the form (subject, relation, object) from text without predefined schemas – domain-independent knowledge harvesting designed to scale across the open web.

Topic Modeling

Discovering latent themes in document collections by learning probabilistic or algebraic decompositions that map documents to topic mixtures and topics to word distributions.