🌟 In a way, everything boils down to... a classification problem.
In our information and research automation business context, the classification of unstructured text lies at the heart of Robotic Online Intelligence (ROI)'s Kubro(TM) Information Engine.
Whether the goal is identifying specific data points to extract into databases, capturing sales leads based on trigger/intent events, or monitoring for significant market developments and weak signals from an investment perspective, technically we treat all of these as one problem, solved with the same process.
The specific applications do vary by domain, sector, geography and language, though.
Hence, in our approach, each new use case requires its own 'topic model': human expertise is needed at the beginning (with some help from an LLM) to define the key parameters, and only then does the automation kick in. Extensive optionality does mean some complexity, but also great flexibility for any application.
🗂️ To make it a bit more tangible - the short video below shows a few of the classification framework components in action, with a tagging system using keywords, regular expressions, or AI and LLMs. The latter still proves to be a nice add-on rather than the core method.
📘 Also for reference - here are our older posts on Short-Form Text Classification: https://lnkd.in/gS4z6ue and Lookahead and Lookbehind Zero-Length Assertions: https://lnkd.in/euEbRVr
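🧪 For readers who prefer code to video, here is a minimal sketch of the general idea of rule-based tagging with keywords and regular expressions, including lookahead and lookbehind assertions. It is not Kubro(TM) code: the `TagRule` class, the `tag_text` function, the tag names and the patterns are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TagRule:
    """Hypothetical rule: the tag fires if any keyword or any regex matches."""
    tag: str
    keywords: list = field(default_factory=list)   # matched as whole words/phrases
    patterns: list = field(default_factory=list)   # raw regular expressions

    def matches(self, text: str) -> bool:
        for kw in self.keywords:
            # Escape the keyword and require word boundaries on both sides.
            if re.search(r"\b" + re.escape(kw) + r"\b", text, flags=re.IGNORECASE):
                return True
        return any(re.search(p, text, flags=re.IGNORECASE) for p in self.patterns)


# Illustrative rules only; a real topic model would be defined by a domain expert.
RULES = [
    TagRule(tag="funding_event", keywords=["series a", "raises", "funding round"]),
    # Lookbehind: amount must follow a currency symbol.
    # Lookahead: amount must be followed by a magnitude unit.
    TagRule(
        tag="deal_size",
        patterns=[r"(?<=[$€£])\d+(?:\.\d+)?(?=\s?(?:million|billion|m|bn)\b)"],
    ),
    # Negative lookahead: skip "acquire talent", which is hiring rather than M&A.
    TagRule(tag="m_and_a", patterns=[r"\bacquir(?:e|es|ed|ing)\b(?!\s+talent)"]),
]


def tag_text(text: str, rules=RULES) -> list:
    """Return the tags of all rules that match, in rule order."""
    return [rule.tag for rule in rules if rule.matches(text)]


if __name__ == "__main__":
    print(tag_text("Acme raises $25 million in Series A"))
    print(tag_text("Beta Corp acquires Gamma for £1.2bn"))
    print(tag_text("Firm plans to acquire talent from rivals"))
```

The zero-length assertions let a rule check the context around a match (a currency symbol before, a unit after) without making that context part of the match itself. An LLM-based tagger could be plugged in as one more rule type alongside these, in line with the add-on role described above.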