Multimodality in LLMs might open up new opportunities for data extraction.
At Robotic Online Intelligence (ROI), we've been testing multimodal LLMs for extracting data from images of tables, charts, and other non-text formats, as opposed to the 'traditional' parsing approaches.
In this short video clip, we show an example of how we integrate it into the Kubro(TM) system.
It seems highly promising: it can perform well even when the table structure is complicated, e.g. with merged cells, wrapped lines, missing border lines, or rotated tables.
However: a) it still requires a lot of handling and tweaks for the different types of images and for how the LLM's response is processed, so it would need to be deployed as micro-modules; b) for larger document sets, the costs can add up.
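To illustrate one of the "response processing" steps mentioned above: a common pattern (not necessarily ROI's exact pipeline) is to prompt the model to return the extracted table as markdown, then parse that text into structured rows. A minimal sketch of such a parser, assuming the model replies with a pipe-delimited markdown table:

```python
def parse_markdown_table(text: str) -> list[list[str]]:
    """Parse a markdown-style table returned by an LLM into rows of cells.

    Assumes each table row is pipe-delimited (| a | b |) and skips the
    header separator row (| --- | --- |) and any surrounding prose.
    """
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore prose the model may add around the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header/body separator row, e.g. | --- | :--: |
        if all(set(c) <= set("-: ") for c in cells):
            continue
        rows.append(cells)
    return rows


reply = """Here is the extracted table:
| Region | Revenue |
| ------ | ------- |
| North  | 120     |
| South  | 95      |
"""
print(parse_markdown_table(reply))
```

In practice each image type tends to need its own variant of this step (different prompts, different cleanup rules), which is one reason the micro-module deployment mentioned above makes sense.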