Hacker News | prsdm's comments

Thanks for sharing these resources! We’ll definitely take a look.


It's much more stable now.


Does it still put you in dependency hell though, where you can't add new packages without causing tons of version conflicts?


Howdy! Erick from LangChain here. If anyone is seeing version conflicts on particular packages, please let me know!

These usually stem from overly strict constraints in the underlying SDKs for the integrations, and in general we've been pretty successful asking for those constraints to be loosened. The main "problem" constraint we've seen in the past has been on httpx. Curious if you've seen others!
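As an illustration of how such a constraint can be spotted, here is a minimal sketch that lists which installed packages declare a version requirement on httpx. It is not LangChain tooling; the helper names `split_requirement` and `constraints_on` are hypothetical, and the requirement parsing is a simplified approximation using only the standard library.

```python
import re
from importlib import metadata


def split_requirement(req: str) -> tuple[str, str]:
    """Split a requirement string into (normalized name, version specifier).

    Hypothetical helper: a simplified parser, not a full PEP 508 implementation.
    """
    req = req.split(";", 1)[0].strip()  # drop environment markers
    match = re.match(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*(.*)$", req)
    if match is None:
        return req, ""
    name, _extras, spec = match.groups()
    name = re.sub(r"[-_.]+", "-", name).lower()  # normalize the project name
    return name, spec.strip("() ").replace(" ", "")


def constraints_on(package: str) -> dict[str, str]:
    """Map each installed distribution to the specifier it places on `package`."""
    found = {}
    for dist in metadata.distributions():
        for req in dist.requires or []:
            name, spec = split_requirement(req)
            if name == package and spec:
                found[dist.metadata["Name"] or "unknown"] = spec
    return found


if __name__ == "__main__":
    for dist_name, spec in sorted(constraints_on("httpx").items()):
        print(f"{dist_name}: httpx{spec}")
```

Any entry with a tight upper bound or an exact pin is a likely source of a resolver conflict and a reasonable candidate to report upstream.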



Thank you. This is a mix of OCR and an LLM; I was wondering whether there might be a library that avoids using that.

A better approach would be to use Textract, as it maintains the flow, for example when a table spans multiple pages.

Btw, Tesseract is not that good at getting accurate data from tables. Use it with caution, especially in a financial context.

I have made an open source tool that shows the data Tesseract and EasyOCR miss: https://github.com/orasik/parsevision/


Nice, I really liked it!

