The big data project helps businesses extract value from complex sets of web data. It captures, wrangles, and analyzes large volumes of data, then renders the results in an easy-to-grasp form.
This project centered on gathering, parsing, and standardizing data, including PDF and OCR (Optical Character Recognition) processing.
- Processed about 1 billion records
- Recognized text in 14 languages
- Implemented recognition of Indic fonts in PDFs
- Synced official voter data with India Post data
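The voter-data sync step could be sketched as a simple record-matching pass, here joining on a normalized name plus postal PIN code. This is a minimal illustration only: the field names (`name`, `pin`) and the matching rule are assumptions, not the project's actual logic.

```python
import unicodedata

def normalize(text: str) -> str:
    """Case-fold, strip accents, and collapse whitespace so names compare reliably."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.lower().split())

def sync_records(voter_rows, post_rows):
    """Match voter records to postal records on (normalized name, PIN code).

    Field names are hypothetical; unmatched voters are simply skipped here.
    """
    post_index = {(normalize(r["name"]), r["pin"]): r for r in post_rows}
    matches = []
    for voter in voter_rows:
        key = (normalize(voter["name"]), voter["pin"])
        if key in post_index:
            matches.append((voter, post_index[key]))
    return matches

voters = [{"name": "Ravi  Kumar", "pin": "110001"}]
postal = [{"name": "ravi kumar", "pin": "110001", "addr": "Connaught Place"}]
print(sync_records(voters, postal))
```

At the billion-record scale cited above, the in-memory dictionary would of course be replaced by a distributed join, but the matching idea is the same.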
This big data project dealt with Solr data standardization, cleansing, and indexing. The goal was a robust search backend wrapped in a user-friendly interface.
- Processed around 4 billion records
- Standardized addresses and names using IntEngine
- Included source databases: Infutor, Movers, Thrive, NCOA, Spoke
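The cleansing step that precedes indexing can be sketched as mapping records from heterogeneous sources onto one schema and deduplicating on a standardized address key. The schema, field names, and abbreviation table below are illustrative assumptions, not the actual Infutor/Movers/NCOA layouts or the IntEngine rules.

```python
import re

# Illustrative suffix abbreviations; real standardization uses far larger tables.
ABBREV = {"street": "st", "avenue": "ave", "boulevard": "blvd"}

def standardize_address(addr: str) -> str:
    """Lowercase, strip punctuation, and abbreviate common street suffixes."""
    tokens = re.sub(r"[^\w\s]", "", addr.lower()).split()
    return " ".join(ABBREV.get(t, t) for t in tokens)

def unify(records):
    """Merge records from multiple sources that share a standardized address."""
    seen = {}
    for rec in records:
        key = standardize_address(rec["address"])
        seen.setdefault(key, {"address": key, "sources": []})
        seen[key]["sources"].append(rec["source"])
    return list(seen.values())

rows = [
    {"source": "Infutor", "address": "12 Oak Street"},
    {"source": "NCOA",    "address": "12 OAK ST."},
]
print(unify(rows))
```

Once unified, each merged record would be posted to Solr's update endpoint for indexing.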
Int Framework - A powerful set of classes for rapidly building backend applications.
- 100% testable architecture
- Automatic logging routines
- Request/response paradigm at the core of the framework
- Entity Framework support out of the box
- Full support for both .NET 4.5 and .NET Core
- Built on enterprise-level design patterns
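The request/response core with automatic logging might look like the following minimal sketch. Python is used here purely for illustration (the framework itself targets .NET), and every name in it is hypothetical:

```python
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)

@dataclass
class Request:
    action: str
    payload: dict = field(default_factory=dict)

@dataclass
class Response:
    ok: bool
    data: dict = field(default_factory=dict)

class Handler:
    """Base class: subclasses implement handle(); logging happens automatically."""
    def __call__(self, request: Request) -> Response:
        log = logging.getLogger(type(self).__name__)
        log.info("handling %s", request.action)   # automatic entry log
        try:
            response = self.handle(request)
        except Exception:
            log.exception("failed %s", request.action)
            return Response(ok=False)
        log.info("done %s ok=%s", request.action, response.ok)
        return response

    def handle(self, request: Request) -> Response:
        raise NotImplementedError

class EchoHandler(Handler):
    """Trivial example handler that echoes the request payload back."""
    def handle(self, request: Request) -> Response:
        return Response(ok=True, data=request.payload)

print(EchoHandler()(Request("echo", {"msg": "hi"})))
```

Routing every call through a single base class is what makes the "100% testable architecture" claim plausible: handlers are plain objects that can be invoked directly in unit tests.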
Int Engine - A fast, powerful data cleansing engine based on the USPS and Census TIGER databases.
- Address parsing based on USPS data
- Geocoding based on Census TIGER data
- Address standardization
- Phone number verification
- Date standardization
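A minimal sketch of the last two kinds of standardization listed above, using simple regex and format rules rather than the engine's USPS/TIGER-backed logic (the accepted formats here are illustrative assumptions):

```python
import re
from datetime import datetime
from typing import Optional

def standardize_phone(raw: str) -> Optional[str]:
    """Keep only digits; accept 10-digit US numbers, optionally with a leading 1."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None

def standardize_date(raw: str) -> Optional[str]:
    """Normalize a few common date layouts to ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%m/%d/%Y", "%Y-%m-%d", "%d %b %Y"):
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None  # unrecognized layout

print(standardize_phone("(202) 555-0143"))  # -> 2025550143
print(standardize_date("07/04/2021"))       # -> 2021-07-04
```

Real verification would additionally check area-code validity and calendar plausibility; the sketch only shows the normalization shape.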