How Arrk’s AI Engineering Team Automated Data Harvesting, Reducing Costs by 48% and Speeding Up Time to Market - Resulting in Increased Revenue and Customer Satisfaction.
Customer
The customer's data harvesting and data entry operations are managed by a remote outsourced team, while a local call centre team is responsible for research and final publication. The customer wanted to automate data harvesting across several disparate data sets.
Problem Statement
The final aim was to achieve a 50% automation rate, a goal that appeared ambitious at the outset but was deemed attainable.
Costly Offshore Data Entry
Complex Data Sources
Free-Text Fields
Automation Goal
Solution Development
The first proof of concept (POC 1) took a rule-based data-mapping approach. This method required customisation for each website to handle its unique terminology, classifications, and free-text fields. POC 1 did not yield the expected results because data presentation varied so widely between websites.
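To illustrate why per-website rules are costly to maintain, here is a minimal sketch of rule-based data mapping. The site names, field labels, and vocabulary below are hypothetical and are not taken from the project; the point is only that every new source needs its own mapping table.

```python
# Hypothetical per-site rules: each source uses its own labels and
# vocabulary, so every site needs a hand-written mapping table.
SITE_RULES = {
    "site_a": {
        "fields": {"Category": "category", "Opening hours": "hours"},
        "vocabulary": {"category": {"eatery": "restaurant", "cafe": "restaurant"}},
    },
    "site_b": {
        "fields": {"Type of business": "category", "Open": "hours"},
        "vocabulary": {"category": {"dining": "restaurant"}},
    },
}


def map_record(site, raw):
    """Map one harvested record onto the target schema using that site's rules."""
    rules = SITE_RULES[site]
    mapped = {}
    for source_label, value in raw.items():
        target = rules["fields"].get(source_label)
        if target is None:
            continue  # unmapped label: left for manual handling
        cleaned = value.strip().lower()
        # Normalise site-specific terms to the target vocabulary where a rule exists.
        vocab = rules["vocabulary"].get(target, {})
        mapped[target] = vocab.get(cleaned, cleaned)
    return mapped
```

Under these assumptions, `map_record("site_a", {"Category": " Cafe "})` and `map_record("site_b", {"Type of business": "Dining"})` both produce `{"category": "restaurant"}`, but only because each site has its own hand-written table. Free-text fields have no fixed labels or vocabulary to map, which is where a purely rule-based approach tends to break down.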
POC 2 introduced Natural Language Processing (NLP) and Named Entity Recognition (NER) techniques to extract key entities from the free-text fields. This approach showed potential by pinpointing where data was located, but it lacked the accuracy needed to extract that data reliably. Efforts were made to combine NLP and NER with rule-based logic, but the complexity and cost of this approach soon became apparent.
In the final phase, a successful solution was created, blending rule-based processing and machine learning. The AI Engineering team precisely identified all the fields that the data entry team needed to populate, along with the associated validation rules. For each field, a decision was made to employ either rule-based processing or machine learning. The machine learning models, although time-consuming to build, provided the capability to generate data for fields that had previously necessitated human intervention.
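The per-field decision described above can be sketched as a small registry in which each target field declares its extraction strategy and its validation rule. This is an illustrative sketch, not the project's implementation: the field names, the regular expression, and `classify_category` (a keyword stub standing in for a trained model) are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FieldSpec:
    name: str
    strategy: str                             # "rule" or "ml"
    extract: Callable[[str], Optional[str]]   # rule function or model wrapper
    validate: Callable[[str], bool]           # validation rule for the field


def extract_postcode(text):
    """Rule-based extraction: a hypothetical five-digit code."""
    match = re.search(r"\b\d{5}\b", text)
    return match.group(0) if match else None


def classify_category(text):
    """Stand-in for a trained model's prediction (assumption, not a real model)."""
    lowered = text.lower()
    if "menu" in lowered or "chef" in lowered:
        return "restaurant"
    return None


FIELDS = [
    FieldSpec("postcode", "rule", extract_postcode,
              lambda v: len(v) == 5 and v.isdigit()),
    FieldSpec("category", "ml", classify_category,
              lambda v: v in {"restaurant", "retail"}),
]


def populate(text):
    """Fill every field that passes validation; list the rest for manual entry."""
    record, needs_review = {}, []
    for spec in FIELDS:
        value = spec.extract(text)
        if value is not None and spec.validate(value):
            record[spec.name] = value
        else:
            needs_review.append(spec.name)
    return record, needs_review
```

Keeping the validation rule next to each field means a machine-learning prediction is accepted only when it passes the same checks a human entry would, and anything that fails falls back to manual data entry rather than being published unchecked.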
Product-Oriented Approach
Agile Methodology
EmbArrk™ Workshop
Outcomes
- Machine Learning Algorithms: The project now has five machine learning algorithms, with accuracies for AI-enabled data extraction and injection ranging from 95% to 98%. Notably, one of these algorithms outperformed human data entry accuracy.
- Current Progress: While the project is still ongoing, it has already eliminated 41% of manual data extraction and entry. This equates to 500,000 data entries annually or 2,000 person-days of effort saved each year.
- Future Prospects: The next phase of this AI Engineering project will involve the integration of generative AI alongside machine learning. This step is expected to exceed the original 50% automation target.
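The figures above imply a few derived numbers, worked through below. This is a back-of-the-envelope sketch that assumes the 41% figure and the 500,000 entries refer to the same annual workload, and that effort scales linearly with the number of entries; neither assumption is confirmed by the case study.

```python
# Figures reported in the outcomes above.
entries_saved = 500_000      # data entries automated per year
person_days_saved = 2_000    # effort saved per year
automation_rate = 0.41       # share of manual work eliminated so far
target_rate = 0.50           # original automation target

# Implied manual throughput per person-day.
entries_per_person_day = entries_saved / person_days_saved  # 250.0

# Implied total annual workload, assuming 41% corresponds to 500,000 entries.
implied_total_entries = entries_saved / automation_rate

# Projected savings if the 50% target is reached (linear scaling assumed).
target_entries = implied_total_entries * target_rate
target_person_days = target_entries / entries_per_person_day
```

Under these assumptions, the total workload is roughly 1.22 million entries per year, and reaching the 50% target would save about 610,000 entries, or around 2,440 person-days, annually.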