Scripting and automating tasks is a major component of why I love programming, but there are times when even the code solution is tedious and complicated – if not flat out impossible. This is the story of how I was forced to implement several different programming solutions to solve a data migration issue between two software that both lacked an import/export feature.
Just yesterday I was tasked with overseeing some data entry for one of our clients. I wanted to put one of our newer employees on the case, but I quickly realized this data entry was going to consume at least one whole week of her time. I flagged the problem to some colleagues, and concluded that there was simply too much data to manually (by hand) move information from the legacy system to the new system. However, my desires to automate the process with Python or another language were quickly put to rest.
The legacy system with the old data going from 2016 to 2019 was not categorized correctly. While some entries had phone numbers in the correct form field, others had their phone number in the description. The numbers were never formatted in a consistent manner, with some using +1, others using (###) format, some using dashes in between numbers, and others presented as 10 character strings. These inconsistencies were similar with the customer email addresses, home addresses, and work descriptions. Even worst, some of the customer names were not even in the name field. To top it all off, some of the data was in Hebrew while the rest was in English.
The real solution would be to scrape all of the data and use some smart algorithms to detect an email address or phone number, but even this had high failure rates. With some email addresses or phone numbers not even having spaces before or after them, the data migration quickly became tedious. Although scripts could itemize some of the data, there were still thousands of appointments needing manual human oversight to be entered properly into the new system.
All of these problems were even more difficult due to the fact that both the legacy system and new system lacked export and import functions. Even with some of the data automatically sorted from the old system, this required inefficient web scraping. Simply put, our client lacked database access in both the new and old systems, and due to issues unknown to us, using a more modern software to handle the client’s data was not an option. Although scraping the data and then itemizing the information properly by hand was more efficient, getting the data to the new system was a struggle of its own.
In order to insert the data into the new system, there needed to be human interaction across two different user interfaces. One page presented current data, with a button to add new data needing to be pressed. Once clicked, a data entry form appeared, which then was submitted by a button that hid the entry form and showed the current data once again. So, while this toggling is not a major issue, it was made worst by their pages updating the session on each request. The system would not even allow us to open their portal in more than one browser tab, requiring us to login to the system on each tab….Disgusting! Along with that, we could not simply load the creation form – We were forced to click the Add Record button for each data entry, and there was no getting around this due to the site tracking session requests for every single action possible. While it is secure, there are far better ways to keep users secure than preventing them from using features the system should know the user has permission to use – without being forced to click specific buttons or links.
The answer? A Python (or another language) script that can loop through each data entry, find the Add Record button on the screen, click it, enter the information into the correct fields, press Submit, and do all of this with proper element targeting to deal with sporadic page load times. Now, this is not that difficult if you use proper image searching, but still annoying considering a CSV or SQL import / export feature should be default in either business software. Basically what the code does is take a picture of the screen, and then it finds the button (based on an image of the button, added by me to the code), clicks the region on the screen it finds it, and then does the same for each data field and the submit button.
So, the next time you discover a problem that requires excessive human oversight or manual data entry, you should consider a script. Although this data migration was the most complicated in terms of required programming to solve it, this problem pops up in almost every industry around the world, and I am not alone. Don’t discount the benefits of automating a process, even if it takes you the whole day to come up with a solution… It can save you a week, or maybe even more!