COBOL Migration to Relational Database
Background
Blackwoods is part of the Wesfarmers Group. With revenue in excess of $1Billion p.a. Blackwoods is Australia’s largest supplier of industrial and safety equipment. It is a B2B distributor of industrial tools, safety equipment, electrical components, mining and engineering supplies.
The company had invested almost 40 years in the development of a bespoke ERP system developed in COBOL. However, now that Blackwoods was planning to move to a modern ERP platform, it was proving nigh impossible to access the data in the proprietary data files.
Profound Information was asked to provide a solution that could automatically read thousands of files which were updated daily and convert the data into a modern, accessible relational database.
The Challenge
This was a technically difficult undertaking – seven companies had previously attempted it and none had succeeded. The COBOL system was an integral component in the high-volume distribution chain:
- Number of files: 220,000+
- Number of records: 5 Billion+
38 years of development resulted in many different coding approaches over time. Consequently there were many inconsistencies and idiosyncrasies in the data stored:
- Data was stored as pure hexadecimal codes with no separators to define where one column finished and the next one started (e.g. 13E2FA6D379304ED2…)
- Some data was read right-to-left, and some was partially read left-to-write – (for illustration, some of the data was stored in this order: 1, 2, 3, 4, 5, 6, 10, 9, 8, 7, 11, 12, 13…
- Eight different date formats were used, sometimes with multiple formats in the same records
- Multiple approaches were adopted for storing numbers – even within a single file there were multiple approaches to storing data
- Multiple record formats were mixed together in many of the data files – in one case over 300 record formats were mixed together in one data file.
- At least six different encoding methods were used to encode the data stored
- There was no documentation
The quantity of data made it infeasible to write the code to process each file individually, yet the idiosyncrasies inherent in the source data generated by the COBOL system didn’t allow for any off-the-shelf solution. An intelligent, automated bespoke solution was required.
The Solution
Profound Information was asked to take the COBOL data and convert it into a relational database on a daily basis. Watch Alan Perkins’ presentation about this project to the ADAPT Connected Cloud and DC Edge conference on the Gold Coast.
Alan Perkins drew on his expertise in databases and his understanding of COBOL to reverse engineer the data structures hidden in the hexadecimal code. Profound Information developed a software code generator which reads the COBOL copybooks from the source and then parses them to automatically generate data transformation software for each type of more than 1000 record types. Whenever a change is made to the COBOL data structures, the code generator picks up the change and generates a new data extraction program. Another program was written to act as a management layer and watchdog to oversee the execution of the automatically generated data transformation program.
Results
Now that the business has access to this rich mine of data, more and more use cases are opening up and demand has been high. The initial stated reason for undertaking the work was to prepare the data for a migration to a new ERP system, and maintaining focus on delivering this was challenging in the light of several business units who saw opportunities to leverage the data.
One such additional use case that was sanctioned was the provision of data to support a new digital platform. When the consultants who were working on the new platform requested the COBOL team to provide certain data that proved prohibitively expensive to collate, the work we did proved to be a viable alternative for sourcing the data and saved millions of dollars.
The two and a half years after delivering the project, the data is still being processed every day. A major test came when a change to the data structures occurred and the company was able to regenerate the code automatically by simply putting the new data structures into a receiving directory and apply the code generator, redeploy and the new tables were automatically created and populated in the database on the next process run.
It is impossible to put a price on the value this project has generated, but it runs into the millions of dollars.
Whenever a change is made to the COBOL data structures, the code generator picks up the change and generates a new data extraction program
Technology used
- Python with Django for the data layers
- PostgreSQL for hosting the translated data
- Azure Virtual Machine Scale Sets were used as the main infrastructure layer
- Terraform infrastructure as code to automate the generation of the Azure platform