This article is for a technical audience only
Last week, MediaLocate’s software engineer and technical communication expert Nick Lambson presented a talk titled “Automating conversion from DITA to Flare with Python” at tcworld China 2023 in Shanghai. tcworld is an influential event that attracts technical writers and experts from around the world who produce, manage, maintain, and publish technical communication. Nick showed the tcworld audience how MediaLocate solved a critical technical communication automation problem for one of its clients. The client asked MediaLocate to automate the conversion of several dozen DITA publications into MadCap Flare format and needed the conversion tool developed within one week.
Selecting Tools to Automate the Technical Communication Project
Given the one-week deadline, MediaLocate engineers streamlined development by choosing the right tools before implementing the technical communication automation tool. Here is the list of selected tools. Most of them belong to the Python development and deployment environment.
- MadCap Flare. Our final converted publications must function adequately in Flare.
- Visual Studio Code is a lightweight, extensible code editor that many developers prefer over heavyweight IDEs.
- GitHub is a collaboration and version control platform.
- Python is easy to read and quick to write, making it ideal for on-demand software development projects. Its runtime speed is not the fastest, but our priority was development speed, not runtime speed.
- lxml is a third-party alternative to the Python standard library’s XML modules (such as xml.etree.ElementTree). Compared with the standard library, lxml handles namespaces and doctype declarations more elegantly and offers more functionality.
- pandas is a third-party Python library for tabular data analysis and engineering.
- pathlib is a standard library module introduced in Python 3.4. It is more capable and more intuitive than the older os.path module.
- tkinter is the Python standard library’s native GUI module, providing key sub-modules like filedialog and messagebox.
- re is the Python standard library’s regular expression module.
- slugify is a third-party library that converts a string into a form that is safe to use as a filename or URL.
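To illustrate how re and pathlib can work together on filenames, here is a minimal sketch that uses only the standard library. The function names, the example path, and the example title are hypothetical, and simple_slug is a simplified stand-in for a slugify call, not the slugify library’s API or the code used in the project.

```python
import re
import unicodedata
from pathlib import PurePosixPath


def simple_slug(text):
    """Hypothetical stand-in for slugify: lowercase ASCII words joined by hyphens."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def readable_path(original, title):
    """Keep the folder and extension; replace the filename with a slug of the title."""
    return original.with_name(simple_slug(title) + original.suffix)


# Made-up example path and title, for illustration only.
source = PurePosixPath("topics/GUID-1A2B3C.dita")
print(readable_path(source, "Installing the Printer Driver (Windows)"))
# topics/installing-the-printer-driver-windows.dita
```

PurePosixPath keeps the example independent of the operating system. A real tool working on files on disk would more likely use Path.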
Technical Communication Conversion Process
The automated tool Nick developed asks the user to select a folder containing the DITA files to convert to Flare, then handles nearly all of the remaining steps automatically. The tool, called MediaLocate Technical Communication Conversion Automation, performs the following steps:
- STEP 1 – MANUAL: The user selects the folder containing DITA and DITAMAP files.
- STEP 2 – AUTOMATED: Extracting variables into Flare .flvar variables files.
- STEP 3 – AUTOMATED: Extracting conditions into Flare .flcts condition tag set files.
- STEP 4 – AUTOMATED: Renaming DITA and DITAMAP files to human-readable filenames.
- STEP 5 – AUTOMATED: Relinking cross-references to the new filenames.
- STEP 6 – AUTOMATED: Packaging the content’s metadata into the id attribute.
- STEP 7 – MANUAL: The user imports the prepared DITA and DITAMAP files into a new Flare project.
- STEP 8 – AUTOMATED: Copying .flvar variables files into the Flare project.
- STEP 9 – AUTOMATED: Copying .flcts condition tag set files into the Flare project.
- STEP 10 – AUTOMATED: Unpackaging metadata into valid Flare variables, conditions, and snippets.
- STEP 11 – AUTOMATED: Combining linked TOCs into a single master TOC.
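Steps 4 and 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool’s actual code: all files sit in one flat folder, each topic has a plain-text title as a direct child of its root element, no two topics share a title, and only topic files are renamed (the map keeps its name). The function names and sample files are hypothetical. The sketch also uses the standard library’s xml.etree.ElementTree in place of lxml so that it runs without third-party packages. ElementTree does not write a doctype declaration back out, which is one reason a real DITA tool would likely prefer lxml.

```python
import re
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def slug(text):
    """Simplified stand-in for slugify: lowercase words joined by hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_rename_map(folder):
    """Map each .dita filename to a new name derived from the topic title."""
    mapping = {}
    for path in sorted(folder.glob("*.dita")):
        # Fall back to the existing stem when the title is missing or empty.
        title = ET.parse(path).getroot().findtext("title") or path.stem
        mapping[path.name] = slug(title) + path.suffix
    return mapping


def relink_and_rename(folder, mapping):
    """Point every href at the new filenames, then rename the files."""
    for path in sorted(folder.glob("*.dita*")):  # matches .dita and .ditamap
        tree = ET.parse(path)
        for element in tree.iter():
            href = element.get("href")
            if href is None:
                continue
            # Keep any "#fragment" part of the link untouched.
            target, sep, fragment = href.partition("#")
            if target in mapping:
                element.set("href", mapping[target] + sep + fragment)
        tree.write(path, encoding="utf-8", xml_declaration=True)
    for old_name, new_name in mapping.items():
        (folder / old_name).rename(folder / new_name)


# Made-up sample content, written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "t001.dita").write_text(
        '<topic id="t001"><title>Install the Driver</title></topic>',
        encoding="utf-8",
    )
    (folder / "t002.dita").write_text(
        '<topic id="t002"><title>Next Steps</title>'
        '<body><p><xref href="t001.dita#t001"/></p></body></topic>',
        encoding="utf-8",
    )
    (folder / "guide.ditamap").write_text(
        '<map><topicref href="t001.dita"/><topicref href="t002.dita"/></map>',
        encoding="utf-8",
    )
    relink_and_rename(folder, build_rename_map(folder))
    print(sorted(p.name for p in folder.iterdir()))
    # ['guide.ditamap', 'install-the-driver.dita', 'next-steps.dita']
    xref = ET.parse(folder / "next-steps.dita").getroot().find(".//xref")
    print(xref.get("href"))
    # install-the-driver.dita#t001
```

Building the complete rename map before touching any file keeps the relinking pass simple: every link is rewritten against the same old-to-new table, and the files are renamed only after all links have been updated.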
Statistics for the Technical Communication Project
Once the project was completed, MediaLocate’s automation team analyzed the effort required to design and implement the proposed solution. Here are the results:
- 9 Python files
- 36 functions
- 2 classes
- 948 lines of code
- 105 lines of code per file on average
The engineers modularized the tool by giving each feature its own Python file. That way, the scripts remained concise and easy to work with. The Python scripts made good use of functions and classes.
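Counts like the ones above can be gathered with Python’s standard ast module. The sketch below is one possible approach, not necessarily how these figures were produced. The article does not say whether methods count as functions or whether blank lines count as lines of code, so the choices here (methods count, blank lines count) are assumptions, and the summarize function and sample source are hypothetical.

```python
import ast


def summarize(source):
    """Count lines, function definitions, and class definitions in Python source."""
    nodes = list(ast.walk(ast.parse(source)))
    return {
        # Assumption: every line counts, including blank lines and comments.
        "lines": len(source.splitlines()),
        # Assumption: methods and nested functions count as functions.
        "functions": sum(
            isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes
        ),
        "classes": sum(isinstance(n, ast.ClassDef) for n in nodes),
    }


# Made-up sample source, for illustration only.
sample = '''class Converter:
    def run(self):
        return "done"


def main():
    return Converter().run()
'''

print(summarize(sample))
# {'lines': 7, 'functions': 2, 'classes': 1}
```

To cover a whole project, the same function could be applied to the text of each .py file in a folder and the per-file results summed.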
- 41 total hours spent on the project
- 31 hours of coding
- 10 hours of documentation
- 30 lines of code per hour
In our experience, allocating at least 25% of development time to documentation is enough to produce effective, concise documentation. We aimed to write clean code quickly while following software development best practices, and 30 lines of code per hour was the right pace for us.
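The averages quoted above follow from the raw counts. The short calculation below shows the arithmetic. Reading the 30 lines per hour as lines divided by coding hours, and the documentation share as a fraction of either total or coding hours, is our interpretation of the figures rather than something the statistics state explicitly.

```python
files, lines = 9, 948
coding_hours, doc_hours = 31, 10
total_hours = coding_hours + doc_hours  # 41

lines_per_file = lines / files
lines_per_coding_hour = lines / coding_hours
doc_share_of_total = doc_hours / total_hours
doc_share_of_coding = doc_hours / coding_hours

print(round(lines_per_file))            # 105
print(round(lines_per_coding_hour, 1))  # 30.6
print(f"{doc_share_of_total:.0%}")      # 24%
print(f"{doc_share_of_coding:.0%}")     # 32%
```

Documentation took about a quarter of the total project time, or roughly a third of the time spent coding.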
Based on our experience with this and similar projects, we can confidently share some recommendations and best practices.
Set Expectations: When estimating the time required for a software project, and before setting deadlines, we recommend considering the ninety-ninety rule coined by Tom Cargill at Bell Labs: the first 90% of the code accounts for the first 90% of the development time, and the remaining 10% of the code accounts for the other 90% of the development time! When you try to account for unexpected delays, you’ll run into Hofstadter’s law, which states that a project always takes longer than expected, even when the law is taken into account. Be sure to allocate plenty of time for testing and documentation.
Adapt Dynamically: Agile development was critical to this project’s success. We immediately began implementing the low-hanging fruit: features that can be easily implemented. After knocking out three basic features on the first day of development, we had a better idea of what to expect with the more complex requirements, like converting conrefs into snippets. We synced daily with the stakeholders, ensuring that development was moving in the right direction. We used value stream mapping to constantly re-evaluate our path to completion. With constant communication, we prioritized some features and set aside others.
Equip Yourself: Don’t reinvent the wheel; use proven open-source tools such as lxml and pandas. Solid foundations in DITA, XML, Python, and Flare were also vital to the success of this automation project.
Are you considering automating your technical communication or localization workflow? Contact MediaLocate to brainstorm automation ideas with our engineers.