The Art of The Possible
AI’s Potential for Extracting Data from SDSs
BY JOHN FALK
Artificial intelligence will not replace my job; however, I wish it could manage vendor Safety Data Sheets. SDSs are critical for workplace and product safety, but processing them is tedious, time-consuming, and error-prone. Artificial intelligence helped solve these challenges at my organization, and I’d like to share our experiences so that product stewards, OEHS professionals, and others responsible for SDS management can have a better understanding of this emerging technology.
WHAT WE WERE LOOKING FOR My organization wanted to improve the speed, efficiency, and accuracy of vendor SDS data extraction. We support data processing for vendor SDSs, extracting relevant product and workplace safety information into structured data for use in downstream IT systems and processes. In full disclosure, my company is an IT service provider, among other things. Could we have built our own SDS AI solution? Yes, but we chose not to because there are many vendors that have a viable solution already in place. There was no need to reinvent the wheel.
As an IT company, we pride ourselves on using the best technology and consider ourselves exceptionally critical of software solutions. That said, our core requirement was: the solution must work. While that might sound obvious, we found plenty of solutions that looked great in a demo but failed in practical applications.
Free lunches actually exist. Don’t be afraid to use a free trial period. Evaluate the solution and see if it works for you. The more you try, the better informed you will become. Ultimately, that should lead to a better solution for your organization.
WIDE VARIETY Not all SDSs are the same. These documents can have different formats with different regulatory content, different languages, and so on. AI must be able to read and decipher complicated text, numbers, and formats with a strong degree of confidence for a variety of SDS template designs and country or regional differences. If an AI solution can read one SDS format with at least 95 percent confidence, that doesn’t mean it will be equally successful with the same text in a different SDS template. These solutions can be highly temperamental. Evaluate as many different SDS formats as reasonably possible.
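One way to compare formats during a trial is to record the confidence a tool reports for each extracted field and average it per SDS template. The following is a minimal sketch under stated assumptions: the template names and confidence values are invented, and the 0.95 threshold simply mirrors the 95 percent figure mentioned above. Real tools may report confidence differently, or not at all.

```python
from collections import defaultdict
from statistics import mean

def templates_below_threshold(results, threshold=0.95):
    """Group (template, confidence) pairs and return the templates whose
    average confidence falls below the threshold."""
    by_template = defaultdict(list)
    for template, confidence in results:
        by_template[template].append(confidence)
    return {
        template: round(mean(scores), 3)
        for template, scores in by_template.items()
        if mean(scores) < threshold
    }

# Invented trial results: the same tool, two different vendor templates.
trial = [
    ("vendor_a", 0.97), ("vendor_a", 0.99),
    ("vendor_b", 0.97), ("vendor_b", 0.81),
]
flagged = templates_below_threshold(trial)
print(flagged)
```

A template that shows up in the flagged set is a candidate for more test documents before committing to a solution.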
LANGUAGE ABILITIES Even though your favorite AI chatbot can speak any language, not all SDS AI solutions support all languages. Similar to SDS formatting, it’s important to evaluate your own needs. My organization opted out of multi-language support. After further evaluation, we realized our downstream systems operate in English, even though we can generate safety information in other languages based on geographic and plant-specific needs. Additionally, most vendor SDSs are in English, whether the document is authored for the United States, Canada, the European Union, or elsewhere. We modified our process to simplify the adoption of AI and reduce technical complexity.
CONTINUOUS IMPROVEMENT No AI model is perfect. As a result, we needed a solution provider that was committed to continuous improvement and AI model training. A working solution today will inevitably lose effectiveness over time due to changes to SDSs and the dynamic regulatory landscape. Ongoing AI training enables the model to learn from a wide variety of SDS documents, improve the accuracy of data extraction, and increase performance. AI is only as good as the volume of model data and the human training it receives. Ongoing training is the cornerstone of an effective and intelligent AI system.
COSTS Beware of hidden costs. Everyone wants the best price, but running artificial intelligence applications takes significant computer horsepower, and that doesn’t come cheap. Most vendors offer a usage-based price; however, the computer hosting services can be a separate service with additional cost. Will the AI solution be hosted within your organization or on the public cloud? Some companies may have a large IT infrastructure that can easily and cost-effectively handle the additional computer processing demand. For my organization, a cloud-hosted model was cheaper and allowed for real-time AI model updates. Some companies may opt for a locally hosted solution because of data privacy concerns or company IT policy. In these cases, be aware that the AI model and software updates may be dependent on your IT organization.
Less is more. Most organizations don’t need all 16 sections of an SDS. Customization is useful when you only need a subset of data. Not only does customizing your extraction reduce the human effort required to review and confirm results, but it decreases processing runtime and associated costs. For example, you may only be interested in sections 4–8, and furthermore, only specific aspects of those sections. The more you can customize your AI, the faster and more reliable your processes become.
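To make the idea of a customized extraction concrete, here is a minimal sketch of an "extraction profile" that names the sections and fields a team cares about. Everything in it is an assumption for illustration: the field names are invented, and how a given vendor tool accepts such a profile is product-specific.

```python
# Hypothetical extraction profile: SDS section number -> fields to keep.
# Section numbers follow the 16-section SDS layout; field names are invented.
EXTRACTION_PROFILE = {
    4: {"inhalation", "skin_contact", "eye_contact", "ingestion"},
    5: {"suitable_extinguishing_media"},
    8: {"exposure_limits", "personal_protective_equipment"},
}

def apply_profile(extracted, profile=EXTRACTION_PROFILE):
    """Keep only the sections and fields named in the profile."""
    return {
        section: {
            name: value
            for name, value in fields.items()
            if name in profile[section]
        }
        for section, fields in extracted.items()
        if section in profile
    }

def review_count(extracted):
    """Number of extracted fields a human reviewer would have to confirm."""
    return sum(len(fields) for fields in extracted.values())

# Invented extraction result for one SDS.
raw = {
    1: {"product_name": "X"},
    4: {"inhalation": "Move to fresh air",
        "notes_to_physician": "Treat symptomatically"},
    8: {"exposure_limits": "TWA 200 ppm"},
}
trimmed = apply_profile(raw)
print(review_count(raw), "fields before,", review_count(trimmed), "after")
```

In this toy case the reviewer confirms two fields instead of four; the saving grows with the number of sections left out.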

DATA STORAGE AND MAPPING There is so much data. Data is only useful if you have a place to store it. For my organization, this meant that the application must have robust and well-documented APIs for data integration with our target system. Not only must the APIs be able to send the data, but the data must be in a format that our target system can use. For example, mapping the hazard statement codes (H-codes) for the E.U.’s regulation on classification, labeling, and packaging of substances and mixtures (the CLP regulation) is easy: H302 = H302 regardless of the corresponding text. Firefighting phrases can be mapped with substantial confidence using fuzzy logic, an approach to reasoning that allows statements to have degrees of truth rather than being either true or false. But toxicology data is a completely different challenge. Think about the data you need, what it looks like in your target system, and how you would integrate and map to it. Our solution isn’t perfect; however, we have a flexible framework for cases where a mapping with 100 percent confidence is not available. It is possible to choose one of three options:
1. Select a statement from a pre-populated list.
2. Manually add free text.
3. Add the source statement to your target system.
Having this flexibility embedded into the AI application is particularly useful so you do not have to continually access your target system to add text or complete a mapping.
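The mapping approach described above can be sketched roughly as follows. This is an illustration, not our vendor's implementation: the phrase catalog and its IDs are invented, the 0.85 threshold is arbitrary, and Python's standard-library `difflib.SequenceMatcher` (a string-similarity score between 0 and 1) stands in for whatever fuzzy technique a real product uses.

```python
import re
from difflib import SequenceMatcher

# Hypothetical target-system phrase catalog (IDs and texts are invented).
TARGET_PHRASES = {
    "FF001": "Use water spray, alcohol-resistant foam, dry chemical or carbon dioxide.",
    "FF002": "Do not use a solid water stream as it may scatter and spread fire.",
}

# Simplified H-code pattern, e.g. H302 or H360FD; EUH codes are not handled.
H_CODE = re.compile(r"\bH\d{3}[A-Za-z]{0,2}\b")

def map_statement(text, threshold=0.85):
    """Map a source statement by exact code, then by fuzzy phrase match,
    otherwise hand it to a human with the three fallback options."""
    codes = H_CODE.findall(text)
    if codes:
        return {"status": "mapped", "method": "code", "targets": codes}

    best_id, best_score = None, 0.0
    for phrase_id, phrase in TARGET_PHRASES.items():
        score = SequenceMatcher(None, text.lower(), phrase.lower()).ratio()
        if score > best_score:
            best_id, best_score = phrase_id, score
    if best_score >= threshold:
        return {"status": "mapped", "method": "fuzzy", "targets": [best_id]}

    return {
        "status": "needs_review",
        "options": [
            "select_from_list",
            "enter_free_text",
            "add_source_statement_to_target",
        ],
    }

print(map_statement("H302 Harmful if swallowed"))
print(map_statement("Use water spray, alcohol resistant foam, dry chemical or carbon dioxide"))
print(map_statement("LD50 (oral, rat): 1,200 mg/kg"))
```

The third call illustrates why toxicology data is harder: there is no code to match and no catalog phrase close enough, so the statement falls through to human review.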
SIMPLE IS BEST The software must be easy to use. A user-friendly application can have a significant impact on end-user adoption. If it is intuitive and simple to navigate, people will want to use it. Well, at least they won’t hate it. Again, less is more. Too many features can be overwhelming, especially if they aren’t necessary for the user’s job. It’s inevitable that a human will need to review and approve each SDS, so a streamlined user interface can accelerate the manual review process. The fewer clicks required for the user to approve an SDS, map a phrase or statement, or select an alternative option, the faster your process becomes.
PROCESSING SPEED AI processing takes time. Solution speed is something to consider. Our solution takes approximately two minutes to process a standard 16-section SDS. As your SDS model and process increase in confidence and quality, you may want to increase your SDS throughput. Is the AI solution scalable? Can you add more processing power to increase speed? Alternatively, could you reduce power to save costs? We perform batch processing at night, so the SDS queue is ready for review first thing in the morning. This doesn’t mean that SDSs cannot be processed during the workday; it just means that we have reduced processing capacity and shifted the computer workload to off-hours.
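A quick back-of-the-envelope calculation helps when sizing a nightly batch. The sketch below uses the two-minute figure from our experience; the eight-hour window, the queue size, and the assumption that throughput scales linearly with the number of parallel workers are all hypothetical.

```python
import math

def sds_per_window(window_hours, workers=1, minutes_per_sds=2.0):
    """SDSs that fit in a batch window, assuming linear scaling per worker."""
    per_worker = int(window_hours * 60 // minutes_per_sds)
    return per_worker * workers

def workers_needed(queue_size, window_hours, minutes_per_sds=2.0):
    """Parallel workers required to clear a queue within the window."""
    per_worker = int(window_hours * 60 // minutes_per_sds)
    return math.ceil(queue_size / per_worker)

print(sds_per_window(8))        # one worker, 8-hour night window -> 240
print(workers_needed(1000, 8))  # queue of 1,000 SDSs overnight -> 5
```

Running the same numbers with a shorter window or a slower per-document time shows quickly whether added processing power is worth paying for.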
CUSTOMER SUPPORT Customer support, particularly during implementation, had a significant impact on the success of our project. There are a few things we considered. Is the support team local or based on the other side of the globe? Will you have an on-site representative during testing and go-live phases of the project? Will issues be resolved in a day? A few days? A week? Do they have any customer references? Based on our experience, there is nothing AI-specific about customer service. The same principles apply to any software project.
TRUST BUT VERIFY One of my wildest dreams actually came true. No, SDS AI did not read my mind. It also didn’t automatically load every phrase, number, and value to my target system without training and mapping. That took a significant amount of time and energy.
But SDS AI was able to validate the regulatory accuracy of my SDS. It cross-referenced hazardous components from section 3 against various regulations and rules, and it highlighted potential discrepancies with classifications in section 2 or inventory lists in section 15. This has increased the confidence of our SDS data and helped us achieve a level of accuracy well beyond our expectations. We see this feature becoming increasingly useful in identifying PFAS with the evolving global, national, and local lists.
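The cross-referencing idea can be illustrated with a small consistency check. This is a simplified sketch, not the product's logic: the CAS numbers, reference H-codes, and watch list are placeholders, and a real check would also apply concentration limits and mixture classification rules rather than compare codes directly.

```python
from dataclasses import dataclass

@dataclass
class ExtractedSds:
    components: dict     # section 3: CAS number -> component name
    section2_codes: set  # H-codes stated in section 2
    section15_cas: set   # CAS numbers cited in section 15

# Placeholder reference data (not real substances or classifications).
REFERENCE_CODES = {"11111-11-1": {"H225", "H301"}}
WATCH_LIST = {"22222-22-2"}  # e.g. a regulatory list of interest

def find_discrepancies(sds):
    """Flag components whose reference H-codes are absent from section 2,
    or that are on the watch list but not cited in section 15."""
    findings = []
    for cas, name in sorted(sds.components.items()):
        missing = REFERENCE_CODES.get(cas, set()) - sds.section2_codes
        if missing:
            findings.append(
                f"{name} ({cas}): section 2 lacks {', '.join(sorted(missing))}"
            )
        if cas in WATCH_LIST and cas not in sds.section15_cas:
            findings.append(
                f"{name} ({cas}): on watch list but not cited in section 15"
            )
    return findings

sds = ExtractedSds(
    components={"11111-11-1": "Solvent A", "22222-22-2": "Additive B"},
    section2_codes={"H225"},
    section15_cas=set(),
)
for finding in find_discrepancies(sds):
    print(finding)
```

Each finding is only a prompt for a human reviewer, in keeping with the "trust but verify" theme.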
AUTHORING CHALLENGES I’ve been asked time and again if I think AI can author SDSs. My answer: yes and no. Theoretically, AI can generate an SDS. Is it compliant? Potentially. But only if the AI model is properly trained for that specific chemistry, region/country, language, and so on. This could be done by expert SDS authors training an AI model with an extremely large base of sample data. This is fundamentally how SDS data extraction works for us. Tons of training data plus human review yields AI model success! Most companies use SDS authoring software today; AI SDS generation can’t be that different, right?
I may come from an old school of thought, but classifying hazardous products and substances goes beyond generative AI’s ability to interpret and create documentation based on previous data. What happens when there is another GHS revision or country-specific update? As far as I know, you can’t just dump regulatory text into your AI solution and expect a fully classified product with a compliant SDS. You need to understand the changes, update the rules, and author an updated SDS, all of which require expert human involvement. And then you’re back at square one: training the AI model … again.
The capabilities of AI are rapidly expanding in size and scope. While I hold some skepticism today, I’m excited about the future and believe it is full of potential.
EMBRACE THE FUTURE Be steadfast yet flexible with requirements. Scope creep will happen before you know it. Being ruthless with requirements and sticking to your core competency is a great way to stay in scope. That said, you should remain open to the art of the possible. The world of AI is continuously evolving, and you should be ready to embrace the future and the benefits it can afford you. Embrace the change. Enjoy the ride.
JOHN FALK is president at opesus America. He has more than 18 years of experience in consulting and implementing product stewardship and sustainability solutions for multinational chemical companies.
Disclaimer: This article is for informational purposes only based on personal experience and does not constitute professional advice.
Send feedback to The Synergist.