Substitution of hazardous chemical substances using Deep Learning and t-SNE
Manufacturing companies in the European Union are obliged to regularly analyze their recipes to find safer alternatives for hazardous substances. Unfortunately, the available substance information is dispersed, heterogeneous, and stored in the databases of many private and public entities. In addition, the number of existing chemical substances has already surpassed 85,000, with over 200 attributes describing substance characteristics, which makes it impossible for experts to collect and manually review this data. We tackle these issues by introducing a novel machine learning approach for alternative assessment. After developing a central database, we design an approach that performs nearest neighbor search in the latent space obtained by deep autoencoders. Furthermore, we apply t-SNE as a post-hoc explanation technique to visualize the deep embeddings, which makes it possible to justify model outcomes. The application in a real-world project with a manufacturer shows that this approach can help process experts identify possible replacement candidates more quickly and fosters comprehensibility through visualization.
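The core idea of searching for neighbors in an autoencoder's latent space can be illustrated with a minimal sketch. The abstract does not specify the network architecture, the attribute encoding, or the distance metric, so everything below is an assumption made for illustration: a tiny one-hidden-layer autoencoder (tanh encoder, linear decoder) trained with plain SGD, Euclidean distance in latent space, and a made-up toy table of six substances with four normalized attributes. The names `train_autoencoder`, `encode`, `nearest_neighbors`, and `toy_substances` are hypothetical and not taken from the authors' implementation.

```python
import math
import random


def train_autoencoder(data, latent_dim=2, epochs=200, lr=0.05, seed=0):
    """Train a tiny autoencoder with SGD and return the encoder weights.

    Assumption: a single tanh encoder layer and a linear decoder stand in
    for the "deep autoencoders" mentioned in the abstract.
    """
    rng = random.Random(seed)
    n_features = len(data[0])
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
             for _ in range(latent_dim)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)]
             for _ in range(n_features)]
    for _ in range(epochs):
        for x in data:
            z = encode(w_enc, x)
            x_hat = [sum(w * zj for w, zj in zip(row, z)) for row in w_dec]
            # Gradient of 0.5 * ||x_hat - x||^2 with respect to x_hat.
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            # Backpropagate to the latent code before touching the decoder.
            grad_z = [sum(w_dec[i][j] * err[i] for i in range(n_features))
                      for j in range(latent_dim)]
            for i in range(n_features):
                for j in range(latent_dim):
                    w_dec[i][j] -= lr * err[i] * z[j]
            for j in range(latent_dim):
                g = grad_z[j] * (1.0 - z[j] ** 2)  # derivative of tanh
                for i in range(n_features):
                    w_enc[j][i] -= lr * g * x[i]
    return w_enc


def encode(w_enc, x):
    """Map an attribute vector to its latent representation."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_enc]


def nearest_neighbors(w_enc, data, query_index, k=3):
    """Return indices of the k substances closest to the query in latent space."""
    latents = [encode(w_enc, x) for x in data]
    query = latents[query_index]
    dists = [(math.dist(query, z), i)
             for i, z in enumerate(latents) if i != query_index]
    return [i for _, i in sorted(dists)[:k]]


# Hypothetical toy data: each row is one substance, each column a
# normalized attribute. The values are invented for illustration only.
toy_substances = [
    [0.10, 0.90, 0.20, 0.80],
    [0.15, 0.85, 0.25, 0.75],
    [0.20, 0.80, 0.10, 0.90],
    [0.90, 0.10, 0.80, 0.20],
    [0.85, 0.20, 0.90, 0.10],
    [0.80, 0.15, 0.75, 0.25],
]

encoder = train_autoencoder(toy_substances)
candidates = nearest_neighbors(encoder, toy_substances, query_index=0, k=2)
print("Candidate replacements for substance 0:", candidates)
```

In a realistic setting the latent vectors would come from a deeper network trained on the full attribute table, and the same vectors could then be passed to a t-SNE implementation (for example `sklearn.manifold.TSNE` in scikit-learn) to produce the two-dimensional visualization the abstract describes.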