Building a global data supply chain to improve protein design and protect biodiversity

Presented on May 10th 2023 by Phil Lorenz

Abstract
With more than 99.9% of biodiversity remaining unknown, the ground-truth genome and protein sequences available for deep learning models are highly unrepresentative. We therefore present a data-centric approach to protein design, built on a knowledge graph sourced from environmental metagenomics and metadata collection across five continents. Leveraging this data, we describe ZymCtrl, a conditional language model for the controllable generation of artificial enzymes, and present case studies with performance validation on specific protein classes, including fluorinases and gene-editing systems.
