Graduate student, Kansas State University, Manhattan, Kansas, United States
Abstract: The discovery of bioactive peptides in food proteins for nutraceutical and pharmaceutical applications is gaining momentum. Traditional wet-lab methods are time-consuming and labor-intensive. With the advent of large-scale protein and peptide datasets and advanced machine learning techniques, artificial intelligence (AI)-based methods can accelerate the discovery of peptides with specific bioactivities and complement traditional wet-lab methods. In this presentation, I will discuss our recent research on models for peptide discovery built with both traditional machine learning and deep learning techniques. Initially, we used an AAindex-based local descriptor for peptide representation and compared seven feature selection techniques and 14 traditional machine learning methods for quantitative structure–activity relationship (QSAR) modeling of antioxidant peptides. We developed state-of-the-art antioxidant activity prediction models for tripeptides and for dipeptides, and demonstrated their application to sorghum proteins. To broaden the range of application scenarios for prediction models, we then used pretrained protein language models (LMs) as an alternative approach to peptide embedding. We proposed UniDL4BioPep, a universal deep learning architecture based on an LM and a convolutional neural network (CNN) for binary classification of peptide bioactivity. UniDL4BioPep outperformed existing state-of-the-art models in 9 out of 10 bioactivity prediction tasks, with higher accuracy, Matthews correlation coefficient, and area under the curve values. The newly developed model can self-adapt to predict any bioactivity for peptides of any length and achieve cutting-edge performance.
Additionally, we explored LMs with unfixed-dimension output combined with a hybrid CNN-LSTM network, and built the predictor LM4ACE for multiclass classification of antihypertensive peptides, which achieved strong performance.
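As an illustration of the difference between fixed and unfixed embedding dimensions, the following minimal sketch assumes that an LM returns one vector per residue, so a peptide of length L yields an L x d matrix. Under that assumption, mean pooling gives a fixed-length vector, while zero-padding keeps the per-residue rows for a sequence model. The function names `mean_pool` and `pad_to_length` and the toy values are hypothetical and are not taken from UniDL4BioPep or LM4ACE.

```python
from typing import List

# One row per residue, one column per embedding dimension (assumed layout).
Matrix = List[List[float]]


def mean_pool(residue_embeddings: Matrix) -> List[float]:
    """Collapse an (L x d) per-residue matrix into one fixed-length d-vector."""
    length = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[j] for row in residue_embeddings) / length for j in range(dim)]


def pad_to_length(residue_embeddings: Matrix, max_len: int) -> Matrix:
    """Keep per-residue rows, truncating or zero-padding to max_len rows."""
    dim = len(residue_embeddings[0])
    rows = [list(row) for row in residue_embeddings[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


# Toy 2-dimensional "embeddings" for a tripeptide and a dipeptide (made-up values).
tripeptide = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
dipeptide = [[1.0, 0.0], [0.0, 1.0]]

fixed = mean_pool(tripeptide)          # same size for any peptide length
padded = pad_to_length(dipeptide, 3)   # length-preserving, batchable input
```

Pooling discards residue order, whereas the padded form preserves it, which is the kind of input a recurrent layer such as an LSTM can consume.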
In summary, the use of AI-based approaches has the potential to improve the efficiency and effectiveness of bioactive peptide discovery. The success of our models may inspire further collaboration between biochemists and bioinformaticians in this field.
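For readers unfamiliar with the evaluation metrics named in the abstract, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) for a binary classifier from confusion-matrix counts. It uses the standard textbook definitions; the labels are toy data, not results from this work, and the helper names are illustrative. Area under the curve is omitted because it requires predicted scores rather than hard labels.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels: 1 = bioactive, 0 = non-bioactive (made-up data).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
score = mcc(*counts)
```

MCC ranges from -1 to 1 and accounts for all four confusion-matrix cells, which makes it more informative than accuracy alone on imbalanced peptide datasets.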