Somatic mutations play a major role in tumorigenesis and cancer progression. However, accurately detecting them by sequencing is challenging: a caller must filter out false-positive calls (which can be introduced by tumor-normal cross-contamination, sequencing artifacts, and variable coverage) while retaining difficult-to-detect true-positive calls that occur at low allele frequency or in low-complexity regions. Most tools for somatic mutation detection use statistical and algorithmic approaches designed for a specific cancer type or sample and thus have limited generalizability. SomaticSeq, which was developed by Roche and published in 2015, took an ensemble approach, using machine learning to integrate algorithmically orthogonal methods with nearly 100 other features. Although SomaticSeq detected somatic mutations more accurately than previously developed methods, its reliance on extracted features for identifying mutation locations limits its performance in low-complexity regions and at low tumor purity.
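To make concrete why low tumor purity and low allele frequency are hard cases, the following minimal sketch estimates how many reads would be expected to support a mutation. It is an illustration only and is not part of SomaticSeq or NeuSomatic: the function names and the simplifying assumptions (a clonal mutation, fixed copy numbers, no sequencing error) are hypothetical choices made for this example.

```python
def expected_vaf(purity: float, mutant_copies: int = 1,
                 tumor_copies: int = 2, normal_copies: int = 2) -> float:
    """Expected variant allele fraction of a clonal somatic mutation.

    Assumes (for illustration) that every tumor cell carries the mutation
    on `mutant_copies` of its `tumor_copies` alleles at the locus, and
    that normal cells contribute only reference alleles.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    # Alleles contributed by tumor cells and by contaminating normal cells
    total_alleles = purity * tumor_copies + (1.0 - purity) * normal_copies
    return purity * mutant_copies / total_alleles


def expected_alt_reads(purity: float, depth: int) -> float:
    """Expected number of reads supporting the variant at a given depth."""
    return expected_vaf(purity) * depth


if __name__ == "__main__":
    for purity in (1.0, 0.5, 0.2):
        vaf = expected_vaf(purity)
        reads = expected_alt_reads(purity, depth=30)
        print(f"purity={purity:.1f}  expected VAF={vaf:.2f}  "
              f"supporting reads at 30x={reads:.1f}")
```

Under these assumptions, a heterozygous mutation in a sample with 20% tumor purity is expected in only about 10% of reads, or roughly three reads at 30x coverage, which is close to what sequencing artifacts alone can produce.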
To increase the generalizability and accuracy of somatic mutation detection, Roche has developed NeuSomatic, the first tool to use a deep convolutional neural network (CNN)-based approach for detecting somatic mutations. By learning feature representations directly from raw data, using patterns seen in local regions, NeuSomatic can identify somatic mutations with high accuracy across multiple sequencing technologies, various degrees of sample purity, and different sequencing strategies. In a recent publication in Nature Communications, the authors showed that NeuSomatic outperformed other methods for detecting somatic mutations (including Roche’s SomaticSeq), particularly at low tumor purity and low allelic frequencies. NeuSomatic also achieved high accuracy across multiple datasets, sequencing strategies, and sequencing technologies, demonstrating its broad applicability. In addition, Roche collaborated with Microsoft to demonstrate NeuSomatic’s scalability and cost-effective performance by processing over 200 samples from The Cancer Genome Atlas (TCGA) project on the Microsoft Azure platform, at a cost of less than 1 USD and a run time of less than 3 hours per sample.