INTERVIEW – MINING LARGE OMICS DATA SETS FOR DRUG DISCOVERY – GSK’S PHILIPPE SANSEAU
By NextLevel Life Sciences - October 15, 2018

Leading up to NextLevel Life Sciences’ 6th Annual R&D Data Intelligence Leaders Forum 2019, we are conducting interviews with selected members of our prestigious speaker panel to learn more about their thoughts on this vital issue.

*Opinions below are solely those of the individual and do not reflect corporate strategy or positioning.

For more information regarding NextLevel Life Sciences’ 6th Annual R&D Data Intelligence Leaders Forum 2019 click here!

Philippe Sanseau, Senior Fellow, Head Computational Biology and Stats, Target Sciences, GSK R&D

NextLevel: How would you describe the work that you do at GSK?

PS: The foundation of the work that we do at GSK on my team is data. The data we are mostly interested in is large-scale genomics data of multiple types, generated both internally and externally. We do a lot of work to access and manage that data. Most of my team’s effort in using that data goes into selecting and validating targets, especially by combining genetic evidence with genomics information. We also use that data for biomarker and patient selection, and for pathway and disease understanding. There are multiple applications of such large-scale, multi-dimensional data.

NextLevel: Target identification and validation is a key aspect of research into drug discovery. How are in silico approaches able to help formulate or strengthen hypotheses in the target discovery process? What are the challenges?

PS: A few years ago, we published a paper in “Nature Genetics” showing that if you have some genetic evidence linking a phenotype to a drug target, you are potentially twice as likely to be successful at developing a drug. Therefore, we believe in using the rapidly growing body of genetics information. The number of genome-wide association studies (GWAS) for complex diseases, for example, is growing. We are also seeing the growth of resources like the “UK Biobank”, which is very relevant in that context as well.

Some of the challenges around that kind of data are, for example, that you need to be very confident in the phenotypes associated with genetics information. I think that high-quality, deep phenotype information is very important. Another challenge comes when you are looking at genome-wide association studies and you find a single-nucleotide polymorphism (SNP), which is a genetic variation associated with a locus or region of the genome. It requires quite a lot of additional work to really identify the causal gene for the disease. That is not as easy as it looks. Those are some examples of the challenges you have to deal with, especially in the target identification space starting with genetics data.

NextLevel: In terms of genome-wide association studies (GWAS), some have hoped that these studies would lead to the identification of novel therapeutic modalities or allow selection of patients who would respond better to therapeutic interventions. What has been the success in this area and the main challenge to achieving this?

PS: I think that if you look at GWAS, clearly, they have been pointing to genes that have some impact on a disease. One of the challenges is that these complex diseases are influenced by multiple genes, so each of those genes may have only a small effect. It is not as simple as in the case of a rare disease, for example, or Mendelian diseases where you have “one gene, one disease” and you know exactly which mutation is causing the disease. The effects that you detect with GWAS are much more subtle and complex to analyze. That’s one example of a challenge.

The other challenge is that you really need to understand, and again that does require additional work, the direction of the effect. A genetic signal that you are going to find in a GWAS may, for example, reduce the expression of the gene, or it may increase the expression of the gene and the protein. If you don’t know what that genetic variation is doing, you are not going to know if you need to reduce the expression of that gene with an antagonist, or if you want to increase the expression of the gene with an agonist.

Another challenge is tractability. The pharma industry, for example, has been very good at developing chemicals or small molecules for certain protein classes, and also at developing biologicals and antibodies, again, for some protein classes. The genes that you now identify with GWAS, however, may not be very tractable; that is, they may not be easy to modulate with small molecules or biologicals. That’s why some of the new approaches to modulating genes, such as RNAi, could be quite attractive.

NextLevel: Emerging data initiatives to make data open access or public are on the rise. For example, you are currently the GSK Lead for Open Targets. Can you tell us a little bit about the Open Targets initiative and how it is positively impacting research into omics datasets?

PS: As I mentioned before, a lot of the genetics and genomics data comes from external sources. In 2014, GSK, the European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute (WSI) came together to establish Open Targets. The EBI is a world-leading institute in bioinformatics. The WSI is a world-leading institute in genomics and genetics analysis at scale. So, we felt that accessing and working with these leading institutes was going to increase our capabilities in genetics, genomics, data, and bioinformatics.

Open Targets is divided into two main components. One big component is an informatics effort to integrate a lot of genetics and genomics data into an informatics platform where you can associate diseases with genes and see the underlying information or evidence underpinning this association. It is a very simple platform to use, and we have now implemented it at GSK. Usage has increased year after year at GSK. A public version of the platform is available as well, and again the usage has been increasing. So, we feel that the platform has been a success.

The other big component of Open Targets is experimental work. The experimental work revolves around doing genomics at scale. An example is a project where we are running genome-wide CRISPR screens across hundreds of cancer cell lines to identify potential targets. We have been mining that information and actively pursuing some of the targets that are coming out of that screen. We are now running multiple screens in multiple diseases. We have efforts in cancer, immunology, and neurodegeneration.

A sign of success of Open Targets is that over the last few years, and since we started, we have seen more pharma companies joining. Biogen joined in 2015, Takeda joined in 2017, and Celgene joined recently in 2018. We are in discussion with other large pharma as well.

NextLevel: What do you enjoy most about your work?

PS: What I enjoy most is doing good science that will ultimately benefit patients. That is what excites me when I come to work in the morning. It is extremely motivating to do something that hopefully, in the end, will translate into a medicine that is both efficacious and safe for treating patients.
