The TIC Salut Social Foundation and Bioinformatics Barcelona will generate synthetic health data to aid research on artificial intelligence

The TIC Salut Social Foundation has signed a collaboration agreement with Bioinformatics Barcelona (BIB) to define and carry out joint actions to develop and use information technologies in the field of health services for people. The first project will consist of generating synthetic data that can meet the challenges of the Health/AI programme and be used in artificial intelligence research in the field of health.  

Joan Guanyabens, the director of the TIC Salut Social Foundation, highlighted that “The alliance with BIB will enable us to test the methodology to assess the quality of the synthetic data we have developed, as well as explore the behaviour of artificial intelligence algorithms in health, by incorporating data of this kind that simulate real data.”  

Dr Ana Ripoll, the president of BIB, stated that “In this collaboration, BIB research groups will apply their knowledge and experience of generating synthetic data, which will allow the limitations of real data to be transformed into opportunities to advance both research and generation of industrial solutions and teaching, thus contributing to tackling the health problems that affect us.” 

What are synthetic data? 

Synthetic data are artificial data that are generated from real data, keeping the same statistical distribution and characteristics, so that they can be analysed to draw statistical conclusions equivalent to those that would be drawn with the real data. As a result, synthetic data allow artificial intelligence models to be trained in a manner that is less intrusive to individuals’ privacy, because the data used in the training process do not directly refer to any identified or identifiable person.
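As a toy illustration of the idea of keeping the same statistical distribution, the sketch below fits a normal distribution to a handful of values and draws new, artificial values from it. The measurements, the variable names and the choice of a normal distribution are assumptions made for this example only; they do not describe the methodology used by the Foundation or BIB, and real generators model many variables jointly.

```python
import statistics

# Hypothetical "real" measurements (made-up values, for illustration only).
real_glucose = [5.1, 5.6, 6.2, 4.9, 5.8, 6.0, 5.3, 5.5, 6.4, 5.2]


def synthesize(values, n, seed=0):
    """Fit a normal distribution to `values` and draw n artificial samples."""
    fitted = statistics.NormalDist.from_samples(values)
    return fitted.samples(n, seed=seed)


synthetic_glucose = synthesize(real_glucose, 1000)

# The synthetic sample should reproduce the summary statistics of the
# original values without copying any individual record.
real_mean = statistics.mean(real_glucose)
synthetic_mean = statistics.mean(synthetic_glucose)
print(f"real mean: {real_mean:.2f}, synthetic mean: {synthetic_mean:.2f}")
```

Comparing summary statistics in this way is also one simple, partial check on the quality of a synthetic dataset.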

Synthetic data can be used safely, and in compliance with the European Union’s General Data Protection Regulation (GDPR), in the environments and scenarios for which they have been created. The GDPR only allows real health data to be used in very specific and limited research environments.

Synthetic data are generally created with the help of various artificial intelligence techniques. They can be created using simulation models with data augmentation and oversampling techniques, among others. Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) are among the state-of-the-art methods for generating synthetic data.
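The oversampling idea can be sketched in a few lines. The example below creates new records by interpolating between randomly chosen pairs of existing ones. It is a simplified variant in the spirit of SMOTE (which interpolates towards nearest neighbours rather than random pairs), not a GAN or VAE. The records, field meanings and function name are hypothetical.

```python
import random

# Hypothetical records: (age in years, body mass index). Made-up values.
records = [(54, 27.1), (61, 30.4), (47, 24.8), (69, 29.0), (58, 26.3)]


def oversample(rows, n_new, seed=0):
    """Create n_new records, each a random point on the line segment
    between two distinct existing records."""
    rng = random.Random(seed)
    new_rows = []
    for _ in range(n_new):
        a, b = rng.sample(rows, 2)   # two distinct existing records
        t = rng.random()             # interpolation weight in [0, 1)
        new_rows.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return new_rows


synthetic_records = oversample(records, 20)
print(synthetic_records[:3])
```

Because every new record lies between two existing ones, each value stays within the range observed in the original data. That keeps the output plausible, but it also means this technique cannot produce cases outside that range.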

The Health/AI Programme’s challenges 

The synthetic data that will be created from the collaboration with BIB will be used within the framework of the Health/AI Programme’s challenges. The challenges, which take the form of public and open calls for projects, seek to evaluate artificial intelligence solutions that help solve specific problems in the health care field. A challenge is currently underway to find solutions that support the diabetic retinopathy process and that are integrated and interoperable in primary care in Catalonia’s comprehensive health system for public use. Other challenges are being prepared to find artificial intelligence solutions for dermatology and the exploitation of diagnostic tests such as chest X-rays and electrocardiograms, among others.

Apart from the use of synthetic data by the programme, the alliance with BIB will also enable us to study other use cases and how these data can be made available to other parties that may be interested in artificial intelligence research in the field of health. This field needs the largest possible volume of data, with a diversity that represents the entire population.

About Bioinformatics Barcelona (BIB) 

BIB is a non-profit association that brings together more than 50 public and private partners in the life sciences sector, including academic, technological and hospital centres, as well as large scientific infrastructures and companies, to promote knowledge generation in the field of bioinformatics. BIB’s partner organisations cover activities across the entire bioinformatics value chain, from the acquisition and preparation of data to the generation of new knowledge and the creation of specific solutions, including all the intermediate stages of advanced analysis, data interpretation, prototyping and validation.

  • Synthetic data generated using artificial intelligence techniques can be used in research in compliance with data protection regulations.