Code: EDI-2019-17-VRT_2
Domain: Internet & Media
Summary
To develop a system that creates synthetic data, based on real data, that can be shared safely with third parties.
Proposed by
VRT NWS is the news service of VRT, the Flemish public broadcaster. VRT NWS is active in television, radio and online.
Description
VRT monitors the behaviour of users on its websites. This data is used for analysis, optimisation, recommendations, churn analysis and much more. The technology for these data-driven activities is constantly evolving. Typically, a third party that develops a system during a project has access to anonymised data: the user identifiers are replaced by hashes, but the behavioural data remain unchanged. It is therefore theoretically possible to trace the behavioural data back to an individual user. An example is the Netflix Prize, where the datasets had to be taken offline after a lawsuit.
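The re-identification risk described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the event format (user, article, day), the salt and the sample records are all hypothetical and do not reflect VRT's actual data or hashing scheme. The sketch hashes the user identifiers, leaves the behaviour untouched, and then shows that someone who knows two visits of a person can still single out that person's hash.

```python
import hashlib

def pseudonymise(events, salt):
    """Replace each user ID with a truncated salted SHA-256 hash; behaviour is untouched."""
    released = []
    for user_id, article, day in events:
        digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:12]
        released.append((digest, article, day))
    return released

def candidates(pseudo_events, known_visits):
    """Return the pseudonyms whose history contains every externally known visit."""
    history = {}
    for pseudo, article, day in pseudo_events:
        history.setdefault(pseudo, set()).add((article, day))
    return [p for p, visits in history.items() if known_visits <= visits]

# Hypothetical click log: (user, article, day).
events = [
    ("alice", "politics-101", 1), ("alice", "cycling-7", 2), ("alice", "weather-3", 2),
    ("bob", "politics-101", 1), ("bob", "football-9", 3),
    ("carol", "cycling-7", 2), ("carol", "football-9", 3),
]
released = pseudonymise(events, salt="hypothetical-secret")

# Side knowledge about one person: two articles they are known to have read.
known = {("politics-101", 1), ("cycling-7", 2)}
matches = candidates(released, known)
print(len(matches))  # number of pseudonyms consistent with the side knowledge
```

Only one pseudonym matches both known visits, so the rest of that user's history is exposed even though no name appears in the released data.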
This kind of problem can be avoided by creating synthetic data. Such data has the same statistical properties as the real data, but cannot be traced back to a user, as there are no real users behind it. Third parties can use the synthetic data to develop their systems.
When the system moves to production, VRT retrains it on real data, so the third party never comes into contact with the real data.
The challenge here is to develop a system that creates synthetic data, based on real data, that can be shared safely with third parties.
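To make the idea of a generator concrete, here is a deliberately naive Python sketch. It assumes, hypothetically, that the real data can be reduced to sessions of article categories; it fits a first-order Markov chain to those sessions and samples new sessions for made-up user IDs. This is not the method the challenge prescribes, and a model this simple offers no formal privacy guarantee (rare real sequences can be reproduced verbatim), so it should be read as a starting point only.

```python
import random
from collections import Counter, defaultdict

START, END = "<start>", "<end>"

def fit_markov(sessions):
    """Count category-to-category transitions across all real sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        path = [START, *session, END]
        for current, following in zip(path, path[1:]):
            transitions[current][following] += 1
    return transitions

def sample_session(transitions, rng, max_len=20):
    """Walk the chain from START until END (or max_len) to build one synthetic session."""
    session, state = [], START
    while len(session) < max_len:
        options = transitions[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        if state == END:
            break
        session.append(state)
    return session

def generate(sessions, n, seed=0):
    """Fit the chain on real sessions and emit n sessions keyed by synthetic user IDs."""
    rng = random.Random(seed)
    transitions = fit_markov(sessions)
    return {f"synthetic-{i}": sample_session(transitions, rng) for i in range(n)}

# Hypothetical real sessions, each a sequence of article categories.
real_sessions = [
    ["news", "politics", "news"],
    ["sport", "sport", "news"],
    ["news", "culture"],
    ["sport", "news", "politics"],
]
synthetic = generate(real_sessions, n=5)
for user, session in synthetic.items():
    print(user, session)
```

The synthetic user IDs correspond to no real person, while the transition frequencies follow those observed in the real sessions. A realistic solution would need a richer model and an explicit privacy assessment.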
Data
The challenge has the following sample datasets available for download
Expected outcomes
The synthetic data should have the same statistical properties as the real data. The successful candidate will prove this by performing relevant tests on both the synthetic and the real data:
- statistical analysis
- recommendations
- churn prediction
The tests should yield similar results on both sets: comparable means, comparable distributions, and comparable evaluation scores for the recommender and the churn prediction.
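As an illustration of the statistical part of such a comparison, the following Python sketch checks one numeric feature in both datasets. The feature (page views per session), the sample values and the tolerances are hypothetical; the two-sample Kolmogorov-Smirnov statistic is just one possible measure of distributional similarity, and the recommender and churn scores would need their own evaluation.

```python
from bisect import bisect_right
from statistics import mean

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of a and b (two-sample KS statistic)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in set(a_sorted) | set(b_sorted):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def compare(real, synthetic, mean_tol=0.1, ks_tol=0.2):
    """Report whether means and distributions agree within the chosen tolerances."""
    mean_gap = abs(mean(real) - mean(synthetic))
    ks = ks_statistic(real, synthetic)
    similar = mean_gap <= mean_tol * abs(mean(real)) and ks <= ks_tol
    return {"mean_gap": mean_gap, "ks": ks, "similar": similar}

# Hypothetical page views per session for both datasets.
real_views = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13]
synthetic_views = [1, 2, 2, 3, 3, 4, 4, 5, 7, 12]
report = compare(real_views, synthetic_views)
print(report)
```

The same comparison can be repeated per feature, and the thresholds would have to be agreed on as part of the evaluation rather than taken from this sketch.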
The system should be provided in two ways:
- Demonstrator for evaluation
- Source code
How do we apply?
- Read the Guidelines for Applicants
- Doubts or questions? Read more about EDI on the About Us page, have a look at our FAQ section or drop us an email at opencall@edincubator.eu.