Before we get started
Approach and objectives for this book
To fully understand how AI matters for data protection and privacy, it is necessary to analyse what is unique about AI technologies and their production. This book does not assume that its readers have experience with the technical side of things. It introduces the technological concepts needed for the discussion and does so at an abstract level. The book’s goal is not to turn data protection and privacy professionals into computer scientists, but to ensure they have the concepts needed to understand the issues at hand and the vocabulary needed for effective communication with software developers and other technical actors. 1
The book assumes that the reader is familiar with the general concepts of the GDPR. Basic concepts, such as the notions of “data subject” and “processing”, are taken for granted so that the book can focus on what changes with AI. By contrast, the book offers a more thorough review of specialized topics, such as the rules on automated decision-making and regulation by design, with an emphasis on their AI dimension. Furthermore, the book introduces readers to the interplay between the GDPR and the EU’s new regulation on AI technologies, the AI Act (Regulation (EU) 2024/1689). At least initially, it will not offer an in-depth treatment of national requirements or industry-specific legal requirements, unless volunteers decide to maintain such national pages. However, the conceptual tools developed throughout the book can also be applied to the study of such legal instruments and their implications for data protection.
By the end of this book, readers will be able to:
- Identify the core technical features of artificial intelligence technologies and the various stages of their life cycle in an organization.
- Map the uses of AI systems within their organization and the actors involved in each use, with special emphasis on identifying data controllers and processors.
- Take stock of how these AI systems utilize and (potentially) generate personal data, and of the implications of that for compliance with data protection duties.
- Assess the implications of technical and organizational measures for data protection throughout the life cycle of an AI system.
- Distinguish between the various kinds of mechanisms for evaluating AI systems (technical audits, impact assessments, certification schemes), and identify when such evaluations are needed and which techniques are available to carry them out.
Structure of the book
This book is organized into four parts, each covering a particular aspect of the subject matter. Part I introduces readers to basic concepts of AI and the issues they raise for data protection law. Part II discusses risks that arise at various stages of the life cycle of an AI-based tool, from the initial decision to make use of such a technology to the end of its operation. Part III offers an in-depth treatment of selected topics that are critical for organizations intending to use AI systems in accordance with the requirements of data protection law. Finally, Part IV looks at specific AI technologies that are particularly significant for data protection.
Within each part of the book
Each part of the book is divided into chapters. A chapter deals with a specific issue within the subject matter outlined by the part it belongs to. For example, 14 Data Protection and Large Language Models (the first chapter in Part IV) deals with the data protection issues raised by the use of large language models. A first pass through a chapter should take at least an hour of self-study, to allow the learner to assimilate the concepts and gain some familiarity with applying them in practice. Readers who have more time and interest can revisit the chapter and use it as a springboard for in-depth engagement with some or all of the topics covered therein.
Within each chapter
Each chapter contains an introduction, at least three substantial sections, and a conclusion. The introduction presents the general structure of the issue the chapter covers, highlights which topics are covered by the sections, and explains why they have been given in-depth coverage. It also provides an overview of relevant topics not discussed in depth within the sections. The sections contain the bulk of the book’s content, focusing on topics that must be mastered for a comprehensive view of the chapter’s issue. Finally, a brief conclusion to each chapter summarizes key points and highlights common trends between the individual sections.
Coming back to the example of 14 Data Protection and Large Language Models, its issue is data protection and large language models. The introduction briefly discusses what is unique about those models and why they warrant a full chapter. The three sections, in turn, analyse:
- the implications of the use of such models for data protection compliance;
- safeguarding measures that can be adopted during the design of those models; and
- safeguarding measures that can be adopted when the model is used in a particular context.
Finally, the conclusion highlights the main actionable points of those sections.
At the end of every chapter, the learner will find multiple-choice questions to allow for a self-assessment of whether they mastered the chapter’s key issues.
Some questions focus on the contents of a specific section, while others require the articulation of concepts from the entire chapter. Each chapter also includes an answer sheet that indicates the correct answer for each question and offers comments on why the other alternatives are not adequate. All chapters also include prompts for discussion, which readers can use to reflect on how to apply the book’s materials in real-world contexts.
Finally, each chapter finishes with a list of references about the topic it covers. While some references are cited in the chapter’s text, citations have been reserved for passages where the text quotes from a specific source or discusses an argument or result published in a specific paper. The references section offers a more comprehensive listing of all the sources that guided the formulation of the chapter, including those that might be useful as general reference texts. Readers should look at the listed materials if they want to dive more deeply into a particular issue, learn more about specific tools, or look for the answers to specific problems they face in practice.
The anatomy of a section
Every section of this book begins with an outline of its learning outcomes, that is, of the knowledge and skills the section will develop. After this outline, the following sub-sections present the theory behind that topic, with examples showing how the concepts emerge in practice. The exposition in each section is largely independent of the others, but references to previously covered topics appear whenever they are needed.
The bulk of the book’s content is, therefore, placed within individual sections. However, each chapter also has an introduction that situates the topics covered by its sections, and a conclusion that articulates topics that cut across more than one section. Likewise, the introduction to a part defines the overall learning outcomes and context for its chapters, and the conclusion to a part articulates common trends and shared issues across them.
Tailoring the book to your needs
This living book was originally designed as a training module for self-study, which allows readers to pursue their own path to learning. If you follow this textbook from start to finish, you will acquire the basic concepts and tools needed for identifying and addressing data protection issues related to AI technologies. However, not all learners have the same needs, and so this book is flexible enough to support different learning approaches.
By following a modular structure, this textbook allows learners to mix and match learning elements according to their needs. If a learner is already familiar with the topics in some of the book’s sections, they can skim through those sections and focus on whatever topics they have not mastered yet. If a learner has a particular interest in a specific topic, they can jump to the relevant part, chapter, or section, using the book’s internal references to refresh other concepts as needed. And, if a learner wants to gain deeper knowledge of a particular topic, they can use the book’s references as a springboard for further learning.
The book is oriented towards self-learning sessions guided by a reader’s time availability, which should take around 20 minutes per substantive section. Still, the book’s modular structure means that it can also be used as a basis for a longer, instructor-led training. If an instructor has 30 minutes (or even an hour!) available for each section, they can dedicate the additional time to exercises and discussion between learners. However, those extensions are not essential for the learning experience, and self-study based on the materials provided in this book is a feasible means to develop the necessary knowledge and competences for dealing with the challenges of data protection in the age of AI.
Case studies
This book supports data protection professionals as they deal with the impact of AI on their practice. Given that organizations use AI technologies for a variety of tasks and in many ways, it would not be feasible to cover all (or even the most common) applications in a single textbook. Furthermore, as we shall see throughout the book, the data protection implications of AI relate closely to how AI technologies are used within an organization. So, this book focuses on providing general tools that are relevant for present and future applications, but learners will need to fill in the gaps of their specific contexts.
To make the discussion more concrete, the book makes extensive use of examples. Such examples are usually hypothetical and are organized around the same set of fictional organizations introduced in 1 Introduction to Artificial Intelligence and Data Protection. Using those hypothetical examples allows us to showcase how various aspects of data protection law interact with one another, while emphasizing the aspects covered in each section. Reliance on examples also shows how organizations in different contexts use AI in diverse ways, which cannot be treated in the same fashion but instead require attention to the particulars of the AI systems being used and their operational context. Contributors are invited to refine the existing examples and add new ones as needed, while always keeping these didactic aims in mind.
1. Readers who are interested in a deeper dive into technical matters can consult the companion training module developed for ICT professionals: Enrico Glerean, Elements of Secure AI Systems.