Agile 2 - Adrian Lander

Data Is Strategic

The Agile Manifesto said nothing about data. This omission is remarkable, given that data has always been considered a strategic asset. Data is also the counterpart of a working product, so it is reasonable to expect that practices pertaining to data differ from, yet remain relevant to, practices pertaining to product development.

There are at least four ways in which data must be considered, apart from the product itself.

 Historical data about one's customers, constituents, users, and so on, is a rich source from which insights can be derived if—and only if—the data is managed so that it can be understood, correlated, and analyzed after the fact.

 Transactional data about current operations is essential for product developers to model and understand so that they can correctly extend the products and services.

 Production-like test data is essential for product developers to be able to validate the product. Too often product development teams are left to cobble together their own test data, even though business stakeholders own the production data and are in the best position to provide it. Yet those same business stakeholders expect a working product.

 Data about customers, constituents, or any individual or organization is potentially sensitive and needs to be safeguarded. This protection is often an afterthought, even though the security community has been proclaiming loudly for years that the only way to secure data is to have product developers build security in from the beginning.

These are all major gaps in today's common practices, and Agile says little about any of them.

The DevSecOps movement arose in the mid-2010s, tacking security onto the DevOps moniker, but in practice it usually amounts to little more than automated scanning tools. For some reason, the industry shows almost zero interest in doing what works: requiring product developers to get certified in secure product development, even though plenty of training is available for that.

If you think that data security or privacy has nothing to do with Agile, let us tell you how wrong you are. In the early 2000s, when the Agile movement began, people who were building large-scale enterprise systems viewed it with interest but were dismayed that it said nothing about security and reliability. Taking its cue from the Agile Manifesto's silence on these topics, the Agile community showed little or no interest in security and reliability from the beginning. But anyone who builds enterprise-class systems knows how foolish that is.

Security and reliability need to be designed in. Here is an illustration. One of us uses an interview exercise in which he asks a product developer to sketch out a design for a system. The candidate draws a diagram on a whiteboard. The interviewer then says, “Now, suppose it has to be very secure.” The candidate looks at the diagram, thinks for a bit, and then invariably draws an entirely new one.

The point is, if Agile is to inform us about how products and services should be built, it needs to emphasize that those products and services contain sensitive assets (data) and that protecting those assets needs to be a core value or principle. Failing even to mention this is akin to the Ten Commandments omitting “Thou shalt not kill.”

What about the other ways in which data should be considered, such as the role of historical data? Too often we see microservice teams dump data into a data lake without documenting its structure. A machine learning colleague of one of us complained that he had been hired by a Fortune 10 company to create a system that could be trained on their data lake's data, but he was unable to match up data from different sources in the lake: there was no schema bridging the various sources, no “map” or “translation” linking the various data domains. Worse, many teams use NoSQL databases and fail to document the data structures that they store. The data lake was essentially a wasteland of unintelligible data.
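
To make the missing “map” concrete, here is a minimal Python sketch of a documented crosswalk between data lake sources. It assumes two hypothetical sources, crm and billing, whose field names (cust_no, account_customer, and so on) are invented for illustration. The point is only that once each source's fields are mapped to shared canonical names, records from different domains can be matched up.

```python
from collections import defaultdict

# Hypothetical crosswalk: for each source in the lake, which raw field maps to
# which canonical field. In practice this would be documented and versioned
# alongside the data rather than hard-coded here.
CROSSWALK = {
    "crm": {"cust_no": "customer_id", "full_name": "name", "email_addr": "email"},
    "billing": {"account_customer": "customer_id", "total_billed": "lifetime_value"},
}


def normalize(source: str, record: dict) -> dict:
    """Rename a raw record's fields to canonical names via the crosswalk.

    Fields with no documented mapping are dropped rather than guessed at.
    """
    mapping = CROSSWALK[source]
    return {canon: record[raw] for raw, canon in mapping.items() if raw in record}


def join_on_customer(batches: dict) -> dict:
    """Merge normalized records from every source into one view per customer."""
    merged = defaultdict(dict)
    for source, records in batches.items():
        for raw in records:
            row = normalize(source, raw)
            if "customer_id" not in row:
                continue  # nothing to link this record to, so skip it
            merged[str(row["customer_id"])].update(row)
    return dict(merged)


# Hypothetical sample: the same customer appears in two sources under
# different field names.
lake = {
    "crm": [{"cust_no": "C-001", "full_name": "Ada", "email_addr": "ada@example.com"}],
    "billing": [{"account_customer": "C-001", "total_billed": 420.0}],
}
print(join_on_customer(lake)["C-001"])
```

Unmapped fields are dropped instead of inferred: an undocumented field is exactly the kind of data the colleague could not use, and guessing at its meaning would only hide the gap.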

As a result, this company has a huge asset (its customer, product, and sales data) but is unable to make any use of it.

What about transactional data? One of the software methodologies often cited as an Agile method is Feature-Driven Development (FDD). In FDD, the first step is to create a model of the organization's information. Product teams then use that model as a shared understanding of the transactional data. Other methodologies that claim to be Agile barely mention data: Scrum says nothing about it, and Extreme Programming mentions it only in passing. How can it be that one methodology (FDD) is built around something that the others hardly mention?

When Scrum teams have to use existing transactional systems, the team members take on the task of learning about those systems themselves, which means requesting help from other teams that are familiar with them. Thus, the process of sharing knowledge about the organization's information is entirely ad hoc. Is that a good thing? For something as fundamental as the data used by all of the organization's systems, does it not make sense to establish a universal, shared view of that information? Be your own judge.

Finally, test data is a crisis-level issue. One of us was in a program-level meeting and heard one development lead say to another, “My tests always pass because I create data that I know will pass,” and they both laughed. But as absurd as it sounds, this is often actually the case. Product developers create test cases and test data, but they do so based on their own understanding of things, an understanding that has never been validated. Those tests are therefore not valid with respect to the actual structure of the organization's data. The only way to verify that things will work with production data is to test with production-like data.
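
A hypothetical Python illustration of how this happens: a parser written under the developer's own assumption about how amounts are formatted passes every case in hand-made test data, yet fails on a sample shaped like real production data. The function, the format assumption, and all sample values are invented for illustration, not drawn from any real system.

```python
def parse_amount(text: str) -> float:
    """Parse a monetary amount, written under the (unvalidated) assumption
    that amounts always look like "1234.50"."""
    return float(text)


# Hand-made test data, built from the developer's own understanding,
# so it passes by construction.
HAND_MADE = ["10.00", "1234.50", "0.99"]

# Hypothetical production-like sample: thousands separators, currency
# symbols, and blanks that the developer never anticipated.
PRODUCTION_LIKE = ["10.00", "1,234.50", "$0.99", ""]


def failures(parser, samples):
    """Return the samples that the parser cannot handle."""
    bad = []
    for s in samples:
        try:
            parser(s)
        except ValueError:
            bad.append(s)
    return bad


print(failures(parse_amount, HAND_MADE))        # []
print(failures(parse_amount, PRODUCTION_LIKE))  # ['1,234.50', '$0.99', '']
```

Both runs exercise the same code; only the data differs, which is why a green test suite says little until it has been run against production-like data.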

Yet what usually happens is that development teams are told to create a new set of features and obtain their own test data. How? The production data is usually owned by a business function, so the developers go to the people who support the systems and have access to the production data and ask, “Can you get us some production data?” That is a very ad hoc approach and a recipe for problems when the new features go live.

Some organizations set up test accounts so that development teams can use them to access production systems and thereby work with production-like data. Organizations also sometimes “mask” or “surrogate” production data to hide sensitive values from development teams. Those are effective practices. The bottom line is that test data is a critical need: if development teams do not have ready access to it, it is the responsibility of the business function that owns the data to help them solve that problem and obtain production-like data.
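
Masking can take many forms. As one minimal sketch, assuming a hypothetical customer record with customer_id, name, and email fields, the Python below replaces sensitive values with deterministic surrogates computed with HMAC-SHA256 from the standard library hmac and hashlib modules. Because the same input always yields the same surrogate, relationships between records survive masking. The secret key is an assumption here; in practice it would be held by the data-owning business function so that nobody else can recompute surrogates from guessed values.

```python
import hashlib
import hmac

# Hypothetical secret, held by the data-owning business function and never
# shared with development teams.
SECRET_KEY = b"replace-with-a-managed-secret"


def surrogate(value: str, field: str) -> str:
    """Deterministically replace a sensitive value with an opaque token.

    The same input always yields the same token, so joins across tables still
    line up in the masked copy. The field name is mixed in so that equal values
    in different fields produce different tokens.
    """
    digest = hmac.new(SECRET_KEY, f"{field}:{value}".encode(), hashlib.sha256)
    return f"{field}_{digest.hexdigest()[:16]}"


def mask_email(email: str) -> str:
    """Surrogate the local part but keep a valid address shape for format checks.

    The domain is kept; drop it as well if the domain itself is sensitive.
    """
    local, sep, domain = email.partition("@")
    if not sep:
        return surrogate(email, "email")
    return f"{surrogate(local, 'email')}@{domain}"


def mask_record(record: dict) -> dict:
    """Produce a production-like copy of a hypothetical customer record."""
    masked = dict(record)  # copy, so the original record is left untouched
    masked["customer_id"] = surrogate(record["customer_id"], "customer_id")
    masked["name"] = surrogate(record["name"], "name")
    masked["email"] = mask_email(record["email"])
    return masked  # non-sensitive fields (e.g. order totals) pass through as-is
```

Non-sensitive fields are passed through unchanged, so the masked copy keeps the real distributions and formats that make production-like data useful for testing.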
