The Big R-Book, Philippe J. S. De Brouwer


2 The Scientific Method and Data

The world around us is constantly changing, and making the wrong decisions can be disastrous for any company or person. At the same time, it is more important than ever to innovate. Innovating means providing a diversity of ideas that, just as in biological evolution, might hold the mutation best suited to the changing environment.

There are many ways to come to a view on what to do next. Some of the more popular methods include instinct and prejudice, juiced up with psychological biases both in perception and decision making. Other popular methods include decision by authority (“let the boss decide”), deciding by decibels (“the loudest employee is heard”) and dogmatism (“we did this in the past” or “we have a procedure that says this”). While these methods of creating an opinion and deciding might coincidentally work out, in general they are sub-optimal by design. Indeed, the best solution might not even be considered, or might be pre-emptively ruled out based on flawed arguments.

Looking at scientific development throughout the ages as well as at human history, one is compelled to conclude that the only workable construct so far is the one known as the scientific method. No other method has brought the world so many innovations and so much progress, and no other method has stood up in the face of scrutiny.


The Scientific Method

Aristotle (384–322 BCE, Greece) can be seen as the father of the scientific method, because of his rigorous logical method, which was much more than natural logic. But it is fair to credit Ibn al-Haytham (also known as Alhazen, 965–1039, Iraq) for preparing the scientific method for collaborative use. His emphasis on collecting empirical data and on the reproducibility of results laid the foundation for a scientific method that is much more successful. This method allows people to check each other and to confirm or reject previous results.

However, both the scientific method and the word “scientist” only came into common use in the nineteenth century, and the scientific method only became the standard method in the twentieth century. Therefore, it should not come as a surprise that this also became a period of invention and development as never seen before.


Indeed, while previous inventions such as fire, agriculture, the wheel, bronze and steel might not have explicitly followed the scientific method, they created a society ready to embrace the scientific method and fuel an era of accelerated innovation and expansion. The internal combustion engine, electricity and magnetism fuelled economic growth as never seen before.


Figure 2.1: A view on the steps in the scientific method for the data scientist and mathematical modeller, aka “quant.” In a commercial company, communicating the results and convincing stakeholders, or putting the model in production, carry a lot of importance.

The electronic computer brought us to the twenty-first century, and now a new era of growth is being prepared by big data, machine learning, nanotechnology and, maybe, quantum computing.

Indeed, with huge power comes huge responsibility. Once an invention is made, it is impossible to “un-invent” it. Once the atomic bomb exists, it cannot be forgotten; it is forever part of our knowledge. What we can do is promote peaceful applications of quantum technology, such as sensors to open doors, diodes, computers and quantum computers.


For example, as information and data technology advances, the singularity¹ draws closer. It is our responsibility to foresee potential dangers and to do all that is in our power to prevent these dangers from becoming an extinction event. Many inventions had a dark side and have led to more efficient ways of killing people, degrading the ozone layer or polluting our ecosystem. Humanity has had many difficult times and very dark days, but never before has humanity become extinct. That would be the greatest disaster, for there would be no recovery possible.

So the scientific method is important. This method has brought us into the information age, and we are only scratching the surface of the possibilities. It is only logical that all corporates try to stay abreast of changes and put a strong emphasis on innovation. This leads to an ever-increasing focus on data, algorithms and mathematical models such as machine learning.

Data, statistics and the scientific method are powerful tools. The company that has the best data and uses its data best is the company that will be the most adaptable to the changes and hence the one to survive. This is not biological evolution, but guided evolution. We do not have to rely on a huge number of companies with random variations, but we can use data to see trends and react to them.

The role of the data-analyst in any company cannot be overestimated. It rests on the shoulders of the reader of this book not only to read those patterns from the data, but also to convince decision makers to act on this fact-based insight.

Because the role of data and analytics is so important, it is essential to follow scientific rigour. This means in the first place following the scientific method for data analysis. An interpretation of the scientific method for data-science is in Figure 2.1.

So far we have discussed the role of data scientists and the actions that they would take. But how does it look from the point of view of the data itself?

Using that scientific method for data-science, the most important thing is probably to make sure that one understands the data very well. Data in itself is meaningless. For example, 930 is just a number. It could be anything: from the age of Adam in Genesis, to the price of a chair or the code to unlock your bike-chain. It could be a time, and 930 could mean “9:30” (assume “am” if your time-zone habits require so). Knowing that interpretation, the number becomes information, but we cannot understand this information till we know what it means (it could be the time I woke up after a long party, the time of a plane to catch, a meeting at work, etc.). We can only understand the data if we know, for example, that it is a schedule of the bus “843-my-route-to-work”. This understanding, together with the insight that this bus always runs 15 minutes late and my wish to catch the bus, can lead to action: to go out, wait for that bus and get on it.
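The path from data to action in the bus example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the 10-minute walk to the bus stop are hypothetical and do not come from the text; only the number 930 and the 15-minute delay do.

```python
from datetime import datetime, timedelta


def interpret_as_time(raw: int) -> datetime:
    """Data -> information: read the bare number 930 as the time 09:30."""
    hours, minutes = divmod(raw, 100)
    # The date is an arbitrary placeholder; only the time of day matters here.
    return datetime(2000, 1, 1, hours, minutes)


def expected_departure(scheduled: datetime, usual_delay_minutes: int = 15) -> datetime:
    """Information -> insight: this bus always runs 15 minutes late."""
    return scheduled + timedelta(minutes=usual_delay_minutes)


def leave_home_at(departure: datetime, walk_minutes: int) -> datetime:
    """Insight -> action: when to go out in order to catch the bus."""
    return departure - timedelta(minutes=walk_minutes)


raw = 930                                          # data: just a number
scheduled = interpret_as_time(raw)                 # information: 09:30 on the schedule
departure = expected_departure(scheduled)          # insight: the bus really leaves at 09:45
leave = leave_home_at(departure, walk_minutes=10)  # hypothetical 10-minute walk
print(leave.strftime("%H:%M"))                     # prints 09:35
```

The same number would lead to a different action, or to none at all, if it were interpreted as a price or a lock code, which is why the interpretation step comes first.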


This simple example shows us how the data cycle in any company or within any discipline should work. We first have a question, for example “to which customers can we lend money without pushing them into a debt-spiral.” Then one will collect data (from the company's own systems or from a credit bureau). This data can then be used to create a model that allows us to reduce the complexity of all observations to the explanatory variables only: a projection into a space of lower dimension. That model helps us to get the insight from the data and, once put in production, allows us to decide on the right action for each credit application.

This institution will end up with a better credit approval process, where fewer loss events occur. That is the role of data-science: to drive companies to the creation of more sustainable wealth in a future where all have a place and plenty.

This cycle, visualized in Figure 2.2 on page 10, brings into evidence the importance of data-science. Data science is a way to bring the scientific method into a private company, so that decisions do not have to be based on gut-feeling alone. It is the role of the data scientist to take data, transform that data into information, and create understanding from that information that can lead to actionable insight. It is then up to the management of the business to decide on the actions and follow them through. The importance of being connected to reality via contact with the business cannot be overstated. In each and every step, mathematics will serve as a set of tools, such as screwdrivers and hammers. However, the choice about which one to use depends on a good understanding of what we are working with and what we are trying to achieve.


Figure 2.2: The role of data-science in a company is to take data and turn it into actionable insight. At every step, apart from the technical issues that will be discussed in this book, it is of utmost importance to understand the context and limitations of data, business, regulations and customers. For effectiveness in every step, communication is key, as is permanent contact with all stakeholders and the environment.

Note

1 The term “singularity” refers to the point in time where an intelligent system would be able to produce an even more intelligent system, which in turn can create another system that is a certain percentage smarter in a time that is a certain percentage shorter. This inevitably leads to an exponentially accelerating creation of better systems. This time series converges to one point in time, where the “intelligence” of the machine would hit its absolute limits. The first record of the subject is a discussion between Stanislaw Ulam and John von Neumann in the 1950s, and an early and convincing publication is Good (1966). It is also elaborately explored in Kurzweil (2010).
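The convergence to one point in time can be made explicit with a geometric series. The symbols below are assumptions introduced for illustration only: T_0 is the time the first system needs to build its successor, and p (with 0 < p < 1) is the fraction by which each generation shortens that time.

```latex
\[
  t_k = T_0 \,(1-p)^k , \qquad k = 0, 1, 2, \ldots
\]
\[
  t^{*} \;=\; \sum_{k=0}^{\infty} T_0 \,(1-p)^k
        \;=\; \frac{T_0}{1-(1-p)}
        \;=\; \frac{T_0}{p}
\]
```

With purely hypothetical numbers, T_0 = 10 years and p = 0.2 give t* = 50 years: infinitely many generations fit into a finite time span.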

