Applied Modeling Techniques and Data Analysis 2, by a group of authors

1.2.2. Interesting taxpayers

We must define a function f(x) that associates with each element x of the dataset a categorical value expressing its degree of fraud risk; this value is the class our first model will try to predict. Of course, a function labelling every taxpayer in the dataset as a tax evader would be useless, so a distinction must be drawn between serious tax evasion cases and less relevant ones. To this end, we loosely follow (Basta et al. 2009) and divide the taxpayers into two groups, interesting and not interesting, from the tax administration's point of view (to a certain extent, interesting stands for "it might be interesting for the tax administration to go and check what's going on ..."). The split is based on two criteria: profitability (i.e. the ability to identify the most serious cases of tax evasion, independently of all other factors) and fairness (i.e. the ability to identify the most serious cases of tax evasion relative to the taxpayer's turnover).

Honest taxpayers are treated as not interesting, even though in our data this label actually denotes moderate tax evasion cases. We are forced into this approximation because we only have data on taxpayers who received a tax notice, and none on taxpayers whose audit process was closed with no findings or was never started.

To take the profitability criterion into account, we define a new variable, called the tax claim, which equals the higher assessed taxes if the tax notice stage is still open, or the higher settled taxes if the stage status is definitive. Note that the two amounts may differ, because the IRA and the taxpayer can both reconsider their positions while reaching an agreement. The distribution of the tax claim, grouped in classes (again, in thousands of euros), is shown in Figure 1.2.
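A minimal sketch of the tax claim definition, assuming each tax notice is stored as a record holding the two amounts and a flag for the stage status. The class `TaxNotice`, its field names and the function `tax_claim` are hypothetical; the source does not describe how the data are laid out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaxNotice:
    # Hypothetical record layout; amounts in thousands of euros.
    higher_assessed_tax: float
    higher_settled_tax: Optional[float] = None  # known only once the stage is definitive
    definitive: bool = False                     # True if the stage status is definitive


def tax_claim(notice: TaxNotice) -> float:
    """Settled taxes for a definitive stage, assessed taxes for an open one."""
    if notice.definitive:
        if notice.higher_settled_tax is None:
            raise ValueError("a definitive stage needs a settled amount")
        return notice.higher_settled_tax
    return notice.higher_assessed_tax
```

For example, `tax_claim(TaxNotice(12.0, 9.5, definitive=True))` returns 9.5, while an open notice such as `TaxNotice(12.0)` yields 12.0.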

Figure 1.2. Tax claim distribution. For a color version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip

The left vertical axis refers to the tax claim distribution (number of tax notices) over the classes shown on the horizontal axis; the right vertical axis, instead, shows the total tax claim arising from each class (in thousands of euros). As can be seen, the 331 most profitable tax notices (12% of the total) account for almost half of the tax revenue arising from our dataset.
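The two axes of Figure 1.2 amount to a count and a sum per class. The sketch below computes both from a list of tax claims. It assumes the class boundaries of Table 1.1 and treats each class as closed on the left and open on the right, which the source does not specify; `LOWER_BOUNDS`, `LABELS` and `claim_distribution` are illustrative names.

```python
from bisect import bisect_right

# Lower bounds of the tax claim classes (thousands of euros), as in Table 1.1.
# Assumption: each class is closed on the left and open on the right, e.g. [1, 2).
LOWER_BOUNDS = [0, 1, 2, 5, 10, 20, 50, 100]
LABELS = ["[0 - 1]", "[1 - 2]", "[2 - 5]", "[5 - 10]",
          "[10 - 20]", "[20 - 50]", "[50 - 100]", "[100+]"]


def claim_distribution(claims):
    """Return {class label: (number of notices, total tax claim)}."""
    counts = [0] * len(LOWER_BOUNDS)
    totals = [0.0] * len(LOWER_BOUNDS)
    for tc in claims:
        if tc < 0:
            raise ValueError("tax claims are expected to be non-negative")
        k = bisect_right(LOWER_BOUNDS, tc) - 1  # index of the class containing tc
        counts[k] += 1   # left axis: how many notices fall in the class
        totals[k] += tc  # right axis: how much tax claim the class generates
    return {label: (n, s) for label, n, s in zip(LABELS, counts, totals)}
```

For instance, `claim_distribution([0.5, 1.0, 3.2, 150.0])` places one notice each in [0 - 1], [1 - 2], [2 - 5] and [100+]; under the left-closed convention the claim 1.0 falls in [1 - 2].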

The fairness criterion is introduced so that the audit process also reaches smaller firms (which are usually charged smaller amounts of income tax). It is useful because it keeps the tax authorities from discriminating among taxpayers on the basis of their turnover, and because it introduces a deterrent effect that improves overall tax compliance.

We therefore define another variable, called Z, which takes into account both the turnover and the revenues of each taxpayer and compares them with the tax claim. More formally, the two ratios $\frac{\text{tax claim}}{\text{turnover}}$ and $\frac{\text{tax claim}}{\text{revenues}}$ are computed, and Z is the minimum of these two ratios and 1:

$$Z = \min\left(\frac{\text{tax claim}}{\text{turnover}}, \frac{\text{tax claim}}{\text{revenues}}, 1\right)$$

so that Z ranges from 0 to 1.
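A direct transcription of the Z definition, as a sketch. The source does not say how taxpayers with zero turnover or revenues are handled; here, as an assumption, a non-positive denominator makes that ratio infinite, so the other ratio or the cap at 1 decides the value. `z_value` is a hypothetical name.

```python
import math


def z_value(tax_claim: float, turnover: float, revenues: float) -> float:
    """Z = min(tax claim / turnover, tax claim / revenues, 1)."""

    def ratio(denominator: float) -> float:
        # Assumption: a missing (non-positive) denominator never lowers Z.
        return tax_claim / denominator if denominator > 0 else math.inf

    return min(ratio(turnover), ratio(revenues), 1.0)
```

For example, a tax claim of 10 against a turnover of 200 and revenues of 100 gives Z = min(0.05, 0.1, 1) = 0.05, while a tax claim larger than both turnover and revenues is capped at Z = 1.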

For both the tax claim (TC) and Z, we then compute the 25th percentile (Q₁), the median (Q₂) and the 75th percentile (Q₃). A taxpayer is considered interesting if he satisfies one of the following conditions:

The three above-mentioned rules can be represented as in Figure 1.3.

Figure 1.3. Determining interesting and not interesting taxpayers. For a color version of this figure, see www.iste.co.uk/dimotikalis/analysis2.zip
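The rules compare TC and Z with their quartiles, so the first computational step is obtaining Q₁, Q₂ and Q₃ for each variable. The sketch below uses Python's `statistics.quantiles`; the choice of linear interpolation (`method="inclusive"`) is an assumption, since the text does not say which percentile definition was used. It covers only the quartiles, not the labelling rules.

```python
from statistics import quantiles


def quartiles(values):
    """Return (Q1, Q2, Q3) of a sequence of numbers.

    method="inclusive" interpolates linearly between order statistics,
    treating the minimum and maximum as the 0th and 100th percentiles.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3
```

In use, one would call `quartiles(tax_claims)` and `quartiles(z_values)`, where `tax_claims` and `z_values` are hypothetical lists holding one value per taxpayer; the input does not need to be sorted.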

Once the whole dataset has been divided into interesting and not interesting taxpayers, Table 1.1 shows that the interesting ones are far more profitable than the others: their average tax claim is 53.42, against 4.47 for the not interesting ones (tax claim values are in thousands of euros). A machine learning tool able to distinguish these two kinds of taxpayers reasonably well would therefore be very useful.

The task of our first model will therefore be to identify, with a certain degree of confidence, the taxpayers who are most likely to have evaded taxes, both in absolute terms and as a percentage of revenues or turnover.

The literature on tax fraud detection, although it uses different methods and algorithms, is usually concerned only with this issue, i.e. with finding the best way to identify the most relevant cases of tax evasion (Bonchi et al. 1999; Wu et al. 2012; Gonzalez and Velasquez 2013; de Roux et al. 2018).

There is, however, another crucial issue to be taken into account: the tax authorities' actual ability to collect the tax debt arising from the tax notices sent to non-compliant taxpayers.

Table 1.1. Tax claim, interesting and not interesting taxpayers

Tax claim class (thousands of euros)	Not interesting: Num	Not interesting: Total tax claim	Not interesting: Average	Interesting: Num	Interesting: Total tax claim	Interesting: Average
[0 - 1]	736	322	0.44	0	0	0.00
[1 - 2]	631	942	1.49	0	0	0.00
[2 - 5]	1,607	5,409	3.37	138	563	4.08
[5 - 10]	1,127	7,727	6.86	517	4,157	8.04
[10 - 20]	446	5,911	13.25	902	13,139	14.57
[20 - 50]	0	0	0.00	1,164	36,056	30.98
[50 - 100]	0	0	0.00	433	30,055	69.41
[100+]	0	0	0.00	327	101,987	311.89
Total	4,547	20,311	4.47	3,481	185,957	53.42
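The figures in Table 1.1 can be cross-checked with a few lines of arithmetic. The sketch below copies the rows of the table (thousands of euros) and derives the group averages, the share of the total tax claim coming from interesting taxpayers, and the share coming from the [100+] class; `TABLE_1_1` and `summary` are illustrative names.

```python
# Rows of Table 1.1 (thousands of euros):
# class -> (num not interesting, total not interesting, num interesting, total interesting)
TABLE_1_1 = {
    "[0 - 1]": (736, 322, 0, 0),
    "[1 - 2]": (631, 942, 0, 0),
    "[2 - 5]": (1607, 5409, 138, 563),
    "[5 - 10]": (1127, 7727, 517, 4157),
    "[10 - 20]": (446, 5911, 902, 13139),
    "[20 - 50]": (0, 0, 1164, 36056),
    "[50 - 100]": (0, 0, 433, 30055),
    "[100+]": (0, 0, 327, 101987),
}


def summary(table):
    """Column totals, group averages and tax claim shares derived from the rows."""
    num_not = sum(row[0] for row in table.values())
    tot_not = sum(row[1] for row in table.values())
    num_int = sum(row[2] for row in table.values())
    tot_int = sum(row[3] for row in table.values())
    grand_total = tot_not + tot_int
    top = table["[100+]"]
    return {
        "num_not_interesting": num_not,
        "num_interesting": num_int,
        "grand_total": grand_total,
        "avg_not_interesting": tot_not / num_not,
        "avg_interesting": tot_int / num_int,
        "interesting_share": tot_int / grand_total,
        "top_class_share": (top[1] + top[3]) / grand_total,
    }
```

With these rows, the interesting taxpayers account for about 90% of the total tax claim of 206,268 thousand euros, and the [100+] class alone for about 49%, in line with the "almost half" read off Figure 1.2.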

