Global Tax Governance. Taxation on Digital Economy, Transfer Pricing and Litigation in Tax Matters (MAPs + ADR) Policies for Global Sustainability. Ongoing U.N. 2030 (SDG) and Addis Ababa Agendas - Jeffrey Owens
UNLOCKING VALUE WITH MACHINE LEARNING
Entire books have been written on how to manage large datasets so that the data is clean and accessible. For the sake of brevity, we will move beyond the specifics of data storage, management, indexing, joining, governance, lineage, etc. and directly into the methods that can be applied to extract value.
Big data analytics is the catch-all phrase for running computer algorithms against large datasets to accomplish tasks such as finding outliers, grouping associated items, predicting outcomes or screening against defined criteria. Today these tasks often, but not always, involve machine learning, which is a subset of artificial intelligence. It is therefore important to understand what machine learning is and how it can be used. While machine learning comes in many forms, the classic example mines existing data for patterns by capturing the correlations between inputs and outputs in an algorithm. The resulting algorithm can then be applied to new inputs to predict their outputs.
This differs from traditional software development in that the algorithm is learned from data rather than written by a programmer and then used to predict an outcome. We contrast the two approaches below:
Illustration 1
This process is known as supervised learning, and it relies on the assumption that a representative dataset with correct inputs and outputs is available from which patterns can be learned. An example of this approach in use is Chile, which implemented a national, mandatory, clearance-based e-invoicing system. Invoice data runs through a machine learning algorithm that picks up the patterns between invoice item descriptions and tax rates. When the algorithm is applied to new invoices, it reads each invoice line item and assesses the risk that the tax is incorrect. In this manner, prior data can be used to inform predictions about new data. This capability can be applied to text, numbers, and images.
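The supervised learning idea described above can be sketched in a few lines of Python. This is a minimal illustration and not the system used by any tax administration: the training data, the tax rates (0.05 and 0.20) and the function names `train`, `predict_proba` and `assess_line` are all hypothetical, and a small multinomial naive Bayes classifier stands in for whatever algorithm a production system would use.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled history: (item description, tax rate that was correct).
TRAINING = [
    ("fresh bread loaf", 0.05),
    ("whole milk 1l", 0.05),
    ("wheat flour 1kg", 0.05),
    ("laptop computer 15 inch", 0.20),
    ("wireless computer mouse", 0.20),
    ("office chair", 0.20),
]

def train(examples):
    """Learn word counts per tax rate (multinomial naive Bayes)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, rate in examples:
        words = text.lower().split()
        word_counts[rate].update(words)
        label_counts[rate] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict_proba(model, text):
    """Return {tax_rate: probability} for a new item description."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for rate, n in label_counts.items():
        score = math.log(n / total)  # prior for this rate
        denom = sum(word_counts[rate].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[rate][word] + 1) / denom)
        log_scores[rate] = score
    # Convert log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    exp = {rate: math.exp(s - top) for rate, s in log_scores.items()}
    norm = sum(exp.values())
    return {rate: v / norm for rate, v in exp.items()}

def assess_line(model, description, charged_rate):
    """Predict the rate for one invoice line and compare it with the rate charged."""
    probs = predict_proba(model, description)
    predicted = max(probs, key=probs.get)
    return predicted, probs[predicted], predicted == charged_rate

model = train(TRAINING)
predicted, confidence, matches = assess_line(model, "gaming laptop computer", 0.05)
```

Here the new line item was charged at the lower rate, while the learned patterns point to the higher one, so `matches` is false and the line would be a candidate for review.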
This approach provides not only a predicted result but also a probability between 0 and 1 indicating how much confidence should be placed in the prediction. Low-probability predictions can therefore be routed to humans and screened manually. This combination of machine learning prediction and manual screening of selected items can vastly enhance the breadth of audit risk assessment efforts. Item scores can be aggregated into invoice and taxpayer scores. Limited human resources can then be applied to the most impactful reviews, as indicated by scoring analytics.
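A minimal sketch of this routing and scoring step follows. The threshold of 0.80, the `LineResult` record, the definition of line-level risk and the use of a simple average for aggregation are all assumptions made for illustration; the text does not prescribe any of them.

```python
from collections import defaultdict
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.80  # hypothetical cut-off below which a human screens the item

@dataclass
class LineResult:
    taxpayer_id: str
    invoice_id: str
    tax_looks_correct: bool  # the model's prediction for this line item
    confidence: float        # probability between 0 and 1 attached to that prediction

def route(results, threshold=REVIEW_THRESHOLD):
    """Split results into confidently predicted items and items for manual screening."""
    automated, manual = [], []
    for r in results:
        (automated if r.confidence >= threshold else manual).append(r)
    return automated, manual

def risk_score(r):
    """Assumed line-level risk: the estimated probability that the tax is wrong."""
    return r.confidence if not r.tax_looks_correct else 1.0 - r.confidence

def aggregate(results, key):
    """Average line-level risk per group (per invoice, per taxpayer, ...)."""
    buckets = defaultdict(list)
    for r in results:
        buckets[key(r)].append(risk_score(r))
    return {k: sum(v) / len(v) for k, v in buckets.items()}

results = [
    LineResult("T1", "INV-1", True, 0.95),
    LineResult("T1", "INV-1", False, 0.90),
    LineResult("T1", "INV-2", True, 0.55),
    LineResult("T2", "INV-3", True, 0.99),
]
automated, manual = route(results)
invoice_scores = aggregate(results, key=lambda r: r.invoice_id)
taxpayer_scores = aggregate(results, key=lambda r: r.taxpayer_id)
```

Sorting `taxpayer_scores` in descending order gives one possible work queue for reviewers. Other aggregations, such as a maximum or an amount-weighted average, would be equally plausible.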
Other forms of machine learning include the ability to group or “cluster” like items. Items are grouped together based upon a selection of screening characteristics. Taking the above invoice item example, a cluster of like items could be created based upon a single characteristic, such as item description, or upon multiple characteristics, for example by also including industry. The tax rates for items in a group could then be examined and outliers from the group norm identified for investigation. This approach can work well to identify outliers even when items are grouped based upon many characteristics.
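The grouping-then-outlier idea can be illustrated as follows. This sketch uses a deliberately simple greedy ("leader") clustering on word overlap, standing in for the more capable clustering algorithms a real system would use. The item data, the similarity threshold of 0.5 and the minimum group size of 3 are hypothetical.

```python
from collections import Counter

# Hypothetical invoice line items.
ITEMS = [
    {"id": 1, "description": "white bread loaf", "industry": "bakery", "tax_rate": 0.05},
    {"id": 2, "description": "brown bread loaf", "industry": "bakery", "tax_rate": 0.05},
    {"id": 3, "description": "sliced bread loaf", "industry": "bakery", "tax_rate": 0.20},
    {"id": 4, "description": "rye bread loaf", "industry": "bakery", "tax_rate": 0.05},
    {"id": 5, "description": "office chair", "industry": "furniture", "tax_rate": 0.20},
    {"id": 6, "description": "office desk chair", "industry": "furniture", "tax_rate": 0.20},
]

def features(item):
    """Characteristics used for grouping: description words plus an industry tag."""
    return set(item["description"].lower().split()) | {"industry:" + item["industry"]}

def jaccard(a, b):
    """Share of characteristics two items have in common (0 = none, 1 = all)."""
    return len(a & b) / len(a | b)

def cluster(items, min_similarity=0.5):
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters = []
    for item in items:
        for members in clusters:
            if jaccard(features(item), features(members[0])) >= min_similarity:
                members.append(item)
                break
        else:
            clusters.append([item])  # no similar cluster found, start a new one
    return clusters

def rate_outliers(clusters, min_size=3):
    """Flag items whose tax rate differs from the most common rate in their cluster."""
    flagged = []
    for members in clusters:
        if len(members) < min_size:
            continue  # too few items to speak of a group norm
        norm, _ = Counter(m["tax_rate"] for m in members).most_common(1)[0]
        flagged.extend(m for m in members if m["tax_rate"] != norm)
    return flagged

clusters = cluster(ITEMS)
flagged = rate_outliers(clusters)
```

With this toy data the four bread items form one group, and the one taxed at a different rate from the rest is flagged for investigation.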
Another machine learning approach worth mentioning is social network analysis. This approach enables measurement of the strength of association between two parties. It provides a means of detecting networks of actors working in concert to accomplish a variety of frauds. It should be applied carefully, as guilt by association is a slippery slope, but when used correctly it can play a role in protecting against multiparty organized fraud such as missing trader and fake invoice schemes.
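One very simple way to express "strength of association" is to count the invoices exchanged between each pair of parties and then look for groups connected by strong ties. The sketch below does exactly that. The invoice list, the party names and the threshold of two invoices are hypothetical, and real social network analysis draws on far richer measures than this.

```python
from collections import Counter, defaultdict

# Hypothetical invoices as (seller, buyer) pairs.
INVOICES = [
    ("A", "B"), ("A", "B"),
    ("B", "C"), ("B", "C"),
    ("C", "A"), ("C", "A"),
    ("D", "E"),
    ("A", "D"),
]

def tie_strength(invoices):
    """Count invoices between each unordered pair of distinct parties."""
    return Counter(frozenset(pair) for pair in invoices if pair[0] != pair[1])

def strongly_tied_groups(strength, min_invoices=2):
    """Find groups of parties linked, directly or indirectly, by strong ties."""
    graph = defaultdict(set)
    for pair, count in strength.items():
        if count >= min_invoices:
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)
    seen, groups = set(), []
    for start in list(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over the strong ties
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups

strength = tie_strength(INVOICES)
groups = strongly_tied_groups(strength)
```

A group surfaced this way is only a lead for further analysis. As noted above, association alone is not evidence of wrongdoing.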
Finally, recent advances in machine learning have enabled training to be performed across multiple datasets without having to co-locate the data. The concept is known as “federated learning” and allows multiple parties to collaborate and build machine learning algorithms without having to expose their data to each other. In the future, joint data contributions from taxpayers and tax administrators alike might be used to create algorithms. Once these algorithms are deemed sufficiently accurate, it might be possible for tax administrations to provide them to taxpayers either on the cloud or as a download. With federated learning and the ability to download the resulting shared algorithm, all taxpayer data can remain in the control of the taxpayer even while the taxpayer receives the benefits of machine learning.
In summary, while we have highlighted only a few, there are dozens of machine learning techniques that could be applied to tax compliance and fraud detection. Because machine learning is advancing at an exponential pace, much of what is written here has likely been superseded by new advances by the time you read this. Suffice it to say that machine learning and traditional rules-oriented systems will each play a significant role in tax risk assessment and fraud detection moving forward. These powerful approaches are now available to tax administrations. Further, through federated learning, honest taxpayers and tax administrations can now jointly contribute to and benefit from machine learning algorithms without sacrificing privacy. The opportunity for cooperation has never been clearer. But what stands in the way is trust.
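To make the idea concrete, the following toy sketch applies federated averaging to a one-variable linear model. Each party improves the shared model on its own data and returns only the updated model parameters, which a coordinator then averages. The datasets, the learning rate and the function names are hypothetical, and the sketch omits the safeguards (for example, secure aggregation) that a real deployment would need.

```python
# Hypothetical private datasets of (x, y) pairs, one list per party.
# Both happen to follow y = 2x + 1, which the parties jointly learn.
PARTY_A = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
PARTY_B = [(1.0, 3.0), (3.0, 7.0)]

def local_update(weights, data, lr=0.05, epochs=20):
    """Run gradient descent on one party's own data, starting from the shared model.

    Only the updated parameters and the sample count are returned;
    the raw data never leaves this function.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n

def federated_average(updates):
    """Combine the parties' models, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    w = sum(model[0] * n for model, n in updates) / total
    b = sum(model[1] * n for model, n in updates) / total
    return (w, b)

def train_federated(parties, rounds=50):
    """Alternate between local training and averaging for a fixed number of rounds."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in parties]
        weights = federated_average(updates)
    return weights

w, b = train_federated([PARTY_A, PARTY_B])
```

The coordinator in `train_federated` sees only parameter pairs and sample counts. The resulting shared model could then be distributed to every participant.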