Official Google Cloud Certified Professional Data Engineer Study Guide - Dan Sullivan

Row Key Access

Wide-column databases usually take a different approach to querying. Rather than using indexes to allow efficient lookup of rows with needed data, wide-column databases organize data so that rows with similar row keys are close together. Queries use a row key, which is analogous to a primary key in relational databases, to retrieve data. This has two implications.
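The idea of keeping rows sorted by row key can be illustrated with a small in-memory sketch. This is not how any particular wide-column database is implemented; the class `SortedRowStore`, its methods, and the sample keys are hypothetical names chosen for illustration. It only shows why a point lookup by row key and a scan over neighboring keys are both cheap when keys are stored in order.

```python
import bisect


class SortedRowStore:
    """Toy stand-in for a wide-column table: rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> dict of column values

    def put(self, row_key, columns):
        # Insert the key at its sorted position the first time it is seen.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = columns

    def get(self, row_key):
        # Point lookup by row key, analogous to a primary-key lookup.
        return self._rows.get(row_key)

    def scan(self, start_key, end_key):
        # Rows with start_key <= key < end_key sit next to each other,
        # so a range scan is a binary search plus one contiguous slice.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


store = SortedRowStore()
store.put("sensor-790", {"temperature": 42.5})
store.put("valve-001", {"state": "open"})      # hypothetical unrelated row
store.put("sensor-789", {"temperature": 40})

point = store.get("sensor-789")
# "." is the character right after "-", so this range covers every "sensor-" key.
sensor_rows = store.scan("sensor-", "sensor.")
```

Note that there is no secondary index anywhere in the sketch: the only access paths are the row key itself and its sort order.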

Tables in wide-column databases are designed to respond to particular queries. Although relational databases are designed according to forms of normalization that minimize the risk of data anomalies, wide-column databases are designed for low-latency reads and writes at high volumes. This can lead to duplication of data. Consider IoT sensor data stored in a wide-column database. Table 1.2 shows IoT data organized by sensor ID and timestamp (seconds since January 1, 1970 00:00:00 UTC). Later readings from the same sensor would share its sensor ID but carry different timestamps, so the row key is built from both values.

Table 1.2 IoT data by sensor ID and timestamp

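| Sensor ID | Timestamp  | Temperature | Relative humidity | Pressure |
|-----------|------------|-------------|-------------------|----------|
| 789       | 1571760690 | 40          | 35                | 28.2     |
| 790       | 1571760698 | 42.5        | 50                | 29.1     |
| 791       | 1571760676 | 37          | 61                | 28.6     |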
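A row key built from sensor ID and timestamp could be encoded as in the sketch below. The function name `make_row_key`, the `#` separator, and the zero-padding widths are assumptions made for illustration; the source does not specify an encoding. The last reading in the list is a hypothetical later reading from sensor 789, added only to show that one sensor's rows end up adjacent.

```python
import bisect


def make_row_key(sensor_id, timestamp):
    # Zero-pad both parts so that string (lexicographic) order matches
    # numeric order. The widths 6 and 10 are illustrative assumptions.
    return f"{sensor_id:06d}#{timestamp:010d}"


readings = [
    # (sensor_id, timestamp, temperature, relative_humidity, pressure)
    (789, 1571760690, 40, 35, 28.2),
    (790, 1571760698, 42.5, 50, 29.1),
    (791, 1571760676, 37, 61, 28.6),
    (789, 1571760750, 40.5, 34, 28.2),  # hypothetical later reading
]

# Row key -> column values, as in Table 1.2.
table = {make_row_key(s, t): (temp, rh, p) for s, t, temp, rh, p in readings}

# A real store keeps keys sorted; here they are sorted on demand.
keys = sorted(table)

# All readings for sensor 789 share a key prefix, so they are contiguous.
prefix = f"{789:06d}#"
lo = bisect.bisect_left(keys, prefix)
hi = bisect.bisect_left(keys, prefix + "\xff")  # "\xff" sorts after any digit
sensor_789_keys = keys[lo:hi]
```

The padding matters: compared as plain strings, "1000" sorts before "999", so unpadded numeric IDs would not keep a sensor's neighbors in numeric order.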

Table 1.2 is organized to answer queries that look up data by sensor ID and then by time. It is not well suited to looking up data by time alone, for example, all readings over the past hour. Rather than create an index on timestamp, wide-column databases duplicate the data in a different row key order. Table 1.3, for example, is designed to answer time range queries. Note that this requires creating a new table with the desired row key design; there is no index to support the query pattern.

Table 1.3 IoT data by timestamp and sensor ID

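| Timestamp  | Sensor ID | Temperature | Relative humidity | Pressure |
|------------|-----------|-------------|-------------------|----------|
| 1571760676 | 791       | 37          | 61                | 28.6     |
| 1571760690 | 789       | 40          | 35                | 28.2     |
| 1571760698 | 790       | 42.5        | 50                | 29.1     |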
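The duplication described above can be sketched as writing every reading twice, once per row key order. The two dictionaries, the helper names (`key_by_sensor`, `key_by_time`, `write_reading`, `readings_between`), and the key encoding are all hypothetical; this is a minimal in-memory illustration under those assumptions, not the API of any specific database.

```python
import bisect


def key_by_sensor(sensor_id, ts):
    # Key order of Table 1.2: sensor ID first, then timestamp.
    return f"{sensor_id:06d}#{ts:010d}"


def key_by_time(sensor_id, ts):
    # Key order of Table 1.3: timestamp first, then sensor ID.
    return f"{ts:010d}#{sensor_id:06d}"


by_sensor = {}  # stands in for the table keyed by sensor ID, then time
by_time = {}    # stands in for the table keyed by time, then sensor ID


def write_reading(sensor_id, ts, temperature, humidity, pressure):
    row = {"sensor_id": sensor_id, "timestamp": ts, "temperature": temperature,
           "relative_humidity": humidity, "pressure": pressure}
    # The same data is stored twice, once under each row key order.
    by_sensor[key_by_sensor(sensor_id, ts)] = dict(row)
    by_time[key_by_time(sensor_id, ts)] = dict(row)


def readings_between(start_ts, end_ts):
    # Time range query (start_ts <= timestamp < end_ts) against the
    # time-ordered table. Keys are sorted on demand for simplicity.
    keys = sorted(by_time)
    lo = bisect.bisect_left(keys, f"{start_ts:010d}")
    hi = bisect.bisect_left(keys, f"{end_ts:010d}")
    return [by_time[k] for k in keys[lo:hi]]


write_reading(789, 1571760690, 40, 35, 28.2)
write_reading(790, 1571760698, 42.5, 50, 29.1)
write_reading(791, 1571760676, 37, 61, 28.6)

recent = readings_between(1571760680, 1571760695)
```

The trade-off is visible in `write_reading`: each write costs two inserts and twice the storage, in exchange for a time range query that is a single contiguous scan.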