
Process and store data from a Logstore to another Logstore

Raw logs

Source Logstore oss_source

Raw logs generated at scattered points in time contain many fields and are large in size.

Processed logs

Destination Logstore log2log

The raw log data is processed to calculate the amount of OSS response data (response_size) per minute. The time is rounded by minute. The aggregated data is smaller in size.

SQL

Calculate the amount of OSS response data (response_size) per minute and round the time by minute.

```sql
* | select bucket, bucket_location, bucket_storage_type, http_method, http_status, object, operation, (__time__ - __time__ % 60) as __time__, sum(content_length_out) as response_size from log group by bucket, bucket_location, bucket_storage_type, http_method, http_status, object, operation, __time__
```
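The core of this query is the expression `(__time__ - __time__ % 60)`, which rounds a Unix timestamp down to the start of its minute so that all logs within the same minute share one group key. The sketch below illustrates that grouping logic in Python; the record layout and the choice of dimension columns are simplified from the SQL above, not an actual Simple Log Service API.

```python
from collections import defaultdict

def truncate_to_minute(ts: int) -> int:
    """Round a Unix timestamp down to the start of its minute,
    mirroring the SQL expression (__time__ - __time__ % 60)."""
    return ts - ts % 60

def aggregate(logs):
    """Group raw OSS access logs by rounded minute and dimension
    columns, summing content_length_out into response_size.
    Only a subset of the SQL's group-by columns is used here."""
    totals = defaultdict(int)
    for log in logs:
        key = (truncate_to_minute(log["__time__"]), log["bucket"], log["http_method"])
        totals[key] += log["content_length_out"]
    return totals

# Two raw logs in the same minute collapse into one aggregated row;
# the third log falls into the next minute.
logs = [
    {"__time__": 1700000005, "bucket": "b1", "http_method": "GET", "content_length_out": 100},
    {"__time__": 1700000030, "bucket": "b1", "http_method": "GET", "content_length_out": 200},
    {"__time__": 1700000065, "bucket": "b1", "http_method": "GET", "content_length_out": 50},
]
```

This per-minute collapsing is what shrinks the destination Logstore relative to the source.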

Calculation configurations

To modify a Scheduled SQL job, click Edit Configurations. In the Modify Scheduled SQL panel, configure the following parameters.


Write Mode: If the source is a Logstore, you can select Import Data from Logstore to Logstore or Import Data from Logstore to Metricstore. In this example, Import Data from Logstore to Logstore is selected.

SQL Code: Write code to calculate the amount of OSS response data (response_size) per minute and round the time by minute.

```sql
* | select bucket, bucket_location, bucket_storage_type, http_method, http_status, object, operation, (__time__ - __time__ % 60) as __time__, sum(content_length_out) as response_size from log group by bucket, bucket_location, bucket_storage_type, http_method, http_status, object, operation, __time__
```
| Configuration item | Description |
| --- | --- |
| Source Project/Logstore | The name of the source Logstore, which is also the data source of the SQL code. |
| Destination Region | The region in which the destination Logstore resides. |
| Destination Project | The name of the project in which the destination Logstore resides. |
| Destination Logstore | The name of the destination Logstore. |
| Write Authorization | The Alibaba Cloud Resource Name (ARN) of the role that has write permissions on the destination Logstore. |
| Execute SQL Authorization | The ARN of the role that has SQL execution permissions. |

For more information, see Access data by using a custom role.

Click Next to go to the Scheduling Settings step.

Scheduling configurations


| Configuration item | Description |
| --- | --- |
| Scheduling Interval | The interval at which the SQL code is run. In this example, the interval is 1 minute. |
| Scheduling Time Range | The time range during which the SQL code is run. |
| SQL Time Window | The time window of logs that are analyzed when the SQL code is run. A value of [@m-1m,@m) specifies that only the data within the previous 1 minute is analyzed. |
| SQL Timeout | The timeout period and the maximum number of retries of the SQL code. In this example, the timeout period is 600 seconds and the maximum number of retries is 20. |
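The window syntax [@m-1m,@m) is half-open: @m is the instance's scheduled time truncated to the minute, and the window covers exactly the 60 seconds before it, so consecutive instances never overlap or leave gaps. A minimal sketch of that calculation (the function name is illustrative, not part of any Simple Log Service API):

```python
def sql_time_window(scheduled_ts: int):
    """Compute the half-open window [@m-1m, @m) for a job instance.
    @m is the instance's scheduled Unix time truncated to the minute;
    the window covers exactly the preceding 60 seconds."""
    m = scheduled_ts - scheduled_ts % 60
    return (m - 60, m)  # [start inclusive, end exclusive)

start, end = sql_time_window(1700000125)
```

Because each window ends exactly where the next one begins, every log is analyzed by exactly one instance.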

Attention:

  1. If data is written to the source Logstore with latency, we recommend that you set the Delay Task parameter to a larger value to ensure the integrity of the source data.
  2. If the maximum number of retries is reached or the timeout period expires, the current instance of the Scheduled SQL job fails and the next instance is scheduled.
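The retry and timeout behavior described in item 2 can be sketched as a bounded retry loop. This is a generic illustration of the semantics, not the actual scheduler code, and the run_instance callable is hypothetical:

```python
import time

def run_with_retries(run_instance, timeout_s=600, max_retries=20):
    """Retry a job instance until it succeeds, the retry budget is
    exhausted, or the overall timeout expires. Returns True on
    success; False means this instance fails and the scheduler
    moves on to the next instance."""
    deadline = time.monotonic() + timeout_s
    for attempt in range(max_retries + 1):
        if time.monotonic() >= deadline:
            return False  # timeout period expired
        if run_instance(attempt):
            return True
    return False  # maximum number of retries reached
```

A failed instance does not block the schedule: the next instance still runs on its own time window, which is why a failed or empty instance can be retried later from the job details page.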

For more information about the calculation and scheduling configurations of a Scheduled SQL job, see Process and save data from a Logstore to another Logstore.

Use an SDK to create a Scheduled SQL job

For more information about how to use an SDK to create a Scheduled SQL job, see Use Simple Log Service SDK for Java to create a Scheduled SQL job.

Job Management

Go to the Job Management page. You can view the created Scheduled SQL jobs in the Simple Log Service console.

On the details page of a Scheduled SQL job, you can view the instances of the job. The information about each instance includes the SQL query range.

If an instance fails due to reasons such as insufficient permissions or invalid SQL syntax, or if the number of processed rows in the Processed Data Size column is 0, you can retry the instance. A processed row count of 0 usually means that the source data is delayed or that no data exists in the time window.

Verify effects

Compare the amount of logs

The source Logstore processes 18,241 log entries per hour, which are 17.27 MB in size.

The destination Logstore processes 5,752 log entries per hour, which are 1.62 MB in size.
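From the two figures above, the per-minute aggregation reduces the entry count by roughly 68% and the storage size by roughly 91%. The arithmetic, using the reported hourly numbers:

```python
def reduction(before: float, after: float) -> float:
    """Percentage reduction from before to after, rounded to one decimal."""
    return round((1 - after / before) * 100, 1)

# Counts reported above: 18,241 raw entries (17.27 MB) per hour vs
# 5,752 aggregated entries (1.62 MB) per hour.
entry_reduction = reduction(18241, 5752)  # percent fewer log entries
size_reduction = reduction(17.27, 1.62)   # percent smaller storage
```

The size shrinks faster than the entry count because each aggregated row also drops the verbose per-request fields of the raw logs.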

Query the changes in the amount of OSS response data (response_size) in a bucket over time

```sql
* | select bucket, bucket_location, bucket_storage_type, http_method, response_size, DATE_FORMAT(FROM_UNIXTIME(__time__), '%Y-%m-%d %H:%i:%s') AS datetime where bucket = 'bucket6877' and bucket_storage_type = 'archive' order by datetime
```

In the destination Logstore log2log of the Scheduled SQL job, query the data of the bucket6877 bucket whose storage class is Archive in the previous 15 minutes, sort the queried data by time, and then display the changes in the amount of OSS response data (response_size) over time in a chart.