
Irregular CSV-format logs

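The following log contains an entry that does not follow the regular CSV format: the request field itself contains the delimiter character |.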

  • Raw log
    __source__:  1.2.3.4
    __tag__:__client_ip__:  2.3.4.5
    __tag__:__receive_time__:  1562840879
    __topic__:
    content: 101.132.xx.xx|07/Aug/2019:11:10:37 +0800|www.123.com|GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1|200|6.729|14559|1.2.3.4:8001|200|6.716|-|-|Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D))||
  • Requirement: Parse the content field.
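  • Solution: In content, replace GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1 with "GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1" so that the vertical bars inside the request are no longer treated as delimiters. Then use parse-csv with quote set to " to extract the correct fields, and delete the original content field.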
    spl
    * | extend content=replace(content,'GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1','"GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1"')
    | parse-csv -delim='|' -quote='"' content as remote_addr,time_local,host,request,status,request_time,body_bytes_sent,upstream_addr,upstream_status,upstream_response_time,http_referer,http_x_forwarded_for,http_user_agent,session_id,guid
    | project-away content
  • Output log
    __source__:  1.2.3.4
    __tag__:__client_ip__:  2.3.4.5
    __tag__:__receive_time__:  1562840879
    __topic__:
    body_bytes_sent:  14559
    host:  www.123.com
    http_referer:  -
    http_user_agent:  Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D))
    http_x_forwarded_for:  -
    remote_addr:  101.132.xx.xx
    request:  GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1
    request_time:  6.729
    status:  200
    time_local:  07/Aug/2019:11:10:37 +0800
    upstream_addr:  1.2.3.4:8001
    upstream_response_time:  6.716
    upstream_status:  200
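
The quote-then-parse approach described in the solution can be tried out locally with a minimal Python sketch. This is an illustration under assumptions, not the SPL engine: Python's standard csv module stands in for parse-csv, and the names RAW, REQUEST, FIELDS, and parse_content are hypothetical helpers introduced only for this example.

```python
import csv

# Sample content value copied from the raw log above.
RAW = (
    "101.132.xx.xx|07/Aug/2019:11:10:37 +0800|www.123.com|"
    "GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1|200|6.729|14559|"
    "1.2.3.4:8001|200|6.716|-|-|"
    "Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D))||"
)

# The request string whose embedded '|' characters break plain splitting.
REQUEST = "GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1"

# Field names in the same order as in the parse-csv statement.
FIELDS = [
    "remote_addr", "time_local", "host", "request", "status",
    "request_time", "body_bytes_sent", "upstream_addr", "upstream_status",
    "upstream_response_time", "http_referer", "http_x_forwarded_for",
    "http_user_agent", "session_id", "guid",
]


def parse_content(content: str) -> dict:
    """Wrap the request in double quotes, then parse as '|'-delimited CSV."""
    quoted = content.replace(REQUEST, '"' + REQUEST + '"')
    row = next(csv.reader([quoted], delimiter="|", quotechar='"'))
    return dict(zip(FIELDS, row))


# Naive splitting breaks the request into several pieces and misaligns
# every field that follows it.
naive = RAW.split("|")

parsed = parse_content(RAW)
print(len(naive), len(parsed))
print(parsed["request"])
```

Without the quoting step, the request is split at every embedded vertical bar and all later fields shift to the wrong names. With the quotes in place, the request stays in one field and the remaining values line up with the field list; in this sketch the last two fields, session_id and guid, come back as empty strings.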