實作十二

Hive (4) : EXPLAIN 與 DESCRIBE

技巧一: EXPLAIN

  1. 有時候邏輯上相同的 HiveQL,在執行時間上會有所不同。或者,有時候我們只是需要瞭解進行某一個複雜的 HiveQL 查詢,需要做幾次 MapReduce。這時候就可以使用 EXPLAIN 這個關鍵字來進行查詢。語法為 EXPLAIN <HiveQL> ;
    hive> EXPLAIN SELECT * FROM baseball_stats.fact_player_stats LIMIT 10;
    OK
    ABSTRACT SYNTAX TREE:
      (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME baseball_stats fact_player_stats))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_LIMIT 10)))
             
    STAGE DEPENDENCIES:
      Stage-0 is a root stage
               
    STAGE PLANS:
      Stage: Stage-0
        Fetch Operator
          limit: 10
     
    Time taken: 0.14 seconds
    
    hive> use baseball; EXPLAIN SELECT A.PlayerID, B.teamID, B.AB, B.R, B.H, B.2B, B.3B, B.HR, B.RBI FROM Master A JOIN BATTING B ON A.playerID = B.playerID;
    OK
    ABSTRACT SYNTAX TREE:
      (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME Master) A) (TOK_TABREF (TOK_TABNAME BATTING) B) (= (. (TOK_TABLE_OR_COL A) playerID) (. (TOK_TABLE_OR_COL B) playerID)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL A) PlayerID)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) teamID)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) AB)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) R)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) H)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) 2B)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) 3B)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) HR)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL B) RBI)))))
         
    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 is a root stage
       
    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Alias -> Map Operator Tree:
            a 
              TableScan
                alias: a
                Reduce Output Operator
                  key expressions:
                        expr: playerid
                        type: string
                  sort order: +
                  Map-reduce partition columns:
                        expr: playerid
                        type: string
                  tag: 0
                  value expressions:
                        expr: playerid
                        type: string
            b 
              TableScan
                alias: b
                Reduce Output Operator
                  key expressions:
                        expr: playerid
                        type: string
                  sort order: +
                  Map-reduce partition columns:
                        expr: playerid
                        type: string
                  tag: 1
                  value expressions:
                        expr: teamid
                        type: string
                        expr: ab
                        type: int
                        expr: r
                        type: int
                        expr: h
                        type: int
                        expr: 2b
                        type: int
                        expr: 3b
                        type: int
                        expr: hr
                        type: int
                        expr: rbi
                        type: int
          Reduce Operator Tree:
            Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {VALUE._col1}
                1 {VALUE._col3} {VALUE._col7} {VALUE._col8} {VALUE._col9} {VALUE._col10} {VALUE._col11} {VALUE._col12} {VALUE._col13}
              handleSkewJoin: false
              outputColumnNames: _col1, _col38, _col42, _col43, _col44, _col45, _col46, _col47, _col48
              Select Operator
                expressions:
                      expr: _col1
                      type: string
                      expr: _col38
                      type: string
                      expr: _col42
                      type: int
                      expr: _col43
                      type: int
                      expr: _col44
                      type: int
                      expr: _col45
                      type: int
                      expr: _col46
                      type: int
                      expr: _col47
                      type: int
                      expr: _col48
                      type: int
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
                File Output Operator
                  compressed: false
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
       
      Stage: Stage-0
        Fetch Operator
          limit: -1
         
    Time taken: 2.592 seconds
    hive> 
    

技巧二: DESCRIBE

  1. 有時候如果您想要查詢資料表的 Schema 或者某個 Hive 內建函數的定義時,您可以使用 DESCRIBE 這個關鍵字。
  2. 語法一:DESCRIBE <資料表名稱>;
    ve> use default;
    OK
    Time taken: 0.015 seconds
    hive> DESCRIBE mysql_data;
    OK
    id  int 
    name    string  
    Time taken: 0.107 seconds
    hive> DESCRIBE mssql_data;
    OK
    id  int 
    name    string  
    Time taken: 0.126 seconds
    
    hive> use baseball;
    OK
    Time taken: 0.021 seconds
    hive> DESCRIBE master;
    OK
    lahmanid    int 
    playerid    string  
    managerid   int 
    hofid   string  
    birthyear   int 
    birthmonth  int 
    birthday    int 
    birthcountry    string  
    birthstate  string  
    birthcity   string  
    deathyear   int 
    deathmonth  int 
    deathday    int 
    deathcountry    string  
    deathstate  string  
    deathcity   string  
    namefirst   string  
    namelast    string  
    namenote    string  
    namegiven   string  
    namenick    string  
    weight  int 
    height  int 
    bats    string  
    throws  string  
    debut   string  
    finalgame   string  
    college string  
    lahman40id  string  
    lahman45id  string  
    retroid string  
    holtzid string  
    bbrefid string  
    Time taken: 0.122 seconds
    
  3. 語法二:DESCRIBE FUNCTION <函數名稱>;
    hive> DESCRIBE FUNCTION max;
    OK
    max(expr) - Returns the maximum value of expr
    Time taken: 0.082 seconds
    hive> DESCRIBE FUNCTION count;
    OK
    count(*) - Returns the total number of retrieved rows, including rows containing NULL values.
    count(expr) - Returns the number of rows for which the supplied expression is non-NULL.
    count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL.
    Time taken: 0.111 seconds
    

本文件最後更新於: