Source : “High Throughput Computing Technologies”, by Jazz Yao-Tsung Wang, September 12, 2013
Source : “The Hadoop Stack - Then, Now and in the Future”, Hadoop World 2011
user@master ~ $ sqoop
Try 'sqoop help' for usage.
user@master ~ $ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
Source : “Apache Sqoop – Overview”, by Arvind Prabhakar, October 06, 2011
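Each command takes its own arguments. As a quick connectivity check before moving any data, the listing commands can be run on their own; the JDBC URI, database, and username below are hypothetical placeholders:

user@master ~ $ sqoop list-databases --connect jdbc:mysql://dbhost/ --username dbuser -P
user@master ~ $ sqoop list-tables --connect jdbc:mysql://dbhost/salesdb --username dbuser -P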
user@master ~ $ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
  --connect [jdbc-uri]        Specify JDBC connect string
  -P                          Read password from console
  --username [username]       Set authentication username

Import control arguments:
  --append                    Imports data in append mode
  --as-avrodatafile           Imports data to Avro data files
  --as-sequencefile           Imports data to SequenceFiles
  --as-textfile               Imports data as plain text (default)
  --columns [col,col,col...]  Columns to import from table
  -e,--query [statement]      Import results of SQL 'statement'
  -m,--num-mappers [n]        Use 'n' map tasks to import in parallel
  --table [table-name]        Table to read
  --target-dir [dir]          HDFS plain table destination
  --where [where clause]      WHERE clause to use during import

Hive arguments:
  --create-hive-table         Fail if the target Hive table exists
  --hive-import               Import tables into Hive
  --hive-table [table-name]   Sets the table name to use when importing to Hive

HBase arguments:
  --column-family [family]    Sets the target column family for the import
  --hbase-create-table        If specified, create missing HBase tables
  --hbase-row-key [col]       Specifies which input column to use as the row key
  --hbase-table [table]       Import to [table] in HBase

At minimum, you must specify --connect and --table
Source : “Apache Sqoop – Overview”, by Arvind Prabhakar, October 06, 2011
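Combining the arguments above, a typical import might look like the sketch below; the connect string, table, columns, and target directory are hypothetical, and -m 4 simply splits the transfer across four parallel map tasks:

user@master ~ $ sqoop import --connect jdbc:mysql://dbhost/salesdb --username dbuser -P \
    --table customers --columns "id,name,city" --where "city = 'Taipei'" \
    --target-dir /user/hadoop/customers -m 4 --as-textfile

Replacing --target-dir with --hive-import and --hive-table would land the same rows in a Hive table instead of a plain HDFS directory.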
user@master ~ $ sqoop help export
usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
  --connect [jdbc-uri]        Specify JDBC connect string
  -P                          Read password from console
  --username [username]       Set authentication username

Export control arguments:
  --columns [col,col,col...]  Columns to export to table
  --export-dir [dir]          HDFS source path for the export
  -m,--num-mappers [n]        Use 'n' map tasks to export in parallel
  --table [table-name]        Table to populate

At minimum, you must specify --connect, --export-dir, and --table
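Export mirrors the import: rows already sitting in HDFS (for example, the output of a Hive query) are pushed back into an existing database table. Again, the names and paths are hypothetical, and the target table must already exist on the RDBMS side:

user@master ~ $ sqoop export --connect jdbc:mysql://dbhost/salesdb --username dbuser -P \
    --table customer_summary --export-dir /user/hadoop/customer_summary -m 4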
Feature | Hive | RDBMS |
Schema | Schema on read | Schema on write |
Updates | INSERT supported | UPDATE, INSERT, DELETE supported |
Transactions | Not supported | Supported |
Indexes | Not supported | Supported |
Latency | Minutes | Sub-second |
Functions | Dozens of built-in functions | Hundreds of built-in functions |
Multi-table inserts | Supported | Not supported |
CREATE TABLE AS SELECT | Supported | Not valid in SQL-92 |
SELECT | Single table only in the FROM clause | SQL-92 standard |
JOIN | INNER, OUTER, SEMI, MAP JOINS | SQL-92 or variants |
Subqueries | Only in the FROM clause | In any clause |
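Two of the rows above are worth a concrete illustration: Hive's multi-table insert scans the source data once while feeding several target tables, and subqueries are allowed only in the FROM clause. A minimal sketch in the hive shell, with hypothetical table names:

user@master ~ $ hive
hive> FROM page_views pv
    > INSERT OVERWRITE TABLE views_by_user SELECT pv.user_id, COUNT(1) GROUP BY pv.user_id
    > INSERT OVERWRITE TABLE views_by_page SELECT pv.page_id, COUNT(1) GROUP BY pv.page_id;
hive> SELECT t.page_id, t.cnt
    > FROM (SELECT page_id, COUNT(1) AS cnt FROM page_views GROUP BY page_id) t
    > WHERE t.cnt > 100;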