

  • HiveSQL Example

    hive> select * from user_config;
    OK
    100636  Divid
    100011  tom
    111000  Lily
    Time taken: 0.951 seconds, Fetched: 3 row(s)
    hive> select * from user_config order by uid,uname limit 2;
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=
    15/08/11 20:01:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/08/11 20:01:25 WARN conf.Configuration: file:/tmp/hive-local-hadoop/hive_2015-08-11_20-01-13_743_5955681711277079721-1/-local-10003/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
    15/08/11 20:01:25 WARN conf.Configuration: file:/tmp/hive-local-hadoop/hive_2015-08-11_20-01-13_743_5955681711277079721-1/-local-10003/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
    15/08/11 20:01:26 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    15/08/11 20:01:26 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
    15/08/11 20:01:26 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
    15/08/11 20:01:26 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
    15/08/11 20:01:26 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
    15/08/11 20:01:26 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
    15/08/11 20:01:26 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
    15/08/11 20:01:26 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
    15/08/11 20:01:26 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
    Execution log at: /tmp/hadoop/hadoop_20150811200101_d09e2e8b-cd45-4bbb-8d92-c7becb887b67.log
    Job running in-process (local Hadoop)
    Hadoop job information for null: number of mappers: 0; number of reducers: 0
    2015-08-11 20:01:48,821 null map = 0%,  reduce = 0%
    2015-08-11 20:01:51,354 null map = 100%,  reduce = 0%
    2015-08-11 20:01:52,409 null map = 100%,  reduce = 100%
    Ended Job = job_local976434766_0001
    Execution completed successfully
    MapredLocal task succeeded
    OK
    100011  tom
    100636  Divid
    Time taken: 43.622 seconds, Fetched: 2 row(s)
    
    
    


  • Hive Installation (Cloud RDBMS Metastore)

    
    Hive can be understood as a user-friendly interface layer built on top of Hadoop and HDFS. The interface comes in several forms, including the CLI terminal, a WebUI, and JDBC/ODBC. Hive therefore has to be installed on top of hadoop 2.2 (the Hive version must match the Hadoop version).
    
    ----- Debugging
    cd $HIVE_HOME/bin
    ./hive -hiveconf hive.root.logger=DEBUG,console
    
    Note: the MySQL metastore should use the latin1 character set; using utf8 causes errors.
    
    
    hadoop 2.2
    hive 0.13.1
    hbase 0.96.2
    
    1. Prerequisites: a Hadoop cluster (>= 2 nodes) running hadoop 2.2
    2. Download the Hive package
    www.apache.org/dyn/closer.cgi/hive
    apache-hive-0.13.1-bin.tar.gz
    
    3. Extract the Hive package and set the environment variables
    tar -zxvf apache-hive-0.13.1-bin.tar.gz
    export HIVE_HOME=/home/hadoop/apache-hive-1.2.0-bin
    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HIVE_HOME/bin:$PATH
    
    4. Configure Hive
    cd $HIVE_HOME/conf/
    
    cp hive-default.xml.template   hive-site.xml
    -------------------------------
    Key hive-site.xml parameters:
    1. hive.metastore.warehouse.dir: the directory where Hive stores its data, given as an HDFS location; defaults to /user/hive/warehouse
       hive.metastore.local
    2. hive.exec.scratchdir: the directory for Hive's temporary files; defaults to /tmp/hive-${user.name}
    
    3. Database connection settings (MySQL as the example)
    3.1 hive-site.xml settings
    
    hive.metastore.warehouse.dir
    hive.exec.scratchdir                                  temporary files; by default under /tmp on the Hive node
    hive.exec.local.scratchdir                            local temporary files; by default under /tmp on the Hive node
    
    
    javax.jdo.option.ConnectionURL
    javax.jdo.option.ConnectionDriverName
    javax.jdo.option.ConnectionUserName   (create this user in MySQL)
    javax.jdo.option.ConnectionPassword   (set this password in MySQL)
    
    datanucleus.readOnlyDatastore
    
    
    -- Create the warehouse directory: hdfs dfs -mkdir /hive
    The warehouse directory is given as a full hdfs URI, so it must match fs.default.name in core-site.xml:
    ******************************************
    
    <property>
      <name>fs.default.name</name>
      <value>hdfs://hadoop1:54321</value>
    </property>
    ******************************************
    
    
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>hdfs://hadoop1:54321/hive</value>
      <description>location of default database for the warehouse</description>
    </property>
    
    <property>
      <name>hive.exec.scratchdir</name>
      <value>/tmp/hive-${user.name}</value>
      <description>Scratch space for Hive jobs</description>
    </property>
    
    <property>
      <name>hive.exec.local.scratchdir</name>
      <value>/tmp/hive-local-${user.name}</value>
      <description>Local scratch space for Hive jobs</description>
    </property>
    
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost:3306/hivedb?createDatabaseIfNotExist=true</value>
      <!-- hivedb is the MySQL database created for the metastore; to force the
           connection encoding, use instead:
           jdbc:mysql://localhost:3306/hivedb?useUnicode=true&amp;characterEncoding=UTF-8 -->
      <description>JDBC connect string for a JDBC metastore</description>
    </property>
    
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>
    
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hive</value>
      <description>Username to use against metastore database</description>
    </property>
    
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hive</value>
      <description>password to use against metastore database</description>
    </property>
    3.2 Download the MySQL JDBC driver jar and place it under $HIVE_HOME/lib
    
     cp mysql-connector-java-5.1.33-bin.jar apache-hive-1.2.0-bin/lib/
    
    4. Set the environment variables
    export HIVE_HOME=/home/hadoop/apache-hive-1.2.0-bin
    export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:$PATH
    
    
    
    
    
    
    
    5. Configure the MySQL metastore connection
    5.1 Install MySQL and set up a hive/hive account with all privileges.
    Create the hivedb database and the hive account in MySQL, and grant the account its privileges.
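    A minimal sketch of the MySQL side of step 5.1, assuming the same values configured in hive-site.xml (database hivedb, user hive, password hive, metastore connecting from localhost) — adjust the host and password for your environment:

```sql
-- Assumed values: database hivedb, user hive, password hive (as in hive-site.xml).
CREATE DATABASE hivedb;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON hivedb.* TO 'hive'@'localhost';
FLUSH PRIVILEGES;
```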
    
    
    
    cd $HIVE_HOME/bin
    ./hive -hiveconf hive.root.logger=DEBUG,console
    
    
    
    
    
    
    
    
    
    
    
    
    


  • A Brief Discussion of the Index Search Algorithm

    The index search algorithm, also known as block search, is more efficient than a plain sequential search; traces of it can be seen in the Linux file system and in database BTree index lookups. The algorithm is examined below (the original illustrations were written in VB).


    The characteristics of the Index algorithm:

    1. Tree structure, similar to a BTree. For an Oracle index the keys must be ordered; an Index lookup itself imposes no such restriction.

    2. Data ordering: across the collection of block-level index_table structures, every key in block index_table(i) is smaller than every key in index_table(i+1).

    Put simply: max(index_table(i).key) < min(index_table(i+1).key), i.e. the largest key of block i is below any key of block i+1.

    3. First, the search key is matched against node i (a branch node): loop over the index_table(i) entries, comparing the key to find which block it falls in, similar to the familiar "Index Range Scan".

    4. Repeat step 3 until a matching node index_table(i) is found, or exit if none matches. Within node i, the structures form an array a(j) [similar to leaf blocks]; a sequential search is performed over it, using a(j).start and a(j).end as

    the lower and upper bounds.

    5. Repeat step 4 until a(j) = key is found, or exit if it is not. Once located, a(j) sits at entry (j) of node i (the branch block), so

    (i, j) is the key's absolute position in the index search structure, similar to Oracle's rowid.

    6. If we already know the key's absolute position (i, j) in the Index structure before the lookup, we can jump straight to that key and return the result immediately, similar to an Index Rowid Access Scan.
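    The lookup in steps 3-6 can be sketched in Python (a minimal, hypothetical reconstruction; the Block class and block_search function are illustrative names, assuming each block records its start/end key bounds as described above):

```python
# Hypothetical sketch of the block (index) search described above.
# Each Block mirrors one index_table(i) entry: a sorted run of keys plus its
# start/end bounds; blocks are ordered so that max(block i) < min(block i+1).

class Block:
    def __init__(self, keys):
        self.keys = sorted(keys)      # a(j): the keys inside this block
        self.start = self.keys[0]     # a(j).start, lower bound
        self.end = self.keys[-1]      # a(j).end, upper bound

def block_search(blocks, key):
    """Return the absolute position (i, j) of key, or None if absent."""
    # Step 3: branch-level scan -- find the block whose [start, end]
    # range could contain the key (an "Index Range Scan").
    for i, blk in enumerate(blocks):
        if blk.start <= key <= blk.end:
            # Steps 4-5: sequential search inside the matching block.
            for j, k in enumerate(blk.keys):
                if k == key:
                    return (i, j)     # (i, j) plays the role of a rowid
            return None               # range matched, but key not present
    return None

blocks = [Block([3, 7, 9]), Block([12, 15, 20]), Block([31, 40, 55])]
```

    If (i, j) is already known, step 6 reduces to blocks[i].keys[j]: the key is fetched directly, with no search at all.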



