mapreduce - Nutch on Hadoop | Input path does not exist


I am getting an "Input path does not exist" error when I run the command:

nutch inject crawldb urls 

In nutch/logs, hadoop.log shows this error:

2015-08-16 16:08:12,834 INFO  crawl.Injector - Injector: starting at 2015-08-16 16:08:12
2015-08-16 16:08:12,834 INFO  crawl.Injector - Injector: crawlDb: crawldb
2015-08-16 16:08:12,835 INFO  crawl.Injector - Injector: urlDir: urls
2015-08-16 16:08:12,835 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2015-08-16 16:08:13,296 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-08-16 16:08:13,417 WARN  snappy.LoadSnappy - Snappy native library not loaded
2015-08-16 16:08:13,430 ERROR security.UserGroupInformation - PriviledgedActionException as:hdravi cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/hdravi/urls
2015-08-16 16:08:13,432 ERROR crawl.Injector - Injector: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/hdravi/urls
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:323)
    at org.apache.nutch.crawl.Injector.run(Injector.java:379)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Injector.main(Injector.java:369)

It seems to be searching in the local file system instead of HDFS.

This is the content of Hadoop's core-site.xml:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>

This is the content of Hadoop's hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>

When I type hadoop fs -ls -R /, this is the output:

drwxrwxrwx   - hdravi supergroup          0 2015-08-16 16:06 /user
drwxrwxrwx   - hdravi supergroup          0 2015-08-16 16:06 /user/hdravi
drwxr-xr-x   - hdravi supergroup          0 2015-08-16 16:06 /user/hdravi/urls
-rw-r--r--   1 hdravi supergroup        240 2015-08-16 16:06 /user/hdravi/urls/seed.txt

Am I missing some configuration in Hadoop/Nutch?

UPDATE

I get the following error when I use the complete HDFS path:

2015-08-16 23:33:22,876 INFO  crawl.Injector - Injector: starting at 2015-08-16 23:33:22
2015-08-16 23:33:22,877 INFO  crawl.Injector - Injector: crawlDb: crawldb
2015-08-16 23:33:22,877 INFO  crawl.Injector - Injector: urlDir: hdfs://localhost:54310/user/hdravi/user/hdravi/urls
2015-08-16 23:33:22,878 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2015-08-16 23:33:23,317 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-08-16 23:33:23,410 WARN  snappy.LoadSnappy - Snappy native library not loaded
2015-08-16 23:33:23,762 ERROR security.UserGroupInformation - PriviledgedActionException as:hdravi cause:org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
2015-08-16 23:33:23,764 ERROR crawl.Injector - Injector: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
    at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
    at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1437)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1455)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:176)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:323)
    at org.apache.nutch.crawl.Injector.run(Injector.java:379)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Injector.main(Injector.java:369)

I am not sure about Nutch, but regarding Hadoop: try loading the configuration files explicitly into a Configuration object before starting the MapReduce job.

This solution works for me:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Load Hadoop's site configuration explicitly so that fs.default.name
// points at HDFS rather than the local file system.
Configuration conf = new Configuration();
conf.addResource(new Path("path/to/hadoop/conf/core-site.xml"));
conf.addResource(new Path("path/to/hadoop/conf/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);
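To verify that the configuration was actually picked up, a minimal sanity check like the following can help (the class name and conf paths are placeholders; the HDFS URL and seed directory are taken from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckInjectInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.addResource(new Path("path/to/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("path/to/hadoop/conf/hdfs-site.xml"));

        FileSystem fs = FileSystem.get(conf);
        // Expect hdfs://localhost:54310 here if core-site.xml was picked up;
        // a file:/// URI means paths will keep resolving on the local disk.
        System.out.println("Default FS: " + fs.getUri());

        // Expect true here, matching the hadoop fs -ls -R listing above.
        System.out.println("Seed dir exists: " + fs.exists(new Path("/user/hdravi/urls")));
    }
}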

You may also try giving the full HDFS path of the input directory:

hdfs://localhost:54310/user/hdravi 
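
For example, assuming the seed directory from the listing above, the inject command would look something like this (the exact invocation depends on how your Nutch job is launched):

nutch inject crawldb hdfs://localhost:54310/user/hdravi/urls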
