7. September 2008

Ein einfacher Ansatz, Dateien eines Ordners komplett in Tabellen zu laden

English title: A simple approach to load an entire directory into a table

Letzte Woche bekam ich die Frage, wie man denn am einfachsten ein ganzes Verzeichnis voller Dateien (Bilder, PDF-Dokumente oder andere) in die Datenbank laden kann. Der SQL*Loader setzt ja voraus, dass alle Inhalte in einer oder zumindest in bestimmten, bekannten Dateien enthalten sind. Im vorliegenden Fall ist die Anforderung ja eher, dass man ein Verzeichnis kennt und einfach den ganzen Inhalt in eine Tabelle mit einer BLOB-Spalte laden möchte.
Last week I was asked for an easy approach to load an entore directory containing files (PDF documents, images, etc) into a database table. SQL*Loader is not appropriate for this since all the content must be in one or multiple known files. But this case is another - the directory is known, but not the files in it.
Meist fängt man dann an, umfangreiche Java oder C-Programme zu schreiben - das muss aber nicht unbedingt sein. Hilfreich kann die PL/SQL-Unterstützung für Betriebssystem-Kommandos (OS_COMMAND) sein, welche ich vor einiger Zeit veröffentlicht habe und welche hier zum Download bereitsteht.
Most people start coding at this point: some C/C++, Java or PL/SQL logic which imports the files into the database. But this is not necessary: The PL/SQL package for operating system interaction (OS_COMMAND) which I published some time ago (and which is downloadable here) might be helpful here.
Nach Installation steht der Objekttyp FILE_TYPE und das Paket FILE_PKG zur Verfügung. Zunächst brauchen wir das Listing eines Verzeichnisses: Nichts leichter als das!
After installation we have an object type FILE_TYPE and the PL/SQL package FILE_PKG. First we need the directory listing: Nothing is easier!
SQL> select e.file_name, e.is_dir, e.file_size 
  2  from table(file_pkg.get_file_list(file_pkg.get_file('/path/to/my/dir'))) e

FILE_NAME                      I  FILE_SIZE
------------------------------ - ----------
another_directory              Y          0
DSC_5112_m.jpg                 N    5384837
DSC_5128_m.jpg                 N    7005470
DSC_5130_m.jpg                 N    6212896
DSC_5131_m.jpg                 N    7203724
DSC_5134_m.jpg                 N    7295177
DSC_5135_m.jpg                 N    7057265
DSC_5136_m.jpg                 N    7194343
:                              :          :
Mit der Methode GET_CONTENT_AS_BLOB können die Dateien ausgelesen werden; geht es also darum, eine Tabelle effizient zu füllen, könnte das so aussehen:
The method GET_CONTENT_AS_BLOB retrieves the file contents as a BLOB. To import the files into a table we can use it as follows:
drop table bilder_tab
/

create table bilder_tab (
  filename varchar2(200),
  content  blob
) nologging
/

/*
 * All at once 
 */

declare
  v_dirname varchar2(200) := '/path/to/my/dir';
begin
  insert /*+ APPEND*/ into bilder_tab (
    select e.file_name, e.get_content_as_blob() 
    from table(file_pkg.get_file_list(file_pkg.get_file(v_dirname))) e
    where e.is_dir = 'N'
  );
end;
/
sho err

commit
/
Ein anderer Ansatz (mit Commit-Intervall) könnte so aussehen:
Another approach (which COMMITs every n files) might look as follows:
set serveroutput on

declare
  v_dirname varchar2(200) := '/path/to/my/dir';
  v_counter pls_integer := 0;
  v_commit  pls_integer := 5;
begin
  for i in (
    select e.file_name, e.get_content_as_blob() as content
    from table(file_pkg.get_file_list(file_pkg.get_file(v_dirname))) e
  ) loop
    v_counter := v_counter + 1;
    insert into bilder_tab values (i.file_name, i.content);
    if mod(v_counter, v_commit) = 0 then
      commit;
      dbms_output.put_line(v_counter || 'file loaded: COMMIT!');
    end if;
  end loop;
end;
/
sho err

commit
/

Kommentare:

tnourse hat gesagt…

I installed your os_command (and it's dependent scripts) on a Windows hosted Oracle 10g instance. When I execute the basic command you show for listing out a directories' contents, I get a table with the correct number of rows, but the fields are all null. I have granted the permissions I thought were necessary (like the Java read permission on the directory) and the others recommended in your readme.txt. Any ideas?

Carsten Czarski hat gesagt…

Hi,

could you please post your SQL query here ...?

Best regards

-Carsten

tnourse hat gesagt…

select e.file_name, e.is_dir, e.file_size
from table(file_pkg.get_file_list(file_pkg.get_file('E:\Data_Pump_Dir'))) e

Carsten Czarski hat gesagt…

Hi,

hmmm ... looks good ... a few questions:

* Does the oracle user (the user under which the db processes run) have read access to the files ...?
* How did you grant the java read privileges ...?
* What do you see if you query the directory directly,
select file_pkg.get_file('E:\Data_Pump_Dir') from dual;* Can you query individual files in the directory ...?

Thanks

-Carsten

tnourse hat gesagt…

* Does the oracle user (the user under which the db processes run) have read access to the files ...?
Answer:YES, read/write

* How did you grant the java read privileges ...?
Answer:
begin
dbms_java.grant_permission('tnourse','sys:java.io.FilePermission','E:\Data_Pump_Dir','read');
end;

* What do you see if you query the directory directly,
select file_pkg.get_file('E:\Data_Pump_Dir') from dual;* Can you query individual files in the directory ...?
Answer:
(E:\Data_Pump_Dir, Data_Pump_Dir, 0, 04/16/2009 9:14:10 AM, Y, N, Y, Y)

Thanks for your help.

tnourse hat gesagt…

Carsten, thanks. I figured out the problem, I only granted read on the directory, not the individual contents... I changed the filepath to 'E:/Data_Pump_Dir/-' and now it works fine.

Tom

et62962003 hat gesagt…

Hi,

I got the same problem as tnourse described when I ran the query. I have Oracle 11g in Windows 2003 Server. When I ran the query, it returned the exact number of rows but with null fields. I tried to make the permission name like 'F:/xxx/xx1/-' or 'F:\xxx\xx1\*' but still got the problem.

Andy

Carsten Czarski hat gesagt…

Hi Andy,

would you be so kind to post your query here ...?


Best regards

-Carsten

et62962003 hat gesagt…

John,

I tried to run the following command and I got a table with the correct number of rows, but the fields are all null (blanks). Do you know what happened?


dbms_java.grant_permission( 'VS_MGR', 'SYS:java.io.FilePermission', 'f:\jobs\images\male', 'read' );



SQL> editWrote file afiedt.buf

1 select substr(e.file_name,1,30) file_name, e.is_dir, e.file_size
2* from table(file_pkg.get_file_list(file_pkg.get_file('f:\jobs\images\male'))) e
SQL> /

FILE_NAME I FILE_SIZE


-



10 rows selected.

SQL>

Carsten Czarski hat gesagt…

Hi,

what does the following query return ...?

select file_pkg.get_file('f:\jobs\images\male') from dual;

Regards

-Carsten

et62962003 hat gesagt…

SQL> select file_pkg.get_file('f:\jobs\images\male') from dual;

FILE_PKG.GET_FILE('f:\jobs\images\male')(FILE_PATH, FILE_NAME, FILE_SIZE, LAST_MODIFIED
--------------------------------------------------------------------------------
FILE_TYPE('f:\jobs\images\male', 'male', 0, '10-SEP-09', 'Y', 'N', 'Y', 'Y')

SQL>

Carsten Czarski hat gesagt…

Hi,

I think I got it: You granted read privileges to the directory, but not to the individual files. So in order to get a directory listing you need also read privileges on each of the files. Note the star after the "male" directory. If you like to grant the privilege fpor subdirs also, use the dash (-) instead.

dbms_java.grant_permission( 'VS_MGR', 'SYS:java.io.FilePermission', 'f:\jobs\images\male\*', 'read' );

Does this help ..?

-Carsten

et62962003 hat gesagt…

Carsten,

I got it work. However, I think I did the same thing last time for a path like 'f:\vrds_jobs\uploads\images\birth\*' and could never make it work. Was it because of the underscore in vrds_jobs? I have Oracle 11g in Windows 2003 server.

Thanks.
Andy

Carsten Czarski hat gesagt…

Hi Andy,

hmmm ... this might be the reason - you might also try the "old" DOS directory name (e.g. JOBS~1)

Regards

-Carsten

Beliebte Postings