|
|
![]() |
| Thread Tools | Search this Thread |
#1
| |||
| |||
Perl Script to help feed DBMS MySQL I started in Perl and I am reading the doc to learn things. My problem is simple, I have multiple files that I just need to manipulate, format to feed my MySQL database. I use MySQL 5.1, Toad Perl v4.1 & v5.10 This is one example among many other CSV file: Code: 245362;77;77289;INR69907;Felt;799;20091102232406;;VEL;799;66;24; 243640;77;66017;INT81553;"Notebook";699;20091102232318;;VEL;699;66;24; 236958;77;08;DRT03286;bed;299;20091102231945;;VEL;299;66;15; 319771;77;79060;INA38898;Pen;399;20091102231902;;VEL;399;66;3; 319771;77;79060;BTQ38898;Pen;399;20091102231902;;VEL;399;66;24; 319250;77;13110;INJ40768;Mattresses;599;20091102231736;;VEL;599;66;24; 320343;77;21758;PRP31628;Mice;299;20091102231459;;VEL;299;66;24; 331379;77;27209;INT59377;Kit;299;20091102231319;;VEL;299;66;15; 314147;77;89301;BTQ12922;Scissors;0;20091102231248;;VEL;0;66;24; 334467;77;26234;TEL38873;Handkerchief;499;20091102231131;;VEL;499;66;24; 241749;77;37732;TEL90191;Eraser;699;20091102231104;;VEL;699;66;24; Code: insert into vod_billing values("236958","8","DRT03286","bed","299","20091102231945","15","DRT" ); insert into vod_billing values("241749","37732","TEL90191","Eraser","699","20091102231104","24","TEL" ); insert into vod_billing values("243640","66017","INT81553","Notebook","699","20091102232318","24","INT" ); insert into vod_billing values("245362","77289","INR69907","Felt","799","20091102232406","24","INT" ); insert into vod_billing values("314147","89301","BTQ12922","Scissors","0","20091102231248","24","BTQ" ); insert into vod_billing values("319250","13110","INJ40768","Mattresses","599","20091102231736","24","INT" ); insert into vod_billing values("319771","79060","BTQ38898","Pen","399","20091102231902","24","BTQ" ); insert into vod_billing values("319771","79060","INA38898","Pen","399","20091102231902","3","INT" ); insert into vod_billing values("320343","21758","PRP31628","Mice","299","20091102231459","24","PRP" ); insert into vod_billing values("331379","27209","INT59377","Kit","299","20091102231319","15","INT" ); insert into vod_billing values("334467","26234","TEL38873","Handkerchief","499","20091102231131","24","TEL" ); 1.concatenate all *.csv* a directory into a single file 2.replace all characters "by anything (double quote) 3.Remove all duplicate lines 4.Sortby the first column 5. Remove all empty lines 6. The character is the delimiter column then delete column 2,8,9,10 & 11 7. To add last column in the first 3 characters of the column 3 (TEL, PRI etc ...) 8. On this last column, replace all INA INJ & Rs per INT 9. Insert before the line vod_billing insert into values ( " 10. To include end of line "); 11. To replace the delimiter, for "" 12. Rename the output file .sql The ultimate goal would be to directly feed my MySQL database created with lines, but already such a script would be perfect. I know it's a bit abused but probably simpler for a developer! |
#2
| |||
| |||
Re: Perl Script to help feed DBMS MySQL There's a difference between helping someone who starts, and her spinning is all scripts. For referrals, I suggest you to go and read a little help on the functions open(), split(), close(). A quick foreach () will also help you. |
#3
| |||
| |||
Re: Perl Script to help feed DBMS MySQL It's all very easy in perl, but what is (an order of magnitude) the size (number of lines) of a file .csv obtained by concatenation of those listed in a directory, in order to see the strategy most appropriate (while memory or not) to remove duplicates and sorting. |
#4
| |||
| |||
Re: Perl Script to help feed DBMS MySQL Quote:
1. Concatenate all *. csv * a directory into a single file 2. Remove all empty lines 3. The character is the delimiter column then delete column 2,8,9,10 & 11 4. Sort by the first column 5. Delete all duplicate lines 6. Replace all characters "by anything (double quote) Because otherwise we may have lines that are not duplicates before deleting columns which become then (I do not know the exact structure of data lines, so it implies) [or when the order initially it is important and can be duplicates after deleting columns?] Delete duplicates after sorting is optimal: the duplicates will then successive, which makes things easier. |
#5
| |||
| |||
Re: Perl Script to help feed DBMS MySQL As stated I started doing the script Via Agenda quote (I added in step 7 & 8) Code: #!/usr/bin/perl -w use strict; use warnings; # Global Variables my $LOCAL_DIR = "/cygdrive/d/Nicolas/VOD/_.billing/_.compiler/backup/test/"; my $DEST_DIR = "/cygdrive/d/Nicolas/VOD/_.billing/_.compiler/backup/"; my $SEARCH_FILE = "*_billing_tvod_*"; my $OUTPUT_FILE = $DEST_DIR . sprintf("RESULT_FILE.TXT" ); # Script Perl print "Starting...\n"; print "\n"; open (OUT,">$OUTPUT_FILE" ) or die "Message : $!\n"; my @RESULT_FILE = <$LOCAL_DIR/$SEARCH_FILE>; foreach my $LINE(@RESULT_FILE) { open IN,"$LINE" or warn "Message : $!\n"; while(<IN> ) { s/\"//g; next if m/^$/; my @SPLIT_TAB = split(/;+/); push(@SPLIT_TAB,substr($SPLIT_TAB[3],0,3)); chomp(@SPLIT_TAB); if($SPLIT_TAB[11] eq 'INA') { my $ERASE_COLUMN = pop(@SPLIT_TAB); push(@SPLIT_TAB,'INR'); chomp(@SPLIT_TAB); } elsif($SPLIT_TAB[11] eq 'INS') { my $ERASE_COLUMN = pop(@SPLIT_TAB); push(@SPLIT_TAB,'INR'); chomp(@SPLIT_TAB); } elsif($SPLIT_TAB[11] eq 'INJ') { my $ERASE_COLUMN = pop(@SPLIT_TAB); push(@SPLIT_TAB,'INR'); chomp(@SPLIT_TAB); } print OUT "$SPLIT_TAB[0] $SPLIT_TAB[2] $SPLIT_TAB[3] $SPLIT_TAB[4] $SPLIT_TAB[5] $SPLIT_TAB[6] $SPLIT_TAB[10] $SPLIT_TAB[11]\n"; #print OUT $_; } } print "\n"; print "Ending...\n"; END For task 3 deletion is not done because I'm selecting some columns, but I think it is not optimized. For task 4 I can not include it in the foreach so it'll have that I treat this in a second file from the 45/46 line is that it? Stupid question, how I leave my RESULT_FILE.TXT file I created? So to put it in a new table for the lot? For task 5 I found a function with an array of hash on the result but one of my columns can be seen truncated. I do not know if it's the function or fate previous: sub doublons_grep { my %hash_sans_doublon; return grep {! $hash_sans_doublon{$_}++ }@_; } To answer questions of CitricAcid: The order of magnitude is about 30000 lines. There is no restriction to the elimination of duplication before or after sorting, no matter. |
![]() |
|
Tags: database, dbms, mysql, perl, script |
Thread Tools | Search this Thread |
|
![]() | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
How to call perl script from Python? | Jhonwho | Software Development | 3 | 23-07-2009 09:59 PM |
Is it possible to execute Perl script within another script? | RasMus | Software Development | 2 | 21-07-2009 10:57 PM |
Need Perl script to execute from another. | beelow | Software Development | 2 | 20-06-2009 09:17 AM |
Title of a Perl CGI script | !const | Software Development | 2 | 26-03-2009 11:43 PM |
[DBMS] PostegreSQL and MySQL: whatever you want them? | Neil'o | Software Development | 3 | 18-03-2009 12:29 PM |