Go Back   TechArena Community > Software > Software Development
Become a Member!
Forgot your username/password?
Tags Active Topics RSS Search Mark Forums Read

Reply
 
Thread Tools Search this Thread
  #1  
Old 11-11-2009
Member
 
Join Date: Aug 2009
Posts: 56
Perl Script to help feed DBMS MySQL

I started in Perl and I am reading the doc to learn things. My problem is simple, I have multiple files that I just need to manipulate, format to feed my MySQL database. I use MySQL 5.1, Toad Perl v4.1 & v5.10

This is one example among many other CSV file:
Code:
245362;77;77289;INR69907;Felt;799;20091102232406;;VEL;799;66;24;
243640;77;66017;INT81553;"Notebook";699;20091102232318;;VEL;699;66;24;
236958;77;08;DRT03286;bed;299;20091102231945;;VEL;299;66;15;
319771;77;79060;INA38898;Pen;399;20091102231902;;VEL;399;66;3;
319771;77;79060;BTQ38898;Pen;399;20091102231902;;VEL;399;66;24;
319250;77;13110;INJ40768;Mattresses;599;20091102231736;;VEL;599;66;24;
320343;77;21758;PRP31628;Mice;299;20091102231459;;VEL;299;66;24;
331379;77;27209;INT59377;Kit;299;20091102231319;;VEL;299;66;15;
314147;77;89301;BTQ12922;Scissors;0;20091102231248;;VEL;0;66;24;
334467;77;26234;TEL38873;Handkerchief;499;20091102231131;;VEL;499;66;24;
241749;77;37732;TEL90191;Eraser;699;20091102231104;;VEL;699;66;24;
The desired result:
Code:
insert into vod_billing values("236958","8","DRT03286","bed","299","20091102231945","15","DRT" );
insert into vod_billing values("241749","37732","TEL90191","Eraser","699","20091102231104","24","TEL" );
insert into vod_billing values("243640","66017","INT81553","Notebook","699","20091102232318","24","INT" );
insert into vod_billing values("245362","77289","INR69907","Felt","799","20091102232406","24","INT" );
insert into vod_billing values("314147","89301","BTQ12922","Scissors","0","20091102231248","24","BTQ" );
insert into vod_billing values("319250","13110","INJ40768","Mattresses","599","20091102231736","24","INT" );
insert into vod_billing values("319771","79060","BTQ38898","Pen","399","20091102231902","24","BTQ" );
insert into vod_billing values("319771","79060","INA38898","Pen","399","20091102231902","3","INT" );
insert into vod_billing values("320343","21758","PRP31628","Mice","299","20091102231459","24","PRP" );
insert into vod_billing values("331379","27209","INT59377","Kit","299","20091102231319","15","INT" );
insert into vod_billing values("334467","26234","TEL38873","Handkerchief","499","20091102231131","24","TEL" );
it gives this:
1.concatenate all *.csv* a directory into a single file
2.replace all characters "by anything (double quote)
3.Remove all duplicate lines
4.Sortby the first column
5. Remove all empty lines
6. The character is the delimiter column then delete column 2,8,9,10 & 11
7. To add last column in the first 3 characters of the column 3 (TEL, PRI etc ...)
8. On this last column, replace all INA INJ & Rs per INT
9. Insert before the line vod_billing insert into values ( "
10. To include end of line ");
11. To replace the delimiter, for ""
12. Rename the output file .sql

The ultimate goal would be to directly feed my MySQL database created with lines, but already such a script would be perfect.

I know it's a bit abused but probably simpler for a developer!
Reply With Quote
  #2  
Old 11-11-2009
Member
 
Join Date: May 2008
Posts: 2,293
Re: Perl Script to help feed DBMS MySQL

There's a difference between helping someone who starts, and her spinning is all scripts. For referrals, I suggest you to go and read a little help on the functions open(), split(), close(). A quick foreach () will also help you.
Reply With Quote
  #3  
Old 11-11-2009
Member
 
Join Date: Nov 2008
Posts: 1,022
Re: Perl Script to help feed DBMS MySQL

It's all very easy in perl, but what is (an order of magnitude) the size (number of lines) of a file .csv obtained by concatenation of those listed in a directory, in order to see the strategy most appropriate (while memory or not) to remove duplicates and sorting.
Reply With Quote
  #4  
Old 11-11-2009
Member
 
Join Date: Nov 2008
Posts: 1,221
Re: Perl Script to help feed DBMS MySQL

Quote:
Originally Posted by KALANI84 View Post
it gives this:
1.concatenate all *.csv* a directory into a single file
2.replace all characters "by anything (double quote)
3.Remove all duplicate lines
4.Sortby the first column
5. Remove all empty lines
6. The character is the delimiter column then delete column 2,8,9,10 & 11
The correct order to do this is:
1. Concatenate all *. csv * a directory into a single file
2. Remove all empty lines
3. The character is the delimiter column then delete column 2,8,9,10 & 11
4. Sort by the first column
5. Delete all duplicate lines
6. Replace all characters "by anything (double quote)

Because otherwise we may have lines that are not duplicates before deleting columns which become then (I do not know the exact structure of data lines, so it implies) [or when the order initially it is important and can be duplicates after deleting columns?]

Delete duplicates after sorting is optimal: the duplicates will then successive, which makes things easier.
Reply With Quote
  #5  
Old 11-11-2009
Member
 
Join Date: Aug 2009
Posts: 56
Re: Perl Script to help feed DBMS MySQL

As stated I started doing the script Via Agenda quote (I added in step 7 & 8)

Code:
#!/usr/bin/perl -w
use strict;
use warnings;
# Global Variables
my $LOCAL_DIR = "/cygdrive/d/Nicolas/VOD/_.billing/_.compiler/backup/test/";
my $DEST_DIR = "/cygdrive/d/Nicolas/VOD/_.billing/_.compiler/backup/";
my $SEARCH_FILE = "*_billing_tvod_*";
my $OUTPUT_FILE = $DEST_DIR . sprintf("RESULT_FILE.TXT" );
# Script Perl
print "Starting...\n";
print "\n";
open (OUT,">$OUTPUT_FILE" ) or die "Message : $!\n";
my @RESULT_FILE = <$LOCAL_DIR/$SEARCH_FILE>;
foreach my $LINE(@RESULT_FILE)
{
  open IN,"$LINE" or warn "Message : $!\n";
  while(<IN> )
  {
      s/\"//g;
      next if m/^$/;
      my @SPLIT_TAB = split(/;+/);
      push(@SPLIT_TAB,substr($SPLIT_TAB[3],0,3));
      chomp(@SPLIT_TAB);
      if($SPLIT_TAB[11] eq 'INA')
      {
          my $ERASE_COLUMN = pop(@SPLIT_TAB);
          push(@SPLIT_TAB,'INR');
          chomp(@SPLIT_TAB);
      }
      elsif($SPLIT_TAB[11] eq 'INS')
      {
          my $ERASE_COLUMN = pop(@SPLIT_TAB);
          push(@SPLIT_TAB,'INR');
          chomp(@SPLIT_TAB);
      }
      elsif($SPLIT_TAB[11] eq 'INJ')
      {
          my $ERASE_COLUMN = pop(@SPLIT_TAB);
          push(@SPLIT_TAB,'INR');
          chomp(@SPLIT_TAB);
      }
      print OUT "$SPLIT_TAB[0] $SPLIT_TAB[2] $SPLIT_TAB[3] $SPLIT_TAB[4] $SPLIT_TAB[5] $SPLIT_TAB[6] $SPLIT_TAB[10] $SPLIT_TAB[11]\n";
      #print OUT $_;
  }
}
print "\n";
print "Ending...\n";
END
I managed tasks 1, 2, 6, 7 & 8 and probably 3.

For task 3 deletion is not done because I'm selecting some columns, but I think it is not optimized.

For task 4 I can not include it in the foreach so it'll have that I treat this in a second file from the 45/46 line is that it?

Stupid question, how I leave my RESULT_FILE.TXT file I created? So to put it in a new table for the lot?

For task 5 I found a function with an array of hash on the result but one of my columns can be seen truncated. I do not know if it's the function or fate previous:
sub doublons_grep {
my %hash_sans_doublon;

return grep {! $hash_sans_doublon{$_}++ }@_;
}

To answer questions of CitricAcid:
The order of magnitude is about 30000 lines.
There is no restriction to the elimination of duplication before or after sorting, no matter.
Reply With Quote
Reply

  TechArena Community > Software > Software Development
Tags: , , , ,



Thread Tools Search this Thread
Search this Thread:

Advanced Search


Similar Threads for: "Perl Script to help feed DBMS MySQL"
Thread Thread Starter Forum Replies Last Post
How to call perl script from Python? Jhonwho Software Development 3 23-07-2009 09:59 PM
Is it possible to execute Perl script within another script? RasMus Software Development 2 21-07-2009 10:57 PM
Need Perl script to execute from another. beelow Software Development 2 20-06-2009 09:17 AM
Title of a Perl CGI script !const Software Development 2 26-03-2009 11:43 PM
[DBMS] PostegreSQL and MySQL: whatever you want them? Neil'o Software Development 3 18-03-2009 12:29 PM


All times are GMT +5.5. The time now is 05:27 PM.