Results 1 to 6 of 6

Thread: Choice of data structure

  1. #1
    Join Date
    Jul 2009
    Posts
    122

    Choice of data structure

    I'm going to redo much of my work because I have chosen the wrong data structure for the access files that are more expensive. All my work is based on files of the same results intermediaries. The purpose of my job is to find a solution to my problem but to underestimate the possible execution time (the measurement time by using the 'clock ()'.

    But when I tested the time of execution on the comparison of two files according to defined criteria then I noticed that the time has become very long if we have files with hundreds of lines. We will read each line of file 1 and compared with all lines of file 2. It's very expensive.

    Do you:
    - With what data structures these two files will be replaced?

    - In general, what is the data structure the least costly, memory and execution time so the lowest possible?

  2. #2
    Join Date
    Nov 2008
    Posts
    1,054

    Re: Choice of data structure

    Can you give us even more details? I mean with the information that you have given, it is harder to determine what database structure would be feasible and best suited for you and for your project.

  3. #3
    Join Date
    Jul 2009
    Posts
    122

    Re: Choice of data structure

    In my C program, the comparison function is:

    Code:
    *rets = compare_files("f.txt", "f2.txt" );
    I passed two files "e.txt" and "f2.txt" to function 'compare_files'. The size of "e.txt" is always greater than "f2.txt". Each line contains two files for a string.

    I am looking for lines that belong to "e.txt" and not "f2.txt". It is a kind of difference: lines of "e.txt" minus lines of "f2.txt"; provided a line of "e.txt" is identical to a line of "f2.txt". If both lines have the same value and the same number of words that form the two lines regardless of the order of words since the word order is not important in my problem. The most important thing: the same value and the same number

    If that is the two lines have the same value and the same number of words then in this case the two lines are different.

    For example:
    "name surname age" = "name age surname"

    Knowing that each line of the two files "e.txt" and "f2.txt" is composed of a single field (a string)

    Be the first file "e.txt"
    name surname
    Name Age
    NAME AGE
    first_name Employment
    name surname age
    Job name
    age employment
    first_name age employment
    Name Age Employment
    name surname Employment
    name surname age employment

    Either the second file "f2.txt"

    name
    first_name
    age
    Employment
    age name
    name surname age
    Job name
    age employment
    first_name age employment
    name surname Employment

    Applying the principle of comparison when this is achieved through:
    name surname
    Name Age Employment
    name surname age employment

    Then, we keep this result that the lines which does not contain another string with the result.

    Here, we do not keep the string "name surname age employment" because it contains the string "name surname"

    The desired end result is obtained:
    name surname
    Name Age Employment

    Here is the function code 'compare_files'

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    #define OUTPUT_NAME "finalresult.txt"
    #define MAX_SIZE 1024
    void * xrealloc(void * prec, size_t len)
    {
        void *ret = realloc(prec, len);
        if(!ret) 
            exit(0xDEAD);
        return ret;
    }
    
    char * mstrndup(const char *str, size_t n)
    {
        char *ret = malloc((n+1) * sizeof(char));
        if (!ret)
            exit(0xDEAD);
        strncpy(ret, str, n); 
        ret[n] = 0; 
        return ret;
    }
    
    void free_tab(char **t, size_t len)
    {
        if (t)
        {
            size_t i;
            for (i = 0; i < len; i++)
                free(t[i]);
            free(t);
        }
    }
    
    void free_tab2(char **t)
    {
        size_t i = 0;
        while (t && t[i])
            free(t[i++]);
        free(t);
    }
    
    size_t count_word(const char *str)
    {
        size_t n = 0;
        int a;
        while (*str) 
        {
            a = 0; 
            while (isalpha((unsigned char) *str) && *str) str++, a = 1; 
            if (a) n++; 
            while (!isalpha((unsigned char) *str) && *str) str++; 
        }
        return n;
    }
    
    void get_word(char **tab, const char *str)
    {
        const char* p = str;
        int a, i = 0;
        while (*str)
        {
            a = 0;
            while (isalpha((unsigned char) *p) && *p) p++, a = 1;
            if (a)
                tab[i++] = mstrndup(str, p-str); 
            while (!isalpha((unsigned char) *p) && *p) p++;
            str = p;
        }
    }
    
    int compareline(char **t1, size_t size1, char **t2, size_t size2)
    {
        int ret = 1;
        size_t i, j;
        int a;
        for (i = 0; i < size1; i++)
        {
            a = 0;
            for (j = 0; j < size2; j++)
                if (!strcmp(t1[i], t2[j]))
                {
                    a = 1;
                    break; 
                }
            if (!a)
            {
                ret = 0;
                break; 
            }
        }
        return ret;
    }
    
    int is_same(const char *s1, const char *s2, int comparesize)
    {
        char **t1, **t2;
        size_t size1 = count_word(s1), size2 = count_word(s2);
        int ret = 0;
        if (!comparesize || (size1 > size2)) 
        {
            t1 = malloc(size1 * sizeof(char*));
            t2 = malloc(size2 * sizeof(char*));
            if (t1 && t2)
            {
                get_word(t1, s1);
                get_word(t2, s2);
                ret = comparesize? compareline(t2, size2, t1, size1) : compareline(t1, size1, t2, size2);
                free_tab(t1, size1), free_tab(t2, size2);
            }
            else
                exit(0xDEAD);
        }
        return ret;
    }
    char ** compare_files(const char *filename1, const char *filename2)
    {
        FILE *f = fopen(filename1, "r" ), *f2 = fopen(filename2, "r" );
        char s[MAX_SIZE], s2[MAX_SIZE];
        int a, retsize = 0;
        char **ret = NULL;
        if (f && f2) 
        {
            while (fgets(s, MAX_SIZE, f)) 
            {
                a = 0;
                rewind(f2);
                while (fgets(s2, MAX_SIZE, f2))
                    if (is_same(s, s2, 0))
                    {
                        a = 1;
                        break;
                    }
                if (!a)
                {
                    ret = xrealloc(ret, (++retsize) * sizeof(char*)); 
                    ret[retsize-1] = mstrndup(s, strlen(s));
                }
            }
            fclose(f), fclose(f2);
            ret = xrealloc(ret, (++retsize) * sizeof(char*)); 
            ret[retsize-1] = NULL;
        }
        else 
        {
            if (f) fclose(f);
            if (f2) fclose(f2);
        }
        return ret;
    }
    void write_tab(FILE *f, char **t, size_t len)
    {
        size_t i;
        for (i = 0; t && i < len; i++)
            if (t[i])
                fprintf(f, t[i]);
    }
    void disp_tab(char **t)
    {
        size_t i;
        for (i = 0; t && t[i]; i++)
            printf(t[i]);
    }
    int main(void)
    {
    #define MAXFILE 3
    #define MAXTMP 10
    #define N 4
        char **rets[MAXFILE] = {NULL}, tmp[MAXTMP];
        char **interret = NULL;
        size_t interret_size = 0;
        int i,j,k;
        int a, b;
        int equN = 0, equN1 = 0, equothr = 0, equi = 0;
        FILE *output;
        *rets = compare_files("f.txt", "f1.txt" );
        i = 0;
        while(*rets && (*rets)[i]) 
        {
            a = count_word((*rets)[i]); 
            if(a == N) 
              equN++, equi = i;
            else if (a == (N-1)) 
              equN1++;
            else if (a < (N-1))
              equothr = 1; 
            i++;
        }
        if(equN == 1 && !equothr) 
        {
            if(equN1) 
            {
                interret = *rets; 
                interret_size = i; 
                free((*rets)[equi]), (*rets)[equi] = NULL; 
            }
            else 
            {
              interret = *rets;
              interret_size = 1;
              for(i = 1; rets[0][i]; i++)
                free(rets[0][i]), rets[0][i] = NULL;
            }
        }
        else
        {
          for (i = 1; i < MAXFILE; i++) 
          {
              sprintf(tmp, "f%d.txt", i+1);
              rets[i] = compare_files("f.txt", tmp); 
          }
          i = 0;
          while (rets[0] && rets[0][i]) 
          {
              a = b = 0;
              for (j = 1; j < MAXFILE; j++) 
              {
                  k = 0;
                  while (rets[j] && rets[j][k]) 
                  {
                      if (!strcmp(rets[0][i], rets[j][k])) 
                      {
                          a = 1; 
                          break;
                      }
                      k++;
                  }
                  if (!a) 
                  {
                      b = 1;
                      break;
                  }
              }
              if (!b) 
              {
                  interret = xrealloc(interret, (++interret_size) * sizeof(char*));
                  interret[interret_size-1] = mstrndup(rets[0][i], strlen(rets[0][i])); 
              }
              i++;
          }
          for (i = 0; i < MAXFILE; i++) 
              free_tab2(rets[i]);
          for (i = 0; i < interret_size; i++) 
          {
              for (j = 0; interret[i] && j < interret_size; j++) 
              {
                  if (interret[j] && j != i) 
                      if (is_same(interret[i], interret[j], 1)) 
                      {
                          free(interret[i]), interret[i] = NULL; 
                          break;
                      }
              }
          }
        }
        if((output = fopen(OUTPUT_NAME, "w+" )) != NULL) 
        {
            write_tab(output, interret, interret_size);
            fclose(output);
        }
        free_tab(interret, interret_size); 
        return 0;
    }
    What would you think?

  4. #4
    Join Date
    Nov 2008
    Posts
    1,054

    Re: Choice of data structure

    My friend, simply write the main functions instead of showing the whole code which is useless to us. And by the way what is interret of your functions that wrap your libc?

  5. #5
    Join Date
    Jan 2006
    Posts
    3,792

    Re: Choice of data structure

    Your files are how big? Is this you can keep them in memory rather than to reread n1 * n2 times? Is what you could not replace your strings with stuff that would represent a unique (checksums perhaps?) and that the comparison would be faster? (should ask someone other than myself to these aspects)

  6. #6
    Join Date
    Jul 2009
    Posts
    122

    Re: Choice of data structure

    @ big fish: When I measure the execution time of this treatment comparison using the 'clock ()' then I get a higher time even thousands of seconds for a file of hundreds of lines.

    Code:
    debut = clock();
    //Comparison between two files
    fin = clock();
    fprintf(stderr, "temps : %f\n", (double)(fin-debut) / (double) CLOCKS_PER_SEC);
    The file may contain hundreds of lines and each line contains a string (all words). I wanted to make them fit in memory but how to do this? and with what data structure we can do this?

Similar Threads

  1. What is linear search in data structure?
    By Venugopala in forum Software Development
    Replies: 3
    Last Post: 04-01-2011, 08:11 AM
  2. What is the use of insertion sort in data structure?
    By Venugopala in forum Software Development
    Replies: 3
    Last Post: 04-01-2011, 02:41 AM
  3. What do you mean by stacks in Data structure
    By Shiva$m in forum Software Development
    Replies: 3
    Last Post: 04-01-2011, 12:20 AM
  4. What is the binary search in data structure
    By Alibamu in forum Software Development
    Replies: 3
    Last Post: 03-01-2011, 08:42 AM
  5. Variables and data structure in Perl
    By Monty1 in forum Tips & Tweaks
    Replies: 2
    Last Post: 28-09-2010, 07:53 PM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Page generated in 1,714,061,358.76030 seconds with 17 queries