coopy-users Mailing List for coopy


Paul,

Thanks for pointing that out. I think this latest output looks more natural
(intuitive) than the previous one.

cheers,

Joe

On Mon, Nov 22, 2010 at 10:03 PM, Paul Fitzpatrick
<pau...@al...> wrote:
>
> Another quick follow-up.  I upgraded the row matching algorithm to fall back
> on a more powerful (if slightly slower) method when the existing method
> isn't making convincing progress.  The human-readable diff for your example
> is now:
>
> dtbl: human-readable table difference format version 0.3
>
> column names are: COLUMN1 COLUMN2 COLUMN3
>
> update row:
>  where COLUMN1,COLUMN3 = 1111,2
>  set   COLUMN2 = 1111 -> xxxx
>
> delete row:
>  remove 4444 4444 1
>
> delete row:
>  remove 4444 4444 2
>
> insert row:
>  add 2222 2222 1
>
> insert row:
>  add 5555 5555 2
>
> update row:
>  where COLUMN1,COLUMN3 = 6666,1
>  set   COLUMN2 = 6666 -> xxxx
>
> Cheers,
> Paul
>
> On 11/21/2010 05:40 PM, Paul Fitzpatrick wrote:
>>
>> Hi Joe,
>>
>> Thanks for posting this.  Your test case highlighted a few problems with
>> COOPY.
>>
>>   * The omitted row 1111,1111,1 was a flat-out bug.  I've committed a
>> fix for that bug, and added this case to the regression tests - thank
>> you!  With the fix, an ssdiff-sspatch sequence at least produces the
>> expected result.
>>
>>   * COOPY currently has trouble when there are sets of rows that have no
>> real distinguishing characteristics.  Your "local" csv file is
>> difficult, since there are pairs of rows that differ only by a single
>> isolated digit.  This is why the "diff" given involves basically
>> deleting the original file and inserting the new one.
>>
>> As for your question of how COOPY aligns/joins the rows from the two
>> tables: for your case, it fails to, so this is hypothetical :-).
>> However, here's a brief sketch of the procedure.
>>
>> * We take three tables, P, L, and R.  L is your local table, R is your
>> remote table, P is a pivot/parent table which for ssdiff is by default
>> equal to L.
>> * We try to recover a mapping from rows in L to rows in P.  For the diff
>> case, it is trivial, L=P.
>> * We try to recover a mapping from rows in P to rows in R.  Columns may
>> have been added/deleted/reordered/renamed/garbled, so the process is,
>> for each row in one table, to take all string fragments of text up to a
>> threshold length, and dump them into a hash table (tagged with their
>> origin).  String fragments that appear in multiple rows get discounted.
>> For each row in the second table, we accumulate hits against the
>> hash table, then decide whether a match has been achieved.
>> * Once rows are matched, we look at mapping from columns in P to columns
>> in R.  The process here is similar, if simpler.
>> * The mapping from L to R is determined via P - for ssdiff, this is
>> trivial.
>>
>> Ironically, the procedure is particularly prone to failure on artificial
>> test cases with small numbers of columns and rows.  However, I expect at
>> least your test case should be handled soon, through an iterative step
>> where row mappings are re-estimated after column mappings have been fixed.
>>
>> Cheers,
>> Paul
>>
>>
>>>
>>> Hello,
>>>
>>> [COOPY 0.4.0 running on OS X.6]
>>>
>>> I'm trying to understand the results from ssdiff. I have two csv files:
>>>
>>> local:
>>>
>>> COLUMN1,COLUMN2,COLUMN3
>>> 1111,1111,1
>>> 1111,1111,2
>>> 4444,4444,1
>>> 4444,4444,2
>>> 6666,6666,1
>>> 6666,6666,2
>>>
>>> modified:
>>>
>>> COLUMN1,COLUMN2,COLUMN3
>>> 1111,1111,1
>>> 1111,xxxx,2
>>> 2222,2222,1
>>> 5555,5555,2
>>> 6666,xxxx,1
>>> 6666,6666,2
>>>
>>> If I run this:
>>>
>>> ssdiff --format-human local.csv modified.csv
>>>
>>> I get this:
>>>
>>> column names are: COLUMN1 COLUMN2 COLUMN3
>>>
>>> delete row:
>>>   remove 1111 1111 1
>>>
>>> delete row:
>>>   remove 1111 1111 2
>>>
>>> delete row:
>>>   remove 4444 4444 1
>>>
>>> delete row:
>>>   remove 4444 4444 2
>>>
>>> delete row:
>>>   remove 6666 6666 1
>>>
>>> delete row:
>>>   remove 6666 6666 2
>>>
>>> update row:
>>>   where COLUMN1,COLUMN2,COLUMN3 = COLUMN1,COLUMN2,COLUMN3
>>>   set    =
>>>
>>> insert row:
>>>   add 1111 xxxx 2
>>>
>>> insert row:
>>>   add 2222 2222 1
>>>
>>> insert row:
>>>   add 5555 5555 2
>>>
>>> insert row:
>>>   add 6666 xxxx 1
>>>
>>> insert row:
>>>   add 6666 6666 2
>>>
>>>
>>> I don't quite understand those results. Why was this row deleted,
>>> without being added back?
>>>
>>> 1111,1111,1
>>>
>>> It appears on both sides. In general, how does COOPY align (join) the
>>> rows from the two tables?
>>>
>>> cheers,
>>>
>>> Joe
>>>
>>
>>
>>
>
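The version 0.3 diff quoted in this thread can be sanity-checked by replaying
its operations against the "local" table and comparing the result with the
"modified" table. The sketch below is only an illustration: apply_ops and the
tuple encoding of the operations are hypothetical names chosen for this
example, not COOPY's sspatch or its file format, and row order is ignored.

```python
COLUMNS = ["COLUMN1", "COLUMN2", "COLUMN3"]

LOCAL = [("1111", "1111", "1"), ("1111", "1111", "2"),
         ("4444", "4444", "1"), ("4444", "4444", "2"),
         ("6666", "6666", "1"), ("6666", "6666", "2")]

MODIFIED = [("1111", "1111", "1"), ("1111", "xxxx", "2"),
            ("2222", "2222", "1"), ("5555", "5555", "2"),
            ("6666", "xxxx", "1"), ("6666", "6666", "2")]

# Hand transcription of the quoted diff (hypothetical encoding).
OPS = [
    ("update", {"COLUMN1": "1111", "COLUMN3": "2"}, {"COLUMN2": ("1111", "xxxx")}),
    ("delete", ("4444", "4444", "1")),
    ("delete", ("4444", "4444", "2")),
    ("insert", ("2222", "2222", "1")),
    ("insert", ("5555", "5555", "2")),
    ("update", {"COLUMN1": "6666", "COLUMN3": "1"}, {"COLUMN2": ("6666", "xxxx")}),
]

def apply_ops(columns, rows, ops):
    """Replay update/delete/insert operations on a list of row tuples."""
    table = [dict(zip(columns, row)) for row in rows]
    for op in ops:
        kind = op[0]
        if kind == "update":
            _, where, changes = op
            for row in table:
                if all(row[col] == val for col, val in where.items()):
                    for col, (old, new) in changes.items():
                        if row[col] != old:
                            raise ValueError(f"expected {old!r} in {col}")
                        row[col] = new
        elif kind == "delete":
            # list.remove raises ValueError if the row is not present
            table.remove(dict(zip(columns, op[1])))
        elif kind == "insert":
            table.append(dict(zip(columns, op[1])))
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return [tuple(row[col] for col in columns) for row in table]

patched = apply_ops(COLUMNS, LOCAL, OPS)
print(sorted(patched) == sorted(MODIFIED))
```

Inserted rows are simply appended here, so the comparison is made on sorted
rows rather than on row order.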
>
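The row-matching procedure Paul sketches in the quoted message (hash the
string fragments of each row, discount fragments shared by several rows,
accumulate hits, then decide) might look roughly like the following. This is
a minimal sketch and not COOPY's implementation: the function names, the
1/n discounting, and the max_len, min_score and margin parameters are all
assumptions made for illustration.

```python
from collections import defaultdict

def fragments(row, max_len=4):
    """All substrings of each cell in the row, up to max_len characters."""
    frags = set()
    for cell in row:
        for i in range(len(cell)):
            for j in range(i + 1, min(len(cell), i + max_len) + 1):
                frags.add(cell[i:j])
    return frags

def match_rows(first, second, max_len=4, min_score=1.0, margin=1.5):
    """For each row of `second`, return the index of the matching row in
    `first`, or None when no candidate wins convincingly."""
    # Fragment -> set of rows in `first` where it occurs (its "origin" tags).
    index = defaultdict(set)
    for i, row in enumerate(first):
        for frag in fragments(row, max_len):
            index[frag].add(i)

    matches = []
    for row in second:
        scores = defaultdict(float)
        for frag in fragments(row, max_len):
            origins = index.get(frag, ())
            for i in origins:
                # Assumed discount: a fragment seen in n rows counts 1/n.
                scores[i] += 1.0 / len(origins)
        ranked = sorted(scores.values(), reverse=True)
        best = ranked[0] if ranked else 0.0
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        # Accept only a strong, clearly separated best candidate.
        if best >= min_score and best >= margin * runner_up:
            matches.append(max(scores, key=scores.get))
        else:
            matches.append(None)
    return matches

people_a = [["alice", "30"], ["bob", "41"]]
people_b = [["bob", "42"], ["alice", "30"]]
print(match_rows(people_a, people_b))
```

With rows that share almost all of their text, as in Joe's local.csv, the
best and second-best candidates score nearly the same under this sketch, so
no row is matched. That is consistent with the delete-everything and
insert-everything diff reported in the original question.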
