Delete / Löschen
The Frog
09.12.2010 - 10:27

Tossing an idea around.....

Hi Everyone,

I am playing with a db design at the moment and an idea has occurred
to me that I would like to ask for advice / ideas on. The db is for
storing product information in a 'dual heirarchy' way. What this means
is that we have a normalised design with two 'paths' to a specific
product (Category, SubCategory, Segment)(Manufacturer, Brand, Range).
Each 'Product' has a unique id (the EAN code) as well as a unique
technical key (autonumber), and has two foreign keys (range_id and
segment_id). There are some standard fields that are applied to all
products, such as Name, Weight, Height, Width, Depth, and so on. There
are also different fields that would be great to be able to apply to
certain categories of product, for example food products would benefit
from having a 'Flavour' field, while books would benefit from an
'Author' and 'ISBN' field.

What I was thinking of doing was to have a user defined list of
appropriate fields and their data types, per category, so that
appropriate information can be stored for each without having to build
separate data tables per category. I have not tried something like
this before and was wondering if anyone has given it a go?

My initial thoughts run along the lines of having a table related to
the category table where the field list would live. What I cannot seem
to figure out is how to use SQL to return a list of products, with the
user defined fields (and their values of course), for a given category
so that each user defined field 'has a column of its own' so to speak.
I think this should be possible but I just have no luck in wrapping my
mind around a method to achieve this.

Any advice or ideas would be greatly appreciated.

The Frog

Roger
09.12.2010 - 12:22
On Dec 9, 2:270am, The Frog <mr.frog.to....@googlemail.com> wrote:
Hi Everyone,

I am playing with a db design at the moment and an idea has occurred
to me that I would like to ask for advice / ideas on. The db is for
storing product information in a 'dual heirarchy' way. What this means
is that we have a normalised design with two 'paths' to a specific
product (Category, SubCategory, Segment)(Manufacturer, Brand, Range).
Each 'Product' has a unique id (the EAN code) as well as a unique
technical key (autonumber), and has two foreign keys (range_id and
segment_id). There are some standard fields that are applied to all
products, such as Name, Weight, Height, Width, Depth, and so on. There
are also different fields that would be great to be able to apply to
certain categories of product, for example food products would benefit
from having a 'Flavour' field, while books would benefit from an
'Author' and 'ISBN' field.

What I was thinking of doing was to have a user defined list of
appropriate fields and their data types, per category, so that
appropriate information can be stored for each without having to build
separate data tables per category. I have not tried something like
this before and was wondering if anyone has given it a go?

My initial thoughts run along the lines of having a table related to
the category table where the field list would live. What I cannot seem
to figure out is how to use SQL to return a list of products, with the
user defined fields (and their values of course), for a given category
so that each user defined field 'has a column of its own' so to speak.
I think this should be possible but I just have no luck in wrapping my
mind around a method to achieve this.

Any advice or ideas would be greatly appreciated.

The Frog

so you have something like

tblRange
rangeId (pk)
...


tblSegment
segmentId (pk)
...

tblProduct
productId (pk)
ean
rangeId (fk)
segmentId (fk)

now you need to introduce 'features'

tblRangeFeature
featureId (pk)
rangeId (pk)
feature

tblSegmentFeature
featureId (pk)
segmentId (fk)
feature

tblProductRangeFeature
prfId (pk)
productId (fk)
rangeId (fk)
featureId (fk)
featureValue

tblProductSegmentFeature
psfId (pk)
productId (fk)
segmentId (fk)
featureId (fk)
featureValue

The Frog
09.12.2010 - 14:17
On the right track Roger, but I was thinking a little simpler:

tblRange:
-range_id (pk)
-range

tblSegment:
-segment_id (pk)
-segment

tblProduct:
-product_id (pk)
-ean (unique)
-product
-range_id (fk)
-segment_id (fk)

tblFeature:
-feature_id (pk)
-category_id (fk) (this is two levels 'above' range in the design)
-feature

In theory we can list all sorts of things here. My concern is in two
parts:
1) How to effectively control the set of features that are available
to an individual product without cascading a compound primary key down
the line (so to speak).
2) How to provide a query that for any given category we would see a
list of products with their 'heirarchical' fields as well as including
- column by column - all the defined features (ie/ each feature as a
separate field in the query).

I need to be (if I am going to do this) provide the output query as
pure SQL. It cannot be dependant on any VBA to work as the database
must 'play nicely' with some of our other systems. If I have to carry
a compound key then so be it, then use a joining table to simulate a
many-many relationship.

Does that clarify what I am trying to achieve a little better? I
apologise for any lack of clarity in my earlier post.

Cheers

The Frog


"Clif McIrvin"
09.12.2010 - 14:54
"The Frog" <mr.frog.to.you@googlemail.com> wrote in message
news:2b226053-0a6f-4acf-8834-abe5b9a492b3@y23g2000yqd.googlegroups.com...
Hi Everyone,

<snip>

What I was thinking of doing was to have a user defined list of
appropriate fields and their data types, per category, so that
appropriate information can be stored for each without having to build
separate data tables per category. I have not tried something like
this before and was wondering if anyone has given it a go?

<snip>
Any advice or ideas would be greatly appreciated.

The Frog


An entry I read recently in Tony's Microsoft Access Blog comes to mind
...

http://msmvps.com/blogs/access/archive/2010/10/07/bad-carma.aspx


WOW! If you don't read Bad CaRMa to the end and don't say wow (or
worse) then please unsubscribe from my blog. <smile>

I followed a link from fellow MVP Mitch Wheat's blog entry When Design
Goes Bad, and The 'One table to Rule Them All!'


--
Clif McIrvin

(clare reads his mail with moe, nomail feeds the bit bucket :-)



Roger
09.12.2010 - 17:09
On Dec 9, 6:170am, The Frog <mr.frog.to....@googlemail.com> wrote:
On the right track Roger, but I was thinking a little simpler:

tblRange:
-range_id (pk)
-range

tblSegment:
-segment_id (pk)
-segment

tblProduct:
-product_id (pk)
-ean 0 0 0 0 0 (unique)
-product
-range_id 0 0(fk)
-segment_id (fk)

tblFeature:
-feature_id 0(pk)
-category_id (fk) 0 0(this is two levels 'above' range in the design)
-feature

In theory we can list all sorts of things here. My concern is in two
parts:
1) How to effectively control the set of features that are available
to an individual product without cascading a compound primary key down
the line (so to speak).
2) How to provide a query that for any given category we would see a
list of products with their 'heirarchical' fields as well as including
- column by column - all the defined features (ie/ each feature as a
separate field in the query).

I need to be (if I am going to do this) provide the output query as
pure SQL. It cannot be dependant on any VBA to work as the database
must 'play nicely' with some of our other systems. If I have to carry
a compound key then so be it, then use a joining table to simulate a
many-many relationship.

Does that clarify what I am trying to achieve a little better? I
apologise for any lack of clarity in my earlier post.

Cheers

The Frog

1)

tblFeature gives you a list of possible features for a product
ie. metadata
you could add fields like sequencing, mandatory flags, data type,
default value, etc


now you need a table, tblProductFeature, to assign values
to the features for a given product, ie. the feature data

tblProductFeature:
-pf_id (pk)
-product_id (fk)
-feature_id (fk)
-feature_value

2)
this is a hard one without VBA,
if it was a small set of known features, you could a query like this
pseudo query

select a.productId
, a.product
, b.featureValue as width
, c.featureValue as height
from tblProduct as a left join
tblProductFeature as b on a.productId D b.productId
and b.featureId D 1 left join
tblProductFeature as c on a.productId D c.productId
and c.featureId D 2


as you can see, you need a 'tblProductFeature' reference for every
feature
maybe you can have one query per category (assuming each category has
a
small set of features)


The Frog
10.12.2010 - 09:08
Roger,

Thankyou for the SQL approach, it is exactly what i was looking for.
What I will do is add code to the form that allows users to 'add a
feature' to a category, and use VBA to generate the required SQL and
save it as a query specific to each category. Depending on the number
of records and features it might pay me to dump the results to a temp
table. Either as a query or a temp table the 'play nice with other
applications' part is taken care of.

Clif, thats an extremely funny article on developing the 'Vision'
application. I cant believe that anyone would be that stupid!
Apparently ego and ambition outweighed ability and common sense.
Sounds similar to my office........ Wait! Did I say that out loud?????

Thanks guys.

The Frog

imb
10.12.2010 - 12:19
On Dec 10, 9:080am, The Frog <mr.frog.to....@googlemail.com> wrote:
Roger,

Thankyou for the SQL approach, it is exactly what i was looking for.
What I will do is add code to the form that allows users to 'add a
feature' to a category, and use VBA to generate the required SQL and
save it as a query specific to each category. Depending on the number
of records and features it might pay me to dump the results to a temp
table. Either as a query or a temp table the 'play nice with other
applications' part is taken care of.

Clif, thats an extremely funny article on developing the 'Vision'
application. I cant believe that anyone would be that stupid!
Apparently ego and ambition outweighed ability and common sense.
Sounds similar to my office........ Wait! Did I say that out loud?????

Thanks guys.

The Frog

Hi The Frog,

If a little bit of VBA is acceptable, you could make a csv-output,
readable with Excel, and (probably) 'other applications'.

Imb

The Frog
10.12.2010 - 16:55
Hi imb,

Thanks for the thought. I am guessing that the IT dept here wants to
be able to get at the data without having to write any complex SQL
themselves. If I can get the userform to craft the needed SQL and save
it as a query in the back end file, then I am hoping that the IT dept
can then access it any way they wish. Basically they dont want to play
with a front end app at all and just want access to the data that is
going to be the product of a lot of user input.

The bigger picture is that the whole excercise is a data cleaning
process to produce a table of products that will have 'trusted'
definitions for their fields as well as the content of those fields.
Each row of data, and each field in those rows needs to be manually
approved. This is a big job indeed for the product lists we have, so I
am trying to ease the pain with a little application that can take the
source 'rough' data in, park it in temp tables, allow the users to
process the data, and place the result into a structured back end that
still leaves them some flexibility to alter things as the business
changes and grows. It will eventually move to a master data system and
the process will be discarded, but not until the existing lists are
processed and cleaned to a trustworthy state. It is proving to be a
surprisingly tedious and frustrating problem to solve, but by no means
impossible.

I really appreciate the feedback.

Cheers

The Frog

imb
10.12.2010 - 17:47
On Dec 10, 4:550pm, The Frog <mr.frog.to....@googlemail.com> wrote:
Hi imb,

Thanks for the thought. I am guessing that the IT dept here wants to
be able to get at the data without having to write any complex SQL
themselves. If I can get the userform to craft the needed SQL and save
it as a query in the back end file, then I am hoping that the IT dept
can then access it any way they wish. Basically they dont want to play
with a front end app at all and just want access to the data that is
going to be the product of a lot of user input.

The bigger picture is that the whole excercise is a data cleaning
process to produce a table of products that will have 'trusted'
definitions for their fields as well as the content of those fields.
Each row of data, and each field in those rows needs to be manually
approved. This is a big job indeed for the product lists we have, so I
am trying to ease the pain with a little application that can take the
source 'rough' data in, park it in temp tables, allow the users to
process the data, and place the result into a structured back end that
still leaves them some flexibility to alter things as the business
changes and grows. It will eventually move to a master data system and
the process will be discarded, but not until the existing lists are
processed and cleaned to a trustworthy state. It is proving to be a
surprisingly tedious and frustrating problem to solve, but by no means
impossible.

I really appreciate the feedback.

Cheers

The Frog

Hi The Frog,

With the explanation in your answer I can understand that an output to
a csv-file is to simple. As far I understand now is that you want to
update (changed) data in the database, so in fact your form is
"connected" to the data-tables.

If this is the case, then your form must handle 0nd real data 0nd meta
data (fieldnames, datatypes, etc).
I have a feeling that this is possible. In fact for all continuous
forms that I have in my applications, I use the same form. This form
contains no knowledge of real data or meta data, that is only done at
opening the form, and is very dependant on the specific environment of
that form on that moment.

So, if I have a better idea of how your application (or is it still an
idea?) runs, perhaps I can give you some hints how I solved such a
problem.

Imb.



The Frog
13.12.2010 - 09:30
Hi Imb,

That is a most generous offer. At the moment the application is still
in the 'paper' stage, but there is a need to build this. In short it
forms part of a larger process, primarily that of generating a type of
trusted data dimension for product master data. In turn this master
data is used as the base of several data warehouse applications and
systems. The current issue is one of data cleanliness. The input is
basically a 'minimum effort' system that is almost wholly dependant on
an external data supplier who is not beholden to us in any way for
data quality (just for the record I was not the one who negotiated
this suppliers contract with us).

So, the idea is to take an ETL process (Extract Transform and Load)
and validate and restructure the data into a form that matches the
business needs. This will be done by Pentaho Data Integration and it
does its job well. What it cant do is tell you if something is 'real'
or not. For example, does Brand X really exist or is it just a
mispelled Brand Y? You can get close with heuristic lookups and
comparisons, and I will be using Levenshtein distances etc to find
nearest fits when a fields value is unknown - but this is only a
guess. In the end a user must make the decision and 'approve' the
fields value. This can end up being a lot of work, so I need to do
some shortcutting to make life easier or the system just wont be used.
This is where the user controls start to come into play.

For each category of product(s) there are necessary 'features' that
are pretty much standard, such as EAN codes or part numbers, a name,
manufacturer etc... So with this in mind I have built a series of
normalised tables to represent the standard components of most any
product:

Category->SubCategory->Segment \
Product->Variety
Manufacturer-> Brand-> Range /

Attached to each of these tables (except for Variety) is what I call a
DS table. A DS table is short for DictionarySource. It is a table that
uses the PK of say Category as a FK, has a field for identifying the
data source (like a point of origin), a field for the 'incoming' term
from that data source. In this way known terms can be matched quickly
and the ETL processing can handle more and more of the data processing
'automatically' so to speak. The result of the ETL will be a list of
'incoming' rows with the source field terms and the known (or guessed)
'output' terms. This is where the user defined part is bogging me
down. Handling the normalised structred part is relatively straight
forward and logical. Allowing the users to define fields per category
that are applied at the Variety, and matching those fields to incoming
fields that are available is proving tricky. I figured that a meta
data approach would be about the only way to do this, so I threw the
idea up here on the newsgroup to kick it around and see what came of
it.

Now that you have the full picture is there anything you might suggest
to ease the pain?

Cheers

The Frog

imb
14.12.2010 - 00:28
On Dec 13, 9:300am, The Frog <mr.frog.to....@googlemail.com> wrote:
Hi Imb,

That is a most generous offer. At the moment the application is still
in the 'paper' stage, but there is a need to build this. In short it
forms part of a larger process, primarily that of generating a type of
trusted data dimension for product master data. In turn this master
data is used as the base of several data warehouse applications and
systems. The current issue is one of data cleanliness. The input is
basically a 'minimum effort' system that is almost wholly dependant on
an external data supplier who is not beholden to us in any way for
data quality (just for the record I was not the one who negotiated
this suppliers contract with us).

So, the idea is to take an ETL process (Extract Transform and Load)
and validate and restructure the data into a form that matches the
business needs. This will be done by Pentaho Data Integration and it
does its job well. What it cant do is tell you if something is 'real'
or not. For example, does Brand X really exist or is it just a
mispelled Brand Y? You can get close with heuristic lookups and
comparisons, and I will be using Levenshtein distances etc to find
nearest fits when a fields value is unknown - but this is only a
guess. In the end a user must make the decision and 'approve' the
fields value. This can end up being a lot of work, so I need to do
some shortcutting to make life easier or the system just wont be used.
This is where the user controls start to come into play.

For each category of product(s) there are necessary 'features' that
are pretty much standard, such as EAN codes or part numbers, a name,
manufacturer etc... So with this in mind I have built a series of
normalised tables to represent the standard components of most any
product:

Category->SubCategory->Segment 0\
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 =
0 0 0 0 0 0 0 0 0Product->Variety
Manufacturer-> 0 Brand-> 0 0Range 0 /

Attached to each of these tables (except for Variety) is what I call a
DS table. A DS table is short for DictionarySource. It is a table that
uses the PK of say Category as a FK, has a field for identifying the
data source (like a point of origin), a field for the 'incoming' term
from that data source. In this way known terms can be matched quickly
and the ETL processing can handle more and more of the data processing
'automatically' so to speak. The result of the ETL will be a list of
'incoming' rows with the source field terms and the known (or guessed)
'output' terms. This is where the user defined part is bogging me
down. Handling the normalised structred part is relatively straight
forward and logical. Allowing the users to define fields per category
that are applied at the Variety, and matching those fields to incoming
fields that are available is proving tricky. I figured that a meta
data approach would be about the only way to do this, so I threw the
idea up here on the newsgroup to kick it around and see what came of
it.

Now that you have the full picture is there anything you might suggest
to ease the pain?

Cheers

The Frog

Hi The Frog,

I have printed your answer, so I can reread it a couple of times, in
order to understand what you mean.
But every time I fall over the _2incoming2 term_ and _2output2
terms_. According to my dictionary the word term has many meanings,
and I do not know whichoine to choose.

But your problem reminds me of an application that I build for a
distribution company.
They wanted to have a database with the (selected) products of the
suppliers. The source for this information was mostly Excel-sheets or
Word-documents.
As in your case, the products had a couple of pretty standard
attributes, but also many different features, that could be very
different for different products. This could be the color, length,
unit size, etc., with hardly any structure in featurename, especially
not between the different suppliers.
To handle all these different features I had a table with Product_id
(FK), Featurename, Featurevalue. To bring some structure in all
different Featurenames I added a field CompanyFeaturename_id, FK to a
table with the defined CompanyFeaturenames.
At loading a new catalogus from a supplier, Featurenames and
Featurevalues were filled in, but not yet the (standardized)
CompanyFeaturename_id. This empty CompanyFeaturename_id indicates that
some human interaction has to be done.
I can imagine that a CompanyFeaturename_id can already be produced by
a reference table with the relations between Featurename and
CompanyFeaturename form the past, or a best guess with whatever means.
In the latter case you probably need an additional boolean field
3checked4.

Is this comparable with your problem, or am I on a wrong track?


Imb.

The Frog
14.12.2010 - 09:20
Hi imb,

You are on the right track. The 'incoming' data is referring to the
source information that comes from a third party supplier and is the
data that is going to be processed. The 'output' is the result of the
process or to put it another way the definition of a product itself
and its features. The idea you have suggested is along the lines I
would need to implement, and I will experiment with it later today.
Thankyou for the guidance.

In doing some further research into the product definitions I have
learned that a lot of fields can be added directly to either the
product table or the variety table, and simply using a query to return
the fields needed for a category (ignoring fields we dont need).

I am wondering if I might approach the problem differently, by
defining many many fields for the product and variety tables, and
simply having another table that allows the users to select which
fields belong to each category. Its not perfect but it should be much
faster and easier to implement. Some human decision making is still
involved but it is minimal. It also makes target fields for ETL
processes much better defined and therefore less prone to error,
especially when it comes to handling data types.

I will some some experimenting later today and let you know the
results.

Cheers

The Frog

Tony Toews
15.12.2010 - 02:42
On Thu, 9 Dec 2010 01:27:58 -0800 (PST), The Frog
<mr.frog.to.you@googlemail.com> wrote:

What I was thinking of doing was to have a user defined list of
appropriate fields and their data types, per category, so that
appropriate information can be stored for each without having to build
separate data tables per category.

I'd sure be tempted to have lots of empty fields in the Item table.
With a field on the category table stating what type of fields should
be displayed. Then create subforms and subreports on the type of
fields. Then make visible the appropriate subforms and subreports
depending on the category type.

Yes, this is the standard boring approach. But when you consider
you've got to take into account field widths and order on the subforms
and subreports, along with lots and lots of extra code and logic,
boring is good.

If the above is making sense.

How many different sets of subforms and subreports do you anticipate?

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
For a convenient utility to keep your users FEs and other files
updated see http://www.autofeupdater.com/

Tony Toews
15.12.2010 - 04:27
On Tue, 14 Dec 2010 18:42:31 -0700, Tony Toews
<ttoews@telusplanet.net> wrote:

What I was thinking of doing was to have a user defined list of
appropriate fields and their data types, per category, so that
appropriate information can be stored for each without having to build
separate data tables per category.

I'd sure be tempted to have lots of empty fields in the Item table.
With a field on the category table stating what type of fields should
be displayed. Then create subforms and subreports on the type of
fields. Then make visible the appropriate subforms and subreports
depending on the category type.

Actually subforms and subreports might not be the right solution.
Groups of controls with tags which are visible/nto and moved might be
the best answer.

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
For a convenient utility to keep your users FEs and other files
updated see http://www.autofeupdater.com/

The Frog
15.12.2010 - 09:26
Hi Tony,

I am expecting between 7 and 10 categories in total. I was thinking of
a field list of available 'features' with about 25 different fields.
Most of the products being dealt with are food based so the required
'features' are almost the same with only a few exceptions.

If I was to build the generic model based on a meta-data driven
approach I am not sure that the return for the effort is worth it in
this case. On the other hand having a meta-data driven set of forms
for the different user interfaces might be a great idea. I could add
in a few 'generic' fields that could be re-purposed by users to hold
simple data like strings or numbers, and a table to keep the necessary
descriptive info similar to imb's suggestions above, but just for the
'name' the user wants to see the column as, as well as the proposed
data type. To get the data back out I would have to have the user form
that allows the customisations also generate the required SQL for a
query to act like a view that returns all data for a category (or
maybe just save the SQL in a table in the BE - depends on what the
external applications want).

After doing some basic data cleaning to get a better grasp of the
situation I have managed to determine that from the several million
rows of incoming data there are only about 40k rows of 'actual'
original data. There are many repeats in the incoming lists, and this
will have to be handled by the ETL application -> the end result is
that the app I need to build is going to do the final 'read in' of new
product data, allow users to 'approve / discard' or alter the incoming
data, and provide a comprehensive 'view' of that data available in the
BE. Access will be the BE for this as the data needs to remain
portable and easily interfaced with other Office applications. If all
of this works to plan then the various 'macro's' and functions that we
(and by we I mean me) build should reduce many hundreds of man hours
worth of work down to only a few minutes. Makes me glad I developed an
Access report to PowerPoint slide function - the test report for this
reduced a week and a half / two weeks worth of manual work down to
about 1 minute. The next is far larger in scope and wholly dependant
on this app (and the ETL that feeds it) to do its job properly and
efficiently. I think this is one of the best office projects I have
worked on in years.

Cheers

The Frog




Share/Bookmark

next