![]() |
||
|
A Description of the Pacific Fisheries Information Network
6.4 The PacFIN System The PacFIN system receives all of its data from four state fishery agencies, three NMFS offices, the U.S. Coast Guard (USCG), and the Department of Fisheries and Oceans, Canada (DFO). The chart "PacFIN Data Flow 21JUN96" describes the complete PacFIN flow data from data sources to central database to data users. There are ten different transaction types and two kinds of data files used by these nine data sources to supply all of the data to the central database. The following table summarizes this data source vs. input transaction or data file relation.
All data destined for the PacFIN central database arrives on the Orca using one of five methods: internet FTP directly to the Orca system, initiated at either the sending or receiving end; a 9600 BPS (or faster) Kermit file transfer from a data source bulletin board to a PacFIN Office computer system and then an FTP to Orca; a diskette containing an MS-DOS ASCII file(s) sent via the U.S. mail, or hand carried, with the file(s) subsequently moved to Orca using FTP; an 8mm Unix tape containing an ASCII file with the transfer performed by the Orca operations staff; or 9-track tape, which has not been used for sometime. The frequency of update varies depending on the data source. For the PFMC state agency data sources the update frequency is monthly while the AFSC, which is also a PFMC data source, the update frequency is weekly for the very short at-sea processor pacific whiting fishery. The ADFG and AKR data sources are weekly data providers. The DFO is scheduled to provide a preliminary data feed each May for the previous calendar year with a final update due in November. The NWR's LEP history file is provided monthly and USCG's vessel data file is obtained annually. These are the agreed upon update frequencies. The update frequencies mentioned above are the nominal update frequencies and for some agencies the actual update frequencies diverge considerably from the expected update frequencies. The three W-O-C data sources are very consistent with regard to providing their monthly data feeds each month on, or about, the 14th of each month. And the NMFS/AKR continues to contribute their data on a regular weekly or biweekly basis. In opposition to this excellent pattern of complying with the agreed upon update frequency, the ADFG and DFO data sources represent the other extreme. The final 1993 DFO data feed was delivered in March of 1995 (due in NOV94), the first DFO 94 data feed was provided in May of 1995 on schedule, but the final 1994 had not been received by the end of JUN96, likewise with the preliminary 1995 data feed due in May of 1996. Final data feeds for 1994 and 1995 were received by ADFG in March of 1996, but no data for 1996 had been received from ADFG as of the end of JUN96. Data completeness of the PFMC PacFIN data varies from ODFW, which is normally 90-95 % complete 15 days after the end of each month to CDFG which is usually about 90% complete three and one-half months (105 days) after the end of each month. These completion estimate statistics are based on the total groundfish catch added each month and do not take into account any changes in the ACM and/or SCM data or possibly the total absence of ACM and/or SCM data. All of the data supplied by all PacFIN data sources is validated, to some degree, and then stored in database tables. The diagram "PacFIN Database Structure Chart 24JUN96" lists all of these PacFIN central database tables and describes the basic relationships between and among these tables. The table titled " PacFIN Report: Column Descriptions" contains a description of each column in the PacFIN database. The reader is referred to this table for a detailed description of each column. The following paragraphs will attempt to describe each table in a more general sense. 6.4.2.1 Tables: sp, ar, gr, pc, ag, cl The PacFIN code list tables: species (sp); areas (ar); gears (gr); ports-jvs-cntry (pc), agencies (ag), and code-list (cl) are all tables that are populated with data originating from within the PacFIN Office. The sp table contains one entry for each PacFIN species code or identifier (spid). The other columns, or attributes, in the sp table are used to group, order, or further describe each entry in this table. This table is used extensively by the validation, update, and retrieval portions of the system. As an example, when each ACL and AGC transaction is added to the system this table is inspected to ensure that each specified spid exists in the sp table. During the update process the summarized column determines whether summarized data will reside in the summary-catch (sc) table for that spid. Also during the update process the spid, complex, and mgrp columns determine the spid(s) that are maintained in the sc table. To one extent or another all of the columns in the sp table are used during one or more of the retrieval processes. However, all of the "...order" columns and all of the "...flag" columns are used exclusively for retrieval purposes. The "...order" columns are used as aids in ordering rows in specific reports and the "...flag" columns are a means for selecting particular spids for inclusion in certain reports. A complete list of all PacFIN species codes is produced on the Orca system using the "list_sp" sqlplus retrieval routine. The ar table is structured very much like the sp table in that the columns are used for validation, update, retrieval, and general description purposes. There is however on very significant difference. Each entry in the ar table only contains the larger group (argroup) to which the arid belongs, whereas each sp row contains all of the groups or complexes that the particular spid participates in. Example: sp row where spid = DBRK contains complex = ROCK, mgrp = GRND, and complex2 = SBTS, while ar row where arid = 2C contains argroup = CL and ar row where arid = CL contains argroup = PC and ar row where arid = PC contains argroup = ALL. Another very important column in the ar table is type, which is used to group area ids within the same area system together, or to specify subareas within a system. Examples: ar row where arid = AS contains type = 5, while ar row with arid = 2A contains type = 1. Type = 1 specifies that the area is a PSMFC groundfish area, while type = 5 specifies that the area is a salmon area. The gr table is similar to the ar table in that each row contains a grgroup column like the argroup in table ar. The pc table is structured like the ar table as well. The gr and pc tables are also similar to the sp and ar tables in that they are used in the validation, update, and retrieval portions of the system. The ag table is a very short table containing one entry for each data source, or potential data source. The 90%-completion-estimate date (i.e. month) is maintained within this file. The cl table is a table that contains 20 code-lists. Each code list corresponds (more or less) to a specific column in the database. All of the possible codes for columns such as condition, disposition, and grade are contained in this table. What characterizes these code lists and why they are all maintained in the same table is that their primary use is for validation during update and they have no grouping attributes like the sp, ar, gr, and pc tables. In addition a description of each code is contained in this table, providing some of the most important documentation of the data contained in this database. Nearly all of the data that resides in PacFIN tables originated with one of the data sources or has been added for data management purposes. This category of data management data includes data elements: ulid and modified. An inspection of the description of all tables that are populated from input transactions will show that ulid and modified are contained in each of these tables. The agency-code-list (ACL) tables consist of five tables that contain all of the relations that exist between state agency codes and PacFIN codes. The data source codes that do exist in these tables are those codes that reside, or potentially may reside, in the ft and/or ftl tables. At this time only the three W-O-C data sources supply PacFIN with ft/ftl data and therefore the ACL tables contain only W-O-C species, area, gear, port, and processor codes. All five tables are structured similarly. So a description of the asp table should serve to describe all five tables. 6.4.2.2 Tables: asp, aar, agr, apr, apc Each row of the asp table contains an agency ticket-category and a PacFIN species code (i.e. spid). In addition each row includes a description of the code and the agency-id, which is used to qualify the ticket-category. A note about the naming of columns and the meaning of species, species-code, market-category, ticket-category, and category is appropriate at this point. Although each W-O-C data source uses the terminology "species code" to refer to the identifier that identifies the biological mass that is the main subject of each ftl row, "species code" is really a misnomer. In only some of the cases does the "species code" contained in the ftl row actually refer to a specific, scientifically identifiable species (ex: microstomus pacificus). Most state agency supplied "species codes" refer to some collection of harvested marine animals. The primary driving force determining the manner in which these assemblages of marine animals are harvested is the market and so these collections of species have come to be known as market categories. But, that handle even misses the mark, because a market category is also determined by the condition of the fish at landing, the disposition (i.e. market), the grade or size, and possibly other attributes. And so the column containing this state "species code" in the asp table was given the name of category during the re-definition process, meaning simply a category of fish/animals. As mentioned, the aar, agr, apr, and apc tables are structured essentially the same as the asp table. And thank goodness we don't have the same problem with the meaning of columns area, gear, port, and processor that we have with category! There is one last anomaly and it has to do with the apc table. The apc table is the only agency-code-list table that does not contain the relationship between a data source code and a PacFIN code. A set, or domain, of PacFIN processor codes has never been established. The set of processor codes, with each code's qualifying agency-identifier is in essence the set of unique coastwide W-O-C processor codes. The ft and ftl tables are easily the largest tables that contain data supplied by data providers. The total number of rows, in thousands of rows, contained in these 32 tables is included below. Number of Rows in the ft and ftl Tables as of 26JUN96 Units = Thousands of rows
Although both the ft and ftl tables are logically one table each, the design that was considered optimum for this implementation called for a set of annually partitioned tables. So the logical ft and ftl tables consist of 16 physical tables each. The ft table contains one row for each landing receipt recorded by the fishery agencies of the states of California, Oregon, and Washington. Each ft row contains data that occurs at most once on each physical, hardcopy fish-ticket. Another way of characterizing the data contained in each ft row is that all of the data elements are trip specific, i.e. each data element describes some attribute of a fishing trip. On the other hand, each row in the ftl table contains one entry for each market-category line that was recorded as part of each fish-ticket document. All of the data elements included in this table are essentially attributes of the ticket-category that was harvested. The estimated and worst_est columns in the ftl table are unique in that they are not data input by the data sources and they are not data management attributes. These two elements can be classified as "derived" attributes since the values for these columns are derived as a result of the update process for ftl transactions. The drvid column (meaning derived vessel id) in the ft table is developed in a similar fashion using the contents of columns veid and vesseltype in conjunction with data in the sv table. The sv table nominally contains one row for each vessel registered to harvest fish commercially. There is one row for each vessel registered by each W-O-C agency for each year. If a vessel was registered by one state fishery agency each year then there would be 16 entries for that vessel, one for each year. If a vessel was licensed to fish commercially in all three W-O-C jurisdictions each year then there would be 48 entries in the sv table for this vessel-id. It is known that for some of the earlier years the set of CDFG "registered vessels" is not necessarily the set of registered vessels for that year, but rather the set of registered vessels plus vessels registered in the subsequent calendar year but not in the year specified. This table is used primarily during the update process, but is also used when vessel attributes, such as length and name, are requested. The following table contains the number of rows in the sv table by year and agency.
6.4.2.4 Tables: acm, scm, ecm, aw The acm, scm, ecm, and aw tables collectively are referred to as proportion tables or simply proportions. Even though the aw table contains average weights, which are rates rather than proportions, the aw table is included in this group of proportion tables since it functions in much the same manner. The data contained in these tables, along with the ft and ftl data, is the part of the system that distinguishes the "re-defined" PacFIN from the earlier PacFIN system. All four of these tables include columns coeffvar and samples. The coeffvar is the coefficient of variance and was included in the specification with the intention of documenting the quality of the estimated proportion. It was soon learned that not all data sources could provide the coeffvar so the column "samples" was added. Samples is simply the number of samples, or observations, that were used to develop the estimated proportion. All proportion tables include year, month, day, and period. These four items are referred to collectively as "time-period" in the subsequent paragraphs. The catch-by-area proportion (acm) table contains proportions that are used to distribute catch for a particular time-period, comptype, pcid, grid, spid, grade, and triptype to fishing areas specified in column arid. Of these seven attributes that describe a "strata" only the grade, and triptype may be empty or null. Grade is typically used only for sablefish and salmon species and triptype is used exclusively for salmon. Acm transactions are used by both WDFW and ODFW to distribute catch of groundfish species categories and they are used by ODFW to distribute catch of salmon species in a similar manner. The acm transaction was originally proposed in the context of distributing groundfish catch to PSMFC areas where individual vessel logbooks contain the basic or lowest-level data used to develop these proportions. The inclusion of acm proportions for salmon species was a result of the specification review process that uncovered the need to include these proportions in order to produce the "best available data" given individual fish tickets as the lowest level data. Below is a table that displays the number of acm rows by year and data source.
The species-composition (scm) table contains proportions that are used to distribute catch for a particular time-period, pcid, arid, grid, and unspid (unspecified species assemblage) to the species specified in column spid. Of these five attributes that describe a "strata" only the pcid or the arid may be empty, but not both. Scm transactions are used by all three W-O-C data sources to distribute rockfish assemblages to individual species. These proportions are developed from rockfish samples taken at selected west coast ports and/or from trawl logbooks. Below is a table that displays the number of scm rows by year and data source.
The effort-by-area proportion (ecm) table contains proportions that are used to distribute effort for a particular time-period, comptype, pcid, grid, mgrp, and triptype to fishing areas specified in column arid. This table parallels the acm table except that spid is replaced by mgrp, the subject of the table is effort instead of catch, and there are two possible measures of effort that are being apportioned: deliveries and days-fished. This table is used exclusively by ODFW for both the groundfish and salmon management groups.
The average-weight (aw) table contains average-weights that are used to compute an estimated number-of-fish for a particular time-period, pcid, arid, grid, spid, grade, and condition. This table is used exclusively by ODFW for salmon species and Columbia River sturgeon and shad.
6.4.2.5 Tables: sc, dc, se, de The summary-catch (sc) and detail-catch (dc) tables date back to the original implementation in 1981. The logical sc table is an annually partitioned table like the ft and ftl tables while the dc table is physically a single table. As stated previously in this document (6.2.1) the primary purpose for the sc table at that time was to aid in the speedy retrieval of summarized catch statistics. And that is still the sole purpose of the sc table. In fact every statistic contained in the sc table can be derived from some other data residing in the PacFIN database. Most (maybe all?) of the relationships between and among time-period, spid, arid, grid, and pcid are contained in this table. Summarized data for all PacFIN data sources is included in this table. The dc table has a direct relationship with the sc table. In the early years all input data was supplied using a single transaction type: the aggregated-catch (AGC) transaction type. The data contained in these AGC input records was stored in the dc table. And to this day the ADFG, AKR, AFSC, and DFO data sources continue to use this transaction type to submit their data. Now with the W-O-C data sources submitting ft, ftl, acm, and scm transactions in lieu of aggregated catch transactions internally generated AGC transactions are produced for W-O-C data sources and stored in the dc table. At all times the dc and sc tables are consistent with each other, meaning that as each dc row is added, changed, or deleted the necessary sc rows are added, changed, or deleted. For 1981 through 1986 the dc table still continues to contain the original daily AGC records for the W-O-C data sources and for rockfish species this is the best available data. For 1987 and later the dc table contains internally generated monthly aggregates for the W-O-C spid codes that are maintained in the sc table. Not all spid codes found in the PacFIN sp table are found in the sc table. The summarize column in the sp table determines which spid codes are maintained in the sc table. Summaries for all groundfish, salmon, and many other species are maintained in this table. The following tables contain the sc records by year and the dc record counts by year and agency.
The summary-effort (se) and detail-effort (de) tables have been part of the system since about the second year. These tables potentially contain three measures: deliveries, trawl-hours, and days-fished. Deliveries is simply a count of fish-tickets (i.e. a count of ft rows) based on a certain criteria. Trawl hours is an estimated value for the number of hours a fishing vessel is actually engaged in the act of fishing with its net in the water. Days-fished is computed or derived from the ft table. Days-fished has been reported to the central system primarily for the salmon fishery. Trawl-hours is exclusive to the groundfish trawl fishery. The deliveries statistic is included in this table as it is an attempt to provide some coastwide W-O-C comprehensive consistent measure. But it is acknowledged that number-of-deliveries is not a measure of effort, but rather an "index to effort" or possibly just some measure that only has meaning in the context of a particular user's application. The se and de tables are structured in the same manner as the sc and dc tables. The biggest difference is that the se and de tables have data element mgrp (management group) instead of spid as found in the sc and dc tables. As a result the se and de contain only about 10% of the number of rows that reside in the sc and dc tables. The set of mgrp codes is a subset of all spid codes. The fact that mgrp resides in the se and de tables while spid is a column in the sc and de tables is often overlooked by users of the PacFIN system. The PacFIN system at this time contains statistics for deliveries, trawl-hours, and days-fished for certain management groups but NOT for individual species. Example: while the sc table contains catch statistics for sablefish there are no deliveries nor trawl-hours statistics for sablefish, but there are deliveries and trawl-hours for the groundfish management group. Another example: while there are catch statistics for chinook salmon there are no days-fished statistics for chinook, but there are days-fished for the salmon management group. As part of the transition from the original system to the "re-defined" system the responsibility for computing number of deliveries and days-fished shifted from the W-O-C agencies to the PacFIN Office. So for 1987 through 1994 the number-of-deliveries and days-fished statistics that do reside in the de table were computed by the central processing system and the 1981 through 1986 data for these two statistics is retained from each W-O-C's original input. Deliveries and days-fished statistics for 1995 and 1996 do not reside in the de nor the se tables at this time since the software to generate these statistics has not yet been developed. 6.4.2.6 Tables: cg, nv, ul, dl The U.S. Coast Guard (USCG) vessel data (cg) table contains selected attributes from the USCG's Merchant Vessels of the U.S. data file. At this time all entries, or rows in this data file are entered into the cg table. The attributes selected include columns such as gross weight, length, horsepower, and the year the vessel was built. This table is available to those users who need to join ft, sv, or lep_src rows with cg rows in order to "pick-up" additional attributes. This table is not however an integral part of the PacFIN central processing in that there are no specific standard reporting applications that require this data. The USCG will not provide this data file directly to the PacFIN Office, so the NMFS/SWR acts as an intermediary requesting the data file and forwarding it to the PacFIN Office. There are currently two "editions" of this data file contained in table cg distinguished by column "pubyr". There is on data set that was "published" in 1991 and another that was published in 1995. Column "latest" allows a user to select the latest entry for any one particular vessel-id. The non-vessel (nv) table is an ancillary table which is a by-product of translating state fishery agency vessel plate numbers to either a USCG vessel-id or a state marine board id using the sv table. Other "non-vessel" identifiers that occur in the veid column of the ft table and plate numbers that are not found in the sv table are given a special "vessel-id" so that each ft row will have a unique and correct entry in column drvid. Many of these entries in the nv table are a result of tribal identifiers that populate the veid column of table ft. The update_log (ul) and detail-log (dl) tables are PacFIN data management tables. These two tables hold data about each data feed processed by the system. Included is a unique identifier for each update to the system, which allows one to determine when any particular datum entered the system, or when any particular row that corresponds to an input transaction was most recently modified. In addition statistics about the amount of data that enters the system during each update process is saved in the dl table. This information is one of the data sets used to determine the data completeness for each PacFIN data source. < ... QSM Tables ... LEP Tables ... > 6.4.3 Central Processing - Update The update portion of the central processing part of the PacFIN has recently been implemented in a Unix/Oracle software environment. All of the software routines that comprise the suite of "update" software were developed using one or more of these programming, or software development, languages: Oracle's PL/SQL; Oracle's SQL*Plus; the 'C' programming language; and Oracle's SQL*Loader. In addition the Unix shell programming language was used to integrate all update modules into a single "production update job". Not all update modules are needed for all data sources. All of these update routines validate the input data to some degree. In some cases where the data value is found to be in error the input transaction is rejected, but in other cases a warning might be generated and the record would be accepted. Transactions that are rejected or warnings that are generated are reviewed by each data source PacFIN coordinator and the appropriate action is taken by that coordinator. Although the central processing system does include attempts to validate input data, it should be noted that these validation exercises are merely offered as a service to the agencies that provide PacFIN input data. It should also be noted that the content of each data file (i.e. the value of each data element) is solely the responsibility of the data source. So although certain data are not allowed entry into the database, other invalid or incorrect data may gain entry to the database since the suite of central processing validation routines is not absolutely comprehensive. More extensive central processing validation routines can be incorporated and are incorporated as the "need" arises. The following table lists each module and includes the data sources, whose data are processed by the module and the software used to develop the routine.
The update process gets started when the SQL*Loader utility is used to load all data from a particular input data file into one or more Oracle tables. The SQL*Loader control file serves as a type of software module in that it specifies to the SQL*Loader program how to load each datum from the input ASCII Unix file to the temporary Oracle tables. These tables are designated as temporary since the data loaded into them are only retained for the duration of the update process. One of the interesting features of the PL/SQL programming language is that it can only operate on data in Oracle tables. It can read from and write to an Oracle table, but it cannot read/write directly from/to a Unix data file. All of the temporary data table names are prefixed with "df_". Prior to the beginning of each update process, all "df_" tables are purged. For the most part the value of each datum is moved "as is" to the temporary table, but in a few cases certain "translations" are performed. As an example, the item "day" occurs in may tables and in some tables it is a required item, while in others it is not required and my be input by the data source as "null". For those tables where day is allowed to contain the null value it is translated to zero. This is apparently required by Oracle since the day column is used as part of a larger key field. If the reader is interested in the details of this SQL*Loader control file then she should contact the PacFIN Office. The first module that is executed for all data sources is the ul_update module. This module reads the single row contained in the df_report table creating the permanent entry in table ul, or in the event of a continuation data feed verifies that the report record exists in table ul. In addition certain data specific to the update that is in progress is saved in the df_report row. In the following paragraphs whenever the phrase "inserts into <table name>" is used it implies updating and deleting rows as well. The acl_update inserts agency codes into the appropriate agency-code-list table (asp, aar, agr, apr, and apc). A reject could occur if the transaction contained an spid not found in the sp table where type = 1. And the same would be true for arid, grid, and pcid. The acm_update module inserts acm transactions into the acm table. There are a number of ways a reject could occur, but the most common would be an invalid PacFIN spid, arid, grid, or spid. The scm_update module inserts scm transactions into the scm table. A common reason for a rejected transaction would be an invalid PacFIN spid, unspid, arid, grid, or pcid. The age_update module inserts aggregated-effort (nominally trawl-hours) into the de table. An invalid mgrp, arid, grid, or pcid would cause a reject. The aw_update and the ecm_update modules insert average-weight and ecm transactions into the aw and ecm tables. And once again all columns are validated to the extent possible and rejects and warnings are produced as appropriate. The sv_update module inserts state-vessel transactions into the sv table. In this case a warning would be caused if the value contained in items length, weight, or horsepower failed the range check that is performed on each of these items. The ft_gen_trans and ftl_gen_trans modules are only used to process WDFW ft and ftl input records. WDFW no longer attempts to generate add, change, and delete transactions for these two transaction types. Instead they submit all ft and ftl records using the "&" operator, which means that the record may be an add, it may be a change, or it may already be in the database (i.e. an ignore record). This kind of input data has come to be known as "& all", since all records for a single calendar year must be provided in order to determine delete transactions. So these two modules determine the necessary add, change, and delete transactions as well as the input records that can be ignored. The ft_update module inserts ft transactions into the appropriate ft table (ex: ft95). There are a number of situations that could cause a reject to occur and one of these involves port-of-landing. The unique key for each ft record is: agid-ftid-year-month-day-pargrp. If there is an attempt to enter the same exact ft row (i.e. same unique key) with a different port code the record will be rejected. This kind of reject has been a fairly common occurrence. In addition this module is different than any of the other "_update" modules in that it derives a value for column "drvid" (derived vessel identifier). This task of deriving a vessel-id involves searching the sv table and/or the nv table and possibly inserting a new entry into the nv table. This exercise is necessary since both CDFG and WDFW use a plate number, instead of a USCG or State Marine Board #, to identify the vessel that was used to harvest the catch, if a vessel was used. The veid column contains this plate number, if vesseltype = 3, or possibly other "things" - see vid-type in attached table "General Code Lists". If vesseltype = 3 then veid is used to search the sv table to find the actual coastwide vesselid contained in column svid. If the plate number is found then the corresponding svid is retrieved and stored in drvid. If the plate number is not found then a search of the nv table is performed attempting to match the year, agid, vesseltype, and veid of the ft transaction to the year, agid, idtype, veid in the nv table. If a match is found then the vessel-id contained in artvid (artificial vessel identifier) is retrieved and stored in drvid. If a match is not found then a new entry is inserted into table nv with a unique artvid for the year, agid, vesseltype, and veid. This new artvid is then stored in column drvid. The whole purpose of this exercise is to ensure that ft rows with unknown vessel identifies or non-vessels in the veid column are translated to an identifier that will not assign catch to a vessel incorrectly. The ftl_update module processes ftl transactions and inserts rows into the appropriate ftl annual partition. There are many data items within this transaction and all elements are validated to the extent possible. Rejects can occur if the category, grade, condition, disposition, area, gear, or par-group are determined to be invalid, but rejects can also occur for many other reasons as well. One feature that separates this routine from other "_update" routines is that if the ftl transaction contains a price-per-pound that is null the transaction is saved in a separate table for later processing by the ftl_bld_actual_prices and ftl_estimate_prices routines. This temporary table is called the null_dollar table. Null_dollar ftl transactions are processed last to ensure that the best actual prices reside in the ftl table when estimates are developed. The ftl_bld_actual_prices routine builds a temporary table of actual prices. This routine inspects the null_dollar table in order to develop a list of ticket categories so that this process does not compute actual prices unnecessarily. The actual prices table contains the following dimensions: category, condition, disposition, grade, month, pcid, and grid. For each cell within this seven-dimension array the total pounds and revenue are computed and stored. This table is built using only ftl rows that contain actual prices and it is built for all months in the current and previous years. The ftl_estimate_prices module uses the actual prices contained in the temporary table in conjunction with a particular search algorithm to determine the best estimated price for each ftl row contained in the null_dollar table. Once the estimated price has been found the ftl transaction is stored in the ftl table with data items ppp set equal to the estimated price and "estimated" set equal to true. There is a possibility that the search algorithm will not find an actual price. In these few cases a default worst-estimated price found in the sp table is used and the ftl attribute "worst-est" is set to true. In recent years nearly all groundfish and salmon catch has been reported by the W-O-C data sources with actual prices. However, this method for estimating ex-vessel prices continues for species other than groundfish and salmon species. As an example, the June 14th, 1996 data feed provided by CDFG contained 33,697 ftl transactions and 1,322 of these required that an estimated price be determined. The ftl_summarize process is the part of the system that corresponds to the "re-definition" more than any other part. The purpose of the ftl_summarize process is to summarize ftl data into monthly aggregates applying acm proportions and the scm proportions producing acm transactions that will subsequently be processed by the agc_update and agc_summarize modules that maintain the dc and sc tables. The ftl summarize process can be logically divided into four parts: 1. create monthly ftl aggregates; 2. apply catch-by-area proportions; 3. apply species-composition proportions; and 4. generate aggregated-catch transactions. The computation of monthly aggregates from ftl data gets started by determining which months need to be summarized. This is determined by finding each month that occurs at least once in the set of ftl, acm, and scm transactions that have just been processed. These monthly aggregates are summarized by month, spid, arid, grid, and pcid and for each cell in this five-dimensional array the round-weight equivalent pounds (rwt-lbs), number-of-landings, number-of-fish, pounds that where actually priced (lbs-priced) and estimated revenue are computed and stored. It should be noted that all ftl rows, for the selected months, participate in this aggregation exercise, not just those that will be subsequently apportioned. These monthly aggregates are then apportioned to area of catch for only those cells that have corresponding proportions in the acm table. Many of these ftl produced aggregates will not need apportionment by acm proportions. Here is where comp-type comes into play. Those aggregates that do have corresponding acm proportions are further summed by month, spid, grid, and pcid and by either pacific ocean or Puget Sound. Acm proportions are then applied based on comp-type. Each month-spid-grid-pcid aggregate that is apportioned will produce one or more month-spid-grid-pcid-arid aggregates. As a result of the data contained in the acm table this process currently only apportions into PSMFC areas. At present there are only three groundfish comp-types specified - two for WDFW that separates ocean areas from Puget Sound and one for ODFW that essentially specifies that only those ftl aggregates with arid = unknown will be apportioned. These newly generated aggregates replace only the original aggregates that were apportioned so that there is no double counting and the total pounds remain the same. The set of monthly aggregates that were apportioned to area and those that were not apportioned are all passed through the scm apportionment process. There are two special cases that need to be handled. Scm rows input as quarterly are expanded to the corresponding monthly scm rows and appended to the existing monthly rows. And CDFG's sample port group that populates the scm pcid for CDFG's scm rows is translated so that scm rows will match the correct pcid contained in the monthly aggregates. Column sgroup in table pc affects this translation. Once again there are many aggregates in the set that will not have any corresponding rows in the scm table. Example: there will be monthly aggregates for dover sole but there will not be any scm rows for dover sole, while there will be scm rows for unspecified rockfish and hopefully there will be corresponding rows for unspecified rockfish in the set of monthly aggregates. The scm proportions are applied by matching month, unspid, grid, arid, and/or pcid in the scm table to corresponding columns in the set of original ftl aggregates and acm apportioned aggregates and the necessary statistics are apportioned by multiplying each by the proportion in the "matched" scm row. The rwt-lbs, number-of-fish, lbs-priced, and estimated revenue are all apportioned, while the number-of-landings is set to null for all aggregates resulting from either an acm or scm apportionment. The last step is to generate only the necessary agc transactions. Since this ftl_summarize process is accomplished on a monthly basis there are potentially monthly aggregates resulting from the above three steps that may not be needed any longer. This is determined by comparing the monthly aggregates produced so far to the monthly aggregates residing in the dc table. From this process only the necessary add, change, and delete agc transactions are determined and then stored in table df_agc for subsequent processing by the agc_update and agc_summarize routines. The last segment of the update process has to do with processing agc transactions. Agc transactions can be one of two types: those generated by a data source or internally generated agc transactions like those generated by the ftl_summarize process. In either case agc transactions are loaded into the df_agc table and agc processing begins with that data. The agc_gen_deletes routine is only used to process ADFG, AKR, AFSC, and DFO data feeds. All four of these agencies do not provide delete agc transactions, instead they provide all agc "&" transactions for one complete calendar year. The "&" indicates that the input transaction is either an add or a change. Given this complete set of "& all" agc transactions the necessary delete transactions can be determined by comparing all rows in the input data feed to rows in the dc table for the particular year and the particular agency. This routine does just that and inserts any generated agc delete transactions into table df_agc with the "&" transactions that already reside there. The agc_update module is used to process both varieties of agc transactions: data source generated or internally generated. This routine basically inserts rows into the dc table after certain validations are performed. In addition as each add, change, or delete transaction is inserted into, deleted from, or used to update a dc row a copy of the agc transaction is saved in a special table called df_agc_to_sum for later processing by agc_summarize. If the agc transaction is a change transaction then the difference, or delta, for each statistic is computed and stored in the df_agc_to_sum row. So the values in the df_agc_to_sum row can be either positive or negative. The agc transaction includes revenue of landed catch. If the revenue, or estimated value, is null then the agc transaction is not inserted into the dc table, but is instead saved in an agc null-dollar table for later processing by agc_bld_actual_prices and agc_estimate_prices. The DFO data source does not provide any economic data, so in this case agc transactions with null revenue are left as is with no attempt to estimate the revenue. The AKR and AFSC data sources also do not provide any economic information as provided for within the agc transaction, but all AKR and AFSC transactions receive a value for estimated revenue. So all agc transactions for AKR and AFSC are inserted into the agc null-dollar table during this process. The agc_bld_actual_prices program creates a temporary table containing actual prices based on data in the dc table. A list of spid codes is developed by inspecting the agc null-dollar table so that actual prices are computed and stored for only the necessary spid codes. This table contains five dimensions: spid, month, pcid, grid, and arid. The actual pounds that were priced and the actual revenue of those pounds are computed for each cell in this five-dimensional array and stored in the appropriate row in the table. This table includes actual prices for a minimum of 13 months and a maximum of 36 months depending on the calendar year being processed and the months that reside in table dc for that calendar year. The agc_estimate_prices routine uses a particular search algorithm to find an estimated price in the agc actual prices temporary table for each row in the agc null-dollar table. The estimated price along with the total pounds are used to compute an estimated revenue which is stored in item "estval". The agc row is then inserted into the dc table and a copy is added to the df_agc_to_sum table. For ADFG null-dollar transactions, the search algorithm uses only actual prices derived from ADFG dc rows. For AKR price estimating, only a subset of the ADFG actual prices are used (deliveries such as bait deliveries are excluded). And for AFSC price estimating the agc actual prices table is not used at all and the sc table is used instead, where the average round-weight equivalent estimated price resides in each row. The AFSC search algorithm only looks for average prices where pcid = 'ALP' (all domestic). The agc_summarize module is the final step in the update process. This module maintains the summary-catch (sc) table by inserting, updating, or deleting the necessary rows. The rows contained in table df_agc_to_sum are the rows that are summarized so modules agc_update, agc_bld_actual_prices, agc_estimate_prices, and agc_summarize must be executed consecutively - they essentially form a single unit. Of course, for W-O-C processing only agc_update and agc_summarize need be executed since all W-O-C estimated prices are now developed by the ftl_estimate_prices routine and therefore all internally generated agc transactions contain an estimated, or actual, value for revenue. This summarization process gets started when a row is read from the df_agc_to_sum table. Five vectors are developed, one each for: period, spid, arid, grid, pcid. The values for each of these vectors are determined by the month, spid, arid, grid, and pcid contained in the dc row being summarized. As an example, if the dc row contained: month = 5, spid = DOVR, arid = 2C, grid = LGL, and pcid = AST, then the vectors would be: prdvec = M5, Y1; spvec = DOVR, FLAT, GRND; arvec = 2C, CL, PC, ALL; grvec = LGL, HKL, ALL, and pcvec = AST, CLO, AOR, ALP, ALL. Period = Y1 designates the annual period. For all possible combinations (for this example 360) rwt-lbs, number-of-landings, number-of-fish, estimated revenue, and lbs-priced are either inserted as new rows, deleted from the table, or used to update existing rows. Of course the length of some of the vectors will vary depending on the particular codes being processed from the agc row. The period and grid vectors always contain 2 and 3 elements, respectively, while the spid vector can have 3 to 5 elements, the arid vector 3 to 7 elements, and the pcid vector either 4 or 5. A simplified version of this summarize process using "Structured English" or a pseudo-programming language might look like this:
The above simplified version of the summarization process or summary-catch maintenance, must be modified considerably before it becomes useful. Mainly there are a number of exceptions to the "general rule" specified in the above pseudo-code. As an example, for many years now summary-catch rows for individual rockfish for WDFW ports and port groups have not been maintained in the sc tables. Another exception is that no summary-catch rows are maintained where arid is a PSMFC area and pcid is either 'ALP' or 'ALL'.
6.4.4 Central Processing - Retrieval There are at least two methods for retrieving data from the central database: SQL*Plus routines; and the *_rpt suite of programs. The SQL*Plus script file routines may be developed by the PacFIN Office or they may be developed by PacFIN users who have access to the Orca system, while the *_rpt suite of programs are developed by the PacFIN Office and can be executed by PacFIN users or PacFIN staff. The SQL*Plus script files replace the extract and ext/pacfin programs that were part of the retrieval system during the Unisys B7900 days. Hundreds of SQL*Plus script files have been developed by the PacFIN Office. These routines can be grouped into certain "classes" based on the retrieval functions that they perform. A partial list of these script file groups, or classes, includes: list_*, rpt_*, ann_*, sum_*, and pcs_*. This notation is Unix shorthand for the group of files that begins with "list_", "rpt_", etc. The group of list_* routines all produce various lists from the sp, ar, gr, pc, and other "list" type tables. The rpt_* and ann_* class of routines retrieve selected rows from the summary-catch table (one or more sc tables). The ann_* files retrieve only annual values, while the rpt_* class of routines allow for more general selections that include individual months. The sum_* and pcs_* routines operate against the ftl, ft, and possibly other tables, but specifically do not select any data from the summary-catch tables. Both of these groups compute landed-weight pounds, round-weight pounds, revenue, ex-vessel price-per-pound, percentage of pounds priced, number-of-landings by PacFIN spid, and possibly other statistics. The main difference between these two groups of routines is that the pcs_* routines select only data for the PFMC (pcs = PFMC sums), while the sum_* group produces sums without any area-of-catch restriction. The main reason these two classes of summation/retrieval routines exist is that not all data residing in the ftl and ft tables are summarized and saved in the sc tables. The reader is referred to a document, available upon request from the PacFIN Office, titled "Using Unix & Oracle to Access PacFIN Data", which gives an introduction to these script files along with other new user orientation information. All of these script files have been developed primarily to be used "as is" to retrieve selected data, but secondarily these script files serve as "how to" models or templates for users who need to develop their own custom retrievals. All Orca users are encouraged to make use of these central processing SQL*Plus script files to the extent they can be useful, including copying any of these files to use as "starting points". The other retrieval mechanism consists of the area_rpt, source_rpt, gear_rpt, port_rpt, species_rpt, and activity_rpt programs. These programs are exact replacements for the RPT/= suite of programs of similar names that were part of the system when it was operational in the Unisys B7900 DMS-II Algol environment. The products from these programs have become known as the "Standard PacFIN Reports". The reader is directed to the PSMFC Homepage (http://www.psmfc.org) or if the reader is an Orca user she is directed to the ~pacfin/rpts/<year>/pfmc Unix subdirectory, where <year> can be 1996, 1995, etc. for examples of these kinds of PacFIN Standard Reports. As of this writing, this *_rpt retrieval subsystem is still in development. When development has been completed Orca users will be able to generate their own standard reports, which number in the thousands. Until then selected standard reports will be produced by the PacFIN Office and made available as described above. 6.4.5 Quota Species Monitoring (QSM) Subsystem The QSM subsystem is a part of the PacFIN system that is intended to provide the PFMC's GMT with the best estimates of total commercial catch of certain species from certain PFMC managed areas. The QSM was first put into operation in 1985 and has been expanded to include additional species and ocean area combinations many times over the last eleven years. In addition, this QSM subsystem was re-built during the first half of 1995 in the Unix/Oracle environment of the Orca system. The main concept behind the QSM subsystem is that one can get a reasonably good estimate of total catch for the current year by combining hard data from the main database with soft data derived from state reported catches for recent weeks combined with some correction factor. For the QSM subsystem the best estimate of total catch is defined as:
Best Estimate of Total Catch = hard data + soft data hard data = catch summaries derived from state agency provided fish ticket data and/or rockfish species composition proportions derived from port sampling, and/or catch-by-area proportions derived from logbook data; these catch summaries are maintained in the summary-catch table soft data = weekly report catches multiplied by some correction factor weekly reported catches = catches reported by each state agency (W-O-C); each agency reports each week's catch within six days correction factor = hard data for the most recent 12-month period divided by the sum of reported catches for the same 12-month period The hard data used by QSM is determined by each agency's 90% data completion estimate. The most recent 12-month period used in the correction factor computation is also controlled by this data completion estimate date. The 90% completion estimate for each state agency is the month that 90% or more of the data has been input to the PacFIN system, with the implication that data for all earlier months are more complete. The following graphic explains these relationships:
On 6-28-96 the situation was:
Although "hard" data is referred to as "hard", it is also an estimate since it is based on data provided by the state agencies and is subject to change (usually additions). We don't expect much change in the hard data totals over 12 months. The change/additions usually occur in the most recent months. 6.4.6 Build Vessel Summaries Subsystem The build vessel summaries (bld_vsums) subsystem was originally developed at the request of West Coast economists. The primary users continue to be economists, but researchers other than economists have made use of this summarized data. Although this summarized data is similar in concept to the data residing in the summary-catch tables it is developed in an entirely different manner. The vessel summaries are built in a "batch" fashion, meaning that all six vessel summary files for each year are built with a single "pass" through the ftl and ft tables. These vessel summaries reside in Unix ASCII files rather than Oracle tables since most users want to transfer entire sets of weekly, monthly, or annual vessel summaries to their local computer systems. The bld_vsums program produces two kinds of summary files: a vessel-summary file and a vessel-trip-principal file. The vessel summary file consists of 16 items: year, vessel-id, time-period, pcid, spid, grid, arid, agid, processor-id, grade, condition, disposition, participant-group, landed-weight, round-weight, and revenue. The first 13 items serve as the unique key for the table and values for the last three items are computed for each combination of the 13 key items. The vessel-trip-principal file consists of 14 items: year, vessel-id, time-period, principal-port, principal-species, principal-gear, agency, principal-processor, first-day-fishing, last-day-fishing, number-of-trips, total days-fished, total round-weight, and total revenue. For vessel-trip-principal records the first eight items form the unique key and for each unique key the bld_vsums program determines the values of each of the last four items. The process of building vessel summaries (vsums) in 13 dimensions creates rather large files. As an example the 1988 bld_vsums process produced 456,661 weekly vsums, 277,631 monthly vsums, and 170,287 annual vsums for a total of 904,579 vessel summary records. In addition during the same process 165,294 weekly, 63,465 monthly, and 15,427 annual vessel-trip-principal (vtp) summary records were produced for a total of 244,186 vtp summary records. As of this writing, the program that builds these vessel summaries in the Unix/Oracle environment is still in the process of being developed. While this subsystem remains uncompleted, some of the vessel summary statistics are being produced using SQL*Plus script routines. Annual, monthly, and weekly vessel summaries are available for 1981 through 1996 and annual, monthly, and weekly vessel-trip-principal summaries are available for 1981 through 1994. Data completeness for each PacFIN data source is determined using a variety of indicators. The data captured about each data feed and saved in the dl table is one of the sets of data used in this exercise. The data saved in the dl table includes the total groundfish pounds added (or deleted) for each month by transactions included in each data feed. Another method used to help determine data completeness is the historical comparison of similar statistics from the summary-catch table. An example of this kind of historical comparison is selecting the total reported catch for dover sole for the month of May for 1991 through 1996 for all gears and all PFMC areas combined for the W-O-C agencies. Reviewing the table included below one can deduce that ODFW's data is probably at least 90% complete for the month of May 1996, while both CDFG's data and WDFW's data appear to be less than 90% complete for the month of May 1996. From PacFIN database (summary-catch) as of 6-28-96
From PacFIN database (summary-catch) as of 6-28-96
Dover sole is one of the best spid codes to use for this type of data completeness indicator. The PacFIN database in its entirety is a confidential database. It is confidential in that the economic history of individual fishing vessels and fish processors can be determined from the contents of the ft and ftl tables. Access to the PacFIN confidential data follows the rules set by the NMFS in NOAA administrative orders. The gist of these rules basically says that only statistics that do not reveal the economic activity of individuals or corporations can be made public. In order to adhere to these NOAA rules those individuals who are given access to the confidential part of the PacFIN database are required to sign a "Certificate of Non-disclosure of Confidential Fisheries Data". Only individuals who have a demonstrated need for access to confidential data are considered. The primary criteria for demonstrated need is that the individual must be participating in council activities that require the confidential data. Only employees of the NMFS, and other PCFDC member agencies are considered for on-line access. Certain individuals, who would be classified as independent consultants, after having signed the same "Certificate of Non-disclosure of Confidential Fisheries Data", are given electronic copies of confidential data specific to the PFMC study project they have contracted to complete. These independent consultants are required to destroy the confidential data once their study project has concluded. |
||
| Last Modified: Monday, 22-Apr-2002 16:35:09 PDT | ||