New York State Department of Social Services, DAB No. 1531 (1995)

Department of Health and Human Services

DEPARTMENTAL APPEALS BOARD

Appellate Division

SUBJECT: New York State Department of Social Services

DATE: August 3, 1995
Docket No. A-91-128
Decision No. 1531

DECISION

The New York State Department of Social Services (New
York) appealed a decision by the Administration for
Children and Families (ACF) disallowing $1,573,013 in
federal financial participation (FFP) claimed by New York
under Title IV-E of the Social Security Act.

The disallowance was based on a review of claims for FFP
in foster care maintenance payments made in two counties
during fiscal years (FYs) 1984 and 1985. ACF conducted
the review using statistical sampling and extrapolation.
ACF reviewed a sample of 232 foster care cases for FY
1984 and 234 cases for FY 1985. ACF determined that 42
cases in FY 1984 and 52 in FY 1985 were ineligible for
payments and projected these findings to New York's
entire claim to determine the amount of the disallowance.

New York moved that the disallowance be dismissed on the
grounds that ACF cannot demonstrate the scientific
validity of the sampling and extrapolation used to
calculate the disallowance. New York argued that ACF was
unable to meet its burden of demonstrating the scientific
validity of its statistical methodology because it had
failed to provide New York information it needed to
determine if the sample was random. New York argued that
it needed the order in which the numbers used to select
the sample cases were generated to be able to tell if the
numbers were random.

In support of its motion, New York initially requested
information regarding every aspect of the statistical
methodology employed by ACF, and the parties subsequently
exchanged extensive amounts of information and submitted
arguments to the Board along with numerous affidavits
from statistical experts.

On the basis of the following analysis, we conclude that
ACF has demonstrated the scientific validity of its
sampling methodology, and we deny New York's motion to
dismiss the disallowance. During a telephone conference
convened to discuss procedures for development of the
record in reference to New York's motion, ACF stated that
in the event the Board denied New York's motion, it would
be willing to enter into settlement discussions with New
York on individual case findings from the samples that
New York disputed. Accordingly, we remand this
disallowance to ACF so that it can set the procedures for
carrying out those discussions. If the parties are
unable to resolve their differences concerning those
findings, then ACF should notify the Board, and we will
re-docket the disallowance and advise the parties as to
necessary procedures for completing the record.

Background

The disallowance was based on a review of foster care
cases in two counties which received maintenance payments
during FYs 1984 and 1985 for which New York claimed FFP.
1/ The review consisted of examining samples of foster
care cases to determine if they were eligible for FFP.
The particular cases chosen for review were selected
using a computer software package called RATS-STATS. The
RATS-STATS package includes a random number generator
program, which the parties referred to as a pseudorandom
number generator, that is based on a mathematical
algorithm described in an article by Wichmann and Hill in
the March 1987 issue of BYTE magazine, "Building a Random
Number Generator." Affidavits of Janet Fowler, Ph.D.,
September 3, 1993, 7-11, October 25, 1994, 4, 8.
The software was developed by the Office of Inspector
General (OIG) Office of Audit Services (OAS) to generate
lists of random numbers and has been widely used for
conducting audits on behalf of the Health Care Financing
Administration (HCFA) and other operating divisions of
the Department of Health and Human Services (HHS), and
the Social Security Administration (SSA), to assist
auditors in selecting random samples. The software
generates a list of numbers after being given a "seed
number" by the user. Unless the same seed number is
used, the software will produce a different sequence of
numbers each time the program is executed. If no seed
number is provided by the user, then the software selects
one automatically based on the time on the clock in the
computer.
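
The seed behavior described above can be illustrated with a
minimal sketch of the algorithm as published by Wichmann and
Hill. This is not the RATS-STATS source code, which is not in
the record; the class name, the helper draw(), and the use of
three seed components (as in the published algorithm) are
assumptions made for illustration only.

```python
class WichmannHill:
    """Published Wichmann-Hill combined generator (illustrative only)."""

    def __init__(self, seed_x, seed_y, seed_z):
        # Assumption: three integer seed components, each from 1 to 30000,
        # as in the published algorithm.
        self.x, self.y, self.z = seed_x, seed_y, seed_z

    def random(self):
        # Advance three small multiplicative congruential generators.
        self.x = (171 * self.x) % 30269
        self.y = (172 * self.y) % 30307
        self.z = (170 * self.z) % 30323
        # Combine them and keep the fractional part, giving a value in [0, 1).
        return (self.x / 30269 + self.y / 30307 + self.z / 30323) % 1.0


def draw(seed, count):
    """Return the first `count` values produced from the given seed."""
    gen = WichmannHill(*seed)
    return [gen.random() for _ in range(count)]


same_a = draw((1, 2, 3), 5)
same_b = draw((1, 2, 3), 5)      # same seed: identical sequence, same order
different = draw((4, 5, 6), 5)   # different seed: different sequence
```

Under these assumptions, a retained seed is enough to
regenerate a list of numbers in its original order, while a
different seed yields a different list.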

To conduct this review, John Gaudiosi, the senior ACF
mathematical statistician, selected a sample size of 225
foster care cases for each of the two years being
reviewed, and used the software to generate two samples
of 200 numbers each, and two oversamples of 25 spare
numbers, to replace sampled cases discovered to be
unsuitable for review, such as state-funded cases listed
in error as Title IV-E cases. These numbers were then
used to select cases from the universe of Title IV-E
cases, which were listed in the order of the child's last
name. The ACF Regional office then requested the
statistician to provide 25 additional numbers for each
sample, which were also provided by the software. ACF
pulled 250 cases for FY 1984 and 249 for FY 1985, and
reviewed 232 cases for FY 1984 and 234 for FY 1985, after
discarding cases which were duplicate cases or cases
listed in error. Mr. Gaudiosi did not select the seed
numbers for the software to generate the lists of sample
numbers used to select the cases for review; instead, the
seed numbers were chosen by the software, based on the
time on the clock of the computer on which the software
was running. Affidavit of John Gaudiosi, September 7,
1993.

New York's arguments in support of dismissal

New York argued that the disallowance should be dismissed
because ACF cannot prove that the numbers used to select
the samples were random. New York argued that the Board
has held that statistical sampling must be valid, done in
accordance with scientifically accepted rules and
convention, and that a state can conduct an independent
assessment of the specific methodology used and its
application. Ruling in Docket No. 89-109, New York
Ex. 6; University of California -- General Purpose
Equipment, DAB No. 118 (1980); Ohio Dept. of Public
Welfare, DAB No. 226 (1981). New York argued that ACF
has the burden of proving the validity of the samples
taken in this disallowance.

New York argued that it cannot determine whether the
sample numbers were random without knowing either the
order in which they were generated, or the seed numbers
ACF used to generate the numbers, which it could then use
to regenerate the sample numbers in their original order.
It is undisputed that ACF did not retain the numbers in
their original order or the seed numbers. New York
asserted, therefore, that ACF failed to meet its burden
of explaining the statistical techniques employed to
select the sample cases to be reviewed.

New York argued that even if it could be conceded that
the software has been properly tested, as ACF alleged,
the software has not been shown to generate random
samples 100% of the time, since it passed those tests
with a 95% degree of reliability. New York argued that
it was because random number generators may occasionally
produce nonrandom-appearing lists of numbers that the
individual samples used had to be tested. New York
asserted that statistical authorities state that
individual samples should be tested before being used.
New York asserted that ACF is unable to provide New York
with the information needed to determine if the lists of
sample cases were random and representative.

New York also argued that, under quality control (QC)
procedures for the Aid to Families with Dependent
Children (AFDC) program, ACF requires that when a state
selects a sample of cases for review, it must provide ACF
with the list of selected sample cases and the computer-
generated random start and seed numbers. 45 C.F.R.
205.41(c). New York reasoned that based on this
precedent, it should be entitled to know the sample
numbers in the order generated by ACF.

Analysis

1. The RATS-STATS software performs reliably as a random
number generator.

While holding that statistical sampling can provide
reliable evidence of the amount of unallowable costs
claimed by a state, the Board has not provided specific
guidelines for the sampling procedures to be employed.
Rather, the Board has held only that the agency must
employ sound, scientifically valid statistical sampling
methodologies. New York State Dept. of Social Services,
DAB No. 1358 (1992). Where the methods employed are
scientifically valid, then the state has the burden of
producing evidence sufficiently strong to overcome the
presumptive reliability of the sampling evidence. New
York State Dept. of Social Services, DAB No. 522 (1984).
The Board has upheld the use of systematic sampling to
calculate disallowances, where the auditor first selects
a claim at random and then selects every nth claim
thereafter to achieve the desired sample size. Oklahoma
Dept. of Human Services, DAB No. 1436 (1993). The Board
held that this method for selecting a sample is
permissible in the absence of any showing that it
introduced a bias into the sample results. Id.
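
Systematic sampling as described above (a random start
followed by every nth claim) can be sketched as follows. The
function name, the universe of 1,000 claims, and the sample
size of 50 are hypothetical and are not drawn from the
Oklahoma record.

```python
import random


def systematic_sample(universe, sample_size, rng):
    """Pick a random start, then take every nth item thereafter."""
    interval = len(universe) // sample_size   # the "n" in "every nth claim"
    start = rng.randrange(interval)           # random start within the first interval
    return [universe[start + i * interval] for i in range(sample_size)]


claims = list(range(1, 1001))                 # hypothetical universe of 1,000 claims
sample = systematic_sample(claims, 50, random.Random(0))
```

Only the starting point is random in this design; every later
selection follows from it, which is why the absence of bias
in the ordering of the universe matters.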

Courts considering statistical sampling and extrapolation
have also not created specific guidelines for sampling
methodologies. Chaves County Home Health Services, Inc.
v. Sullivan, 931 F.2d 914 (D.C.Cir. 1991), cert. denied,
502 U.S. 1091 (1992), cited by New York, held only that
sampling is appropriate if based upon a randomly selected
and statistically significant number of sample claims.
The court in Ratanasen v. California Dept. of Health
Services, 11 F.3d 1467 (9th Cir. 1993), conducted a
review of decisions from other circuits concerning the
use of statistical sampling and extrapolation to
determine overpayments and noted that those cases did not
specify that a certain method of sampling must be used to
satisfy due process. 11 F.3d at 1471, n.1. The court
approved the use of sampling and extrapolation as part of
audits in connection with Medicare and other similar
programs, provided the aggrieved party has an opportunity
to rebut the statistical evidence.

The evidence presented by ACF in contesting New York's
motion shows that the RATS-STATS software has passed a
standard battery of recognized statistical tests for
randomness and is thus entirely suitable for producing
lists of random numbers to select sample cases for
review. One of ACF's experts, the chief OIG/OAS
statistician, stated that the software was tested using
the 13 tests described by D.E. Knuth in The Art of
Computer Programming, Vol. 2, which the National Bureau
of Standards (NBS) has recognized for certifying
pseudorandom number generator software. She stated that
when the software is used to generate a simple random
sample, each sample of the same size has an equal
probability of being drawn from the universe. Affidavits
of Janet Fowler, Ph.D., September 3, 1993, 12-13, and
October 25, 1994, 4, 7. The software has been widely
used for conducting audits on behalf of HCFA, other
operating divisions of HHS, and SSA, has been made
available to other auditors and the public, and has been
distributed by the National Association of Government
Accountants. This expert further stated that she was
unaware of any problems encountered by users of the
random number generator software. Id., September 3,
1993, 7-11.

Moreover, Mr. Gaudiosi stated that the software had
successfully passed all the tests advocated by Knuth in
The Art of Computer Programming, Vol. 2, and that this
battery of tests demonstrated that there is a 95%
probability that each sample it produces would pass a
statistical test for randomness. Affidavits of John
Gaudiosi, July 15, 1994, 7, October 25, 1994, 13. He
disputed New York's assertion that 5% of the time any
sequence of numbers would thus not be random. He
explained that the test results mean that, at the 5%
level of significance, if sequences produced are
subjected to a test of randomness, 5% of the time the
null hypothesis of randomness would be rejected when it
is true, and that 95% of the time one would accept the
null hypothesis of randomness. Affidavit of John
Gaudiosi, October 25, 1994, 9. Accordingly, the
statements of Mr. Gaudiosi regarding the performance of
the RATS-STATS software do not support New York's
assertion that it cannot be relied on to randomly
generate sequences of numbers to select samples. ACF
also provided an affirmation from another statistician
who reviewed the NBS test results and concluded that the
RAT-STATS software is an excellent pseudorandom number
generator, with clear evidence of a random process.
Affirmation of John H. Kvanli, Ph.D., October 26, 1994,
12, 16.
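
Mr. Gaudiosi's explanation of the 5% level of significance
can be illustrated by a small simulation. The sketch below
applies a chi-square frequency test to many lists produced by
a sound generator and counts how often the test rejects
randomness. It uses Python's built-in generator rather than
RATS-STATS, and the bin count, list size, and number of
trials are assumptions chosen for illustration.

```python
import random

# Upper 5% point of the chi-square distribution with 9 degrees of freedom.
CRITICAL_9DF_05 = 16.919


def chi_square_uniform(values, bins=10):
    """Chi-square statistic comparing bin counts with a uniform expectation."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    expected = len(values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


def rejection_rate(trials, size, rng):
    """Fraction of randomly generated lists that the test rejects."""
    rejected = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(size)]
        if chi_square_uniform(sample) > CRITICAL_9DF_05:
            rejected += 1
    return rejected / trials


rate = rejection_rate(trials=2000, size=200, rng=random.Random(12345))
```

The rejection rate should fall near 5% even though every list
was produced by the same random process, which is the
distinction drawn above between a list failing a test and a
list not being randomly generated.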

ACF also presented evidence that randomness is primarily
a property of the process used to generate a list of
numbers, rather than of the list itself. Affidavit of
Janet Fowler, Ph.D., October 25, 1994, 4; Affirmation
of John H. Kvanli, Ph.D., October 26, 1994, 4-7;
Letter from Charles H. Alexander, Ph.D., December 1,
1993, enclosed with Affidavit of John Gaudiosi, July 15,
1994. This position received indirect support from New
York's expert, Dr. Heiner, who stated that to perform a
statistical financial audit, auditors review a "randomly
selected" subset of the transactions or cases that are at
issue. Affidavit of Karl Heiner, Ph.D., October 14,
1993.

Accordingly, based on the extensive affidavits and
statements of the three statisticians cited by ACF, we
conclude that the lists of numbers generated by the
software were randomly generated and were suitable to
select sample cases for review.

2. New York did not demonstrate that ACF's sampling
methodology was executed improperly.

New York's statistical expert questioned the validity of
the test results and stated that ACF had provided only
descriptions of the tests and not the actual results.
The parties exchanged extensive materials concerning the
tests of the software, and ACF's experts addressed the
concerns of New York's expert and identified what they
considered errors in his analysis. 2/ While New York
stated in each submission that it was fully adopting the
arguments in all prior submissions, in Dr. Heiner's last
affidavit, he did not take issue with ACF's position that
the software passed the 13 tests recommended by Knuth and
NBS.

Similarly, when New York initially noted apparent
discrepancies concerning which cases were selected for
review, these discrepancies were clarified by ACF's
statistician. For example, Mr. Gaudiosi explained that
the number of cases reviewed as reported in the
disallowance letter was less than the number of cases he
reported having selected for review because duplicate
cases and cases erroneously listed as Title IV-E cases
were discarded from the sample prior to conducting the
review.

Dr. Heiner also stated that when the RATS-STATS software
encounters a number identical to a previously selected
number, it will repeat the exact series of numbers once
again. He stated that the software is thus not suitable
as a random number generator because it does not afford
each sequence of numbers an equal chance of being
selected. Affidavit of Karl Heiner, Ph.D., December 16,
1994, 5. However, Dr. Heiner did not address the
earlier statement provided by Dr. Kvanli that if the
RATS-STATS generator were to generate 1,000 numbers every
second, the numbers would not begin to repeat for 220
years. Affirmation of John H. Kvanli, Ph.D., October 26,
1994, 13. Accordingly, the concerns raised by Dr.
Heiner are not sufficient to rebut ACF's evidence of the
reliability of the software.
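
The 220-year figure is consistent with the period of the
published Wichmann-Hill algorithm. The arithmetic below is a
sketch that assumes Dr. Kvanli's figure was derived from that
period; the affirmation's own calculation is not reproduced
in this decision.

```python
# Moduli of the three component generators in the published algorithm.
M1, M2, M3 = 30269, 30307, 30323

# Period of the combined generator: product of (modulus - 1), divided by 4.
period = (M1 - 1) * (M2 - 1) * (M3 - 1) // 4

# Time to exhaust the period at 1,000 numbers per second.
seconds = period / 1000
years = seconds / (365.25 * 24 * 60 * 60)
```

On these assumptions the period is roughly 6.95 trillion
numbers, or a little over 220 years of continuous generation
at that rate.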

3. New York did not establish that samples produced by
tested software are not appropriate for use.

New York also argued that individual samples should be
tested before being used. However, the authorities New
York cited, while indicating that such testing might be
advisable, did not contradict or rebut ACF's basic
argument that sample lists produced by certified, tested
software may be used to select samples without further
testing. New York cited Knuth, who discusses testing
sequences of numbers to see if they behave randomly, and
states that every sequence that is to be used extensively
should be tested carefully. D.E. Knuth, The Art of
Computer Programming, Vol. 2, at 38, Ex. 1 to Affidavit
of Janet Fowler, Ph.D., October 25, 1994. ACF argued
that "sequence" as used here by Knuth means a very large
group of numbers that the software produces and from
which lists of sample numbers are selected, rather than
an individual list of numbers. New York responded by
pointing out that ACF in its submissions has used
"sequence" and "sample" interchangeably, but did not
directly refute ACF's interpretation of Knuth's use of
the word. The fact that Knuth's recommendation applies
to sequences which are to be used "extensively" lends
weight to ACF's interpretation. Additionally, the cited
language does not contradict ACF's argument that tested
software may be relied on to generate samples for review.

We are also not persuaded by New York's argument and its
interpretation of Knuth because the other authorities it
submitted do not directly support its position that an
individual list of numbers produced by software that has
been tested and found to be functioning properly as a
pseudorandom number generator must be tested for
randomness before use. For example, New York cited Byron
J.T. Morgan, Elements of Simulation, at 137, New York Ex.
22, which states that "random numbers should always be
tested with an application in mind." From the excerpt
that New York provided, it appears that this statement
makes the point that, in choosing what tests to use, one
should consider the context, i.e., the applications that
are to be made of the samples. The statement does not
clearly mean that individual samples must be tested
before use.

New York also cited a letter introduced by ACF from a
Census Bureau statistician which states that it is good
practice to keep track of the original order of selection
of the sample units, for verification and control
purposes. Letter from Charles H. Alexander, Ph.D.,
December 1, 1993, enclosed with Affidavit of John
Gaudiosi, July 15, 1994. However, the same letter also
supports ACF's position that to test for randomness, one
must examine the process, rather than a specific sequence
of interest, and that "randomness" describes a property
of the process which produces a sequence of numbers, and
not a property of a finite sequence of numbers. Id.
Accordingly, the authorities cited by New York do not
support its argument that each list of numbers produced
by software that has passed recognized tests for
reliability must be tested for randomness before use.

Also, as noted above, the Board and court cases do not create
specific guidelines for sampling methodology, and state
only that valid statistical sampling done in accordance
with scientifically accepted rules provides reliable
evidence of unallowable costs. We note that HCFA Ruling
86-1, as cited by New York and by the court in Chaves,
refers to "randomly selected" sample claims. That
standard was satisfied here. By using software that had
passed recognized tests, ACF assured that the sample was
randomly selected.

4. New York did not demonstrate any problem with the
samples used here.

Although New York appeared to concede that the RATS-STATS
software had passed a battery of tests that would
demonstrate its reliability in selecting samples, it
nevertheless focused on the very small percentage of
instances where the software might select a sample that
would not pass tests for randomness. New York argued
that since it is unable to perform tests on the samples
here to find out if they fall within that small
percentage, the samples are unsuitable for use in
estimating the disallowance. As we discussed at length
above, a sample that did not pass particular tests can
still be considered randomly selected and still would be
appropriate for use here.

In any event, from New York's submissions, it appears
that there are tests that could have been performed on
the lists of numbers used to select the samples without
knowing the order in which they were generated. New
York's expert, Dr. Heiner, indicated that the tests that
would be performed on the lists of sample numbers are the
same tests that were used to test the RATS-STATS
software, that is, the tests described by Knuth and
referenced by Dr. Fowler. Affidavit of Karl Heiner,
Ph.D., August 23, 1994, E, F. Dr. Heiner stated that
not all of these tests require the numbers in the order
they were generated. 3/ However, New York did not
indicate that it performed any of the tests that do not
require the original order of the actual samples, and did
not argue that the samples had failed any of those tests.
Instead, New York maintained that it is unable to test
the samples without knowing the order in which the
numbers were generated. New York's failure to argue that
the samples failed tests which would not require the
order of generation reflects poorly on its case here,
given that New York declined to specify the number of
tests that an individual list of numbers must pass before
being deemed suitable for use. New York Submission,
December 16, 1994.
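
A frequency (equidistribution) test is one example of a test
that does not depend on the order of generation. The sketch
below is illustrative only: the record does not identify
which tests Dr. Heiner had in mind, and the universe of 5,000
cases, the list of 250 numbers, and the function name are
hypothetical.

```python
import random


def frequency_chi_square(numbers, universe_size, bins=10):
    """Chi-square statistic for how evenly the numbers spread over the universe."""
    counts = [0] * bins
    for n in numbers:
        counts[(n - 1) * bins // universe_size] += 1
    expected = len(numbers) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(7)
generated_order = rng.sample(range(1, 5001), 250)  # hypothetical sample numbers
numerical_order = sorted(generated_order)          # same numbers, order discarded

stat_generated = frequency_chi_square(generated_order, 5000)
stat_numerical = frequency_chi_square(numerical_order, 5000)
```

Because the statistic depends only on how many numbers fall
in each bin, it is the same whether the list is kept in the
order generated or in numerical order.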

New York also argued that ACF had not presented
sufficient information for it to determine whether the
sample was representative, and argued that samples that
are not representative are biased and should not be used.
However, New York has the most familiarity with the
universe of cases in its foster care system and is thus
in the best position to determine whether the sample
cases selected for review are not representative of the
entire universe. In a previous appeal before the Board,
New York asserted that a sample was not representative
because (among other arguments) certain categories of
cases were overrepresented in the sample cases as
compared to the universe from which the sample was drawn,
and presented evidence in support of its claim. New York
State Dept. of Social Services, DAB No. 522 (1984). In
that case, the Board could not agree with either party
but found for the agency since a state has a burden to
produce evidence sufficiently strong to overcome the
presumptive reliability of the sampling evidence. The
Board also found defects in the state's evidence of non-
representativeness. Here, New York presented no evidence
to show that the sample was not representative of the
universe of foster care cases (for example, that the
payments in the sample cases were higher than the average
payments in the universe of cases), or that New York was
prejudiced by the selection in the samples of these
particular cases.

5. New York's arguments that the QC sampling procedures
support its position lack merit.

New York also argued that, under the QC procedures for
the AFDC program, ACF requires that when a state selects
a sample of cases for review, it must provide ACF with
the list of selected sample cases and the computer-
generated random start and seed numbers. 45 C.F.R.
205.41(c). New York reasoned that based on this
precedent, it should be entitled to receive from ACF the
sample numbers in the sequence drawn to calculate the
disallowance here.

We conclude, however, that the situation here is
distinguishable from the cited QC procedures in AFDC. In
the AFDC QC procedures identified by New York, ACF would
have to deal, potentially, with a wide range of sampling
methodologies, including pseudorandom number generators
used by individual states. It would be reasonable under
these circumstances that ACF should require this
information as a general rule so that it could replicate
the estimation process where necessary. Nevertheless,
the absence of any particular piece of information
required would not demonstrate that a state's estimate
was ipso facto invalid. We have already discussed why we
conclude that the estimates here are sufficiently
reliable even though ACF did not possess the seed numbers
or the original order of the samples drawn.

Conclusion

On the basis of the foregoing, we conclude that ACF has
demonstrated the scientific validity of its sampling
methodology, and we deny New York's motion to dismiss the
disallowance. We remand the disallowance to ACF so the
parties can discuss the findings in the sample cases. If
the parties are unable to resolve their differences
concerning those findings, then ACF should notify the
Board, and we will re-docket the disallowance and advise
the parties as to necessary procedures for completing the
record.

Judith A. Ballard

Norval D. (John) Settle

Donald F. Garrett
Presiding Board Member

1. ACF appeared to use the terms payments, claims,
and cases interchangeably. ACF's disallowance letter
stated that the disallowance was based on a review of 232
foster care maintenance payments for FY 1984 and 234
payments for FY 1985. The report on the financial review
enclosed with the disallowance letter states that claims
were reviewed. Subsequently, ACF referred to the review
as having been conducted on foster care cases. Affidavit
of John Gaudiosi, September 7, 1993. Since the purpose
of the review was to determine if children were eligible
during the given time period, we refer in this decision
to the review having been conducted on cases.

2. For example, New York noted that three of the 13
tests could not have been performed because they require
a list of numbers in the order generated, whereas the
software lists the numbers in numerical order. In
response, ACF pointed out that the software provides an
option to list the order of selection next to each of the
numbers in a sample. ACF also disagreed with some of the
results of testing the software that Dr. Heiner obtained
on the basis that he employed incorrect degrees of
freedom when applying a chi-square test.

3. Dr. Heiner stated that "[a]t least three" of the
13 tests ACF says were performed require numbers in
sequential order. Affidavit of Karl Heiner, Ph.D.,
October 14, 1993, 20. He also stated that the order of
generation of sample numbers was needed to perform "most
of" the tests listed by Knuth. Affidavit of Karl Heiner,
Ph.D., August 23, 1994, E.